Moving to Amazon EKS: Lessons from Production
We migrated from kops to Amazon EKS. Here’s what we learned.
Why EKS?
We were running Kubernetes with kops on AWS. It worked, but:
- Manual control plane upgrades
- Managing etcd backups
- Monitoring control plane health
- Patching master nodes
EKS promises:
- Managed control plane
- Automatic upgrades
- Built-in HA
- AWS integration
The Migration
Step 1: Create EKS Cluster
eksctl create cluster \
  --name production \
  --version 1.13 \
  --region us-east-1 \
  --nodegroup-name standard-workers \
  --node-type m5.large \
  --nodes 3 \
  --nodes-min 3 \
  --nodes-max 10 \
  --managed
Step 2: Install Add-ons
# AWS Load Balancer Controller
kubectl apply -k "github.com/aws/eks-charts/stable/aws-load-balancer-controller//crds?ref=master"
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=production
# Cluster Autoscaler
kubectl apply -f cluster-autoscaler.yaml
# Metrics Server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Step 3: Migrate Workloads
We used Velero for migration:
# Install Velero on old cluster
velero install \
  --provider aws \
  --bucket velero-backups \
  --backup-location-config region=us-east-1
# Backup everything
velero backup create migration-backup
# Install Velero on EKS
velero install \
  --provider aws \
  --bucket velero-backups \
  --backup-location-config region=us-east-1
# Restore
velero restore create --from-backup migration-backup
Step 4: Switch Traffic
Used weighted DNS to gradually shift traffic:
- Week 1: 10% to EKS
- Week 2: 50% to EKS
- Week 3: 100% to EKS
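The shift itself was driven by Route 53 weighted records pointing at the old and new load balancers. A minimal sketch of building the change batch — the domain and load balancer names here are placeholders, and in practice the resulting dict is passed to boto3's route53 `change_resource_record_sets`:

```python
# Sketch: build a Route 53 ChangeBatch that splits traffic between two
# clusters by weight. Domain and LB hostnames are hypothetical.

def weighted_change_batch(eks_percent, old_lb, eks_lb, domain="app.example.com"):
    """Send eks_percent of traffic to EKS, the rest to the old cluster."""
    assert 0 <= eks_percent <= 100

    def record(set_id, weight, target):
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": domain,
                "Type": "CNAME",
                "SetIdentifier": set_id,   # distinguishes the weighted records
                "Weight": weight,
                "TTL": 60,                 # short TTL so shifts take effect quickly
                "ResourceRecords": [{"Value": target}],
            },
        }

    return {"Changes": [
        record("old-cluster", 100 - eks_percent, old_lb),
        record("eks-cluster", eks_percent, eks_lb),
    ]}

# Week 1: 10% to EKS
batch = weighted_change_batch(10, "old-lb.example.com", "eks-lb.example.com")
```

Bumping the percentage each week is then a one-line change.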
What We Like
1. Managed Control Plane
No more managing master nodes. AWS handles:
- etcd backups
- Control plane upgrades
- High availability
- Monitoring
2. AWS Integration
IAM Roles for Service Accounts (IRSA)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: s3-reader
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/s3-reader
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      serviceAccountName: s3-reader
      containers:
        - name: app
          image: myapp
Pods can access AWS services without access keys!
VPC CNI
Native VPC networking. Pods get real VPC IPs.
ELB Integration
apiVersion: v1
kind: Service
metadata:
  name: myapp
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    app: myapp
  ports:
    - port: 80
Creates Network Load Balancer automatically.
3. Managed Node Groups
eksctl create nodegroup \
  --cluster production \
  --name high-memory \
  --node-type r5.xlarge \
  --nodes 2 \
  --managed
AWS manages:
- Node updates
- AMI upgrades
- Scaling
4. Fargate Support
Run pods without managing nodes:
apiVersion: v1
kind: Namespace
metadata:
  name: fargate-ns
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: production
fargateProfiles:
  - name: fargate-profile
    selectors:
      - namespace: fargate-ns
Pods in fargate-ns run on Fargate. No nodes needed.
What We Don’t Like
1. Cost
kops:
- 3 m5.large masters: $130/month
- 10 m5.large workers: $430/month
- Total: $560/month
EKS:
- Control plane: $73/month
- 10 m5.large workers: $430/month
- Total: $503/month
EKS is cheaper, but not by much.
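The totals are just the line items summed; a quick sanity check of the figures above:

```python
# Sanity-check of the monthly cost comparison (using the article's numbers).
kops = {"masters (3x m5.large)": 130, "workers (10x m5.large)": 430}
eks = {"control plane": 73, "workers (10x m5.large)": 430}

kops_total = sum(kops.values())   # 560
eks_total = sum(eks.values())     # 503
savings = kops_total - eks_total  # 57

print(f"kops: ${kops_total}/mo, EKS: ${eks_total}/mo, savings: ${savings}/mo")
```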
2. Slower Kubernetes Updates
kops: upgrade to a new Kubernetes version as soon as it's released.
EKS: wait for AWS to support it (usually a 2-3 month delay).
3. Less Control
Can’t customize control plane. Stuck with AWS defaults.
4. VPC CNI Limitations
Each instance type caps the number of pods it can run, based on how many ENIs (and secondary IPs) it supports:
- m5.large: 29 pods max
- m5.xlarge: 58 pods max
This can be limiting if you run many small pods.
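Those caps come from the VPC CNI formula: max pods = ENIs × (IPs per ENI − 1) + 2, where each ENI's primary IP belongs to the node itself and the +2 accounts for host-network pods. Plugging in the ENI specs AWS publishes for these instance types:

```python
# VPC CNI pod limit: each ENI's primary IP is reserved for the node, and
# 2 is added for host-networking pods (aws-node, kube-proxy).
# ENI and IPs-per-ENI counts come from AWS's instance-type specs.
def max_pods(enis, ips_per_eni):
    return enis * (ips_per_eni - 1) + 2

print(max_pods(3, 10))   # m5.large:  3 ENIs x 10 IPs -> 29
print(max_pods(4, 15))   # m5.xlarge: 4 ENIs x 15 IPs -> 58
```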
Issues We Hit
1. IAM Permissions
IRSA requires a specific IAM setup: an OIDC identity provider for the cluster plus a trust policy on each role. It took time to get right.
2. VPC IP Exhaustion
Pods use VPC IPs. We ran out of IPs in our subnet.
Solution: Use larger subnets or secondary CIDR blocks.
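A back-of-the-envelope check makes the problem obvious. AWS reserves 5 addresses in every subnet, and each node consumes roughly one IP per pod plus its own primary IP (a rough upper bound, since host-network pods don't take extra IPs). Using a hypothetical /24 subnet and m5.large nodes at the 29-pod cap:

```python
import ipaddress

def usable_ips(cidr):
    # AWS reserves 5 addresses per subnet (network, router, DNS,
    # future use, broadcast).
    return ipaddress.ip_network(cidr).num_addresses - 5

def ips_needed(nodes, max_pods_per_node):
    # Upper bound: one primary IP per node plus one per pod.
    return nodes * (1 + max_pods_per_node)

print(usable_ips("10.0.1.0/24"))  # 251
print(ips_needed(10, 29))         # 300 -> a /24 can't even hold 10 nodes
```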
3. Cluster Autoscaler
Had to configure it specifically for EKS:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        - name: cluster-autoscaler
          image: k8s.gcr.io/cluster-autoscaler:v1.13.0
          command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --expander=least-waste
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/production
4. Logging
Had to set up Fluent Bit for logging:
kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluent-bit-quickstart.yaml
Cost Optimization
1. Spot Instances
eksctl create nodegroup \
  --cluster production \
  --name spot-workers \
  --node-type m5.large \
  --nodes 5 \
  --spot
Typically up to 70% cheaper than on-demand, at the cost of possible interruptions.
2. Fargate for Batch Jobs
Run batch jobs on Fargate. Pay only when running.
3. Right-Sizing
Used VPA (Vertical Pod Autoscaler) to right-size pods.
Reduced resource requests by 30%.
The Results
Before (kops):
- Manual control plane management
- Manual upgrades
- Manual etcd backups
- Cost: $560/month
After (EKS):
- Managed control plane
- Automatic upgrades
- Automatic backups
- Cost: $503/month (with spot instances: $350/month)
Would We Do It Again?
Yes. EKS is worth it for:
- Reduced operational burden
- Better AWS integration
- Managed control plane
But it’s not perfect:
- Slower Kubernetes updates
- Less control
- VPC CNI limitations
Alternatives
GKE (Google Kubernetes Engine)
More mature than EKS, with faster access to new Kubernetes versions.
But we’re on AWS, so EKS makes sense.
AKS (Azure Kubernetes Service)
If you’re on Azure.
Self-Managed
If you need full control and have the expertise.
The Verdict
EKS is a solid managed Kubernetes offering. It reduces operational burden and integrates well with AWS.
If you’re on AWS and running Kubernetes, EKS is worth considering.
Questions? Ask away!