Moving to Amazon EKS: Lessons from Production
We migrated from kops to Amazon EKS. Here’s what we learned.
Why EKS?
We were running Kubernetes with kops on AWS. It worked, but:
- Manual control plane upgrades
- Managing etcd backups
- Monitoring control plane health
- Patching master nodes
EKS promises:
- Managed control plane
- Automatic upgrades
- Built-in HA
- AWS integration
The Migration
Step 1: Create EKS Cluster
eksctl create cluster \
  --name production \
  --version 1.13 \
  --region us-east-1 \
  --nodegroup-name standard-workers \
  --node-type m5.large \
  --nodes 3 \
  --nodes-min 3 \
  --nodes-max 10 \
  --managed
Step 2: Install Add-ons
# AWS Load Balancer Controller
kubectl apply -k "github.com/aws/eks-charts/stable/aws-load-balancer-controller//crds?ref=master"
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=production
# Cluster Autoscaler
kubectl apply -f cluster-autoscaler.yaml
# Metrics Server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Step 3: Migrate Workloads
We used Velero for migration:
# Install Velero on old cluster
velero install \
  --provider aws \
  --bucket velero-backups \
  --backup-location-config region=us-east-1
# Backup everything
velero backup create migration-backup
# Install Velero on EKS
velero install \
  --provider aws \
  --bucket velero-backups \
  --backup-location-config region=us-east-1
# Restore
velero restore create --from-backup migration-backup
Step 4: Switch Traffic
Used weighted DNS to gradually shift traffic:
- Week 1: 10% to EKS
- Week 2: 50% to EKS
- Week 3: 100% to EKS
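The shift itself was driven by Route 53 weighted records pointing at the old and new load balancers. A minimal sketch of building the change batch — the domain and load balancer names here are placeholders, and in practice the resulting dict is passed to boto3's route53 `change_resource_record_sets`:

```python
# Sketch: build a Route 53 ChangeBatch that splits traffic between two
# clusters by weight. Domain and LB hostnames are hypothetical.

def weighted_change_batch(eks_percent, old_lb, eks_lb, domain="app.example.com"):
    """Send eks_percent of traffic to EKS, the rest to the old cluster."""
    assert 0 <= eks_percent <= 100

    def record(set_id, weight, target):
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": domain,
                "Type": "CNAME",
                "SetIdentifier": set_id,   # distinguishes the weighted records
                "Weight": weight,
                "TTL": 60,                 # short TTL so shifts take effect quickly
                "ResourceRecords": [{"Value": target}],
            },
        }

    return {"Changes": [
        record("old-cluster", 100 - eks_percent, old_lb),
        record("eks-cluster", eks_percent, eks_lb),
    ]}

# Week 1: 10% to EKS
batch = weighted_change_batch(10, "old-lb.example.com", "eks-lb.example.com")
```

Bumping the percentage each week is then a one-line change.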
What We Like
1. Managed Control Plane
No more managing master nodes. AWS handles:
- etcd backups
- Control plane upgrades
- High availability
- Monitoring
2. AWS Integration
IAM Roles for Service Accounts (IRSA)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: s3-reader
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/s3-reader
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      serviceAccountName: s3-reader
      containers:
        - name: app
          image: myapp
Pods can access AWS services without access keys!
VPC CNI
Native VPC networking. Pods get real VPC IPs.
ELB Integration
apiVersion: v1
kind: Service
metadata:
  name: myapp
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    app: myapp
  ports:
    - port: 80
Creates Network Load Balancer automatically.
3. Managed Node Groups
eksctl create nodegroup \
  --cluster production \
  --name high-memory \
  --node-type r5.xlarge \
  --nodes 2 \
  --managed
AWS manages:
- Node updates
- AMI upgrades
- Scaling
4. Fargate Support
Run pods without managing nodes:
apiVersion: v1
kind: Namespace
metadata:
  name: fargate-ns
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: production
fargateProfiles:
  - name: fargate-profile
    selectors:
      - namespace: fargate-ns
Pods in fargate-ns run on Fargate. No nodes needed.
What We Don’t Like
1. Cost
kops:
- 3 m5.large masters: $130/month
- 10 m5.large workers: $430/month
- Total: $560/month
EKS:
- Control plane: $73/month
- 10 m5.large workers: $430/month
- Total: $503/month
EKS is cheaper, but not by much.
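The totals are just the line items summed; a quick sanity check of the figures above:

```python
# Sanity-check of the monthly cost comparison (using the article's numbers).
kops = {"masters (3x m5.large)": 130, "workers (10x m5.large)": 430}
eks = {"control plane": 73, "workers (10x m5.large)": 430}

kops_total = sum(kops.values())   # 560
eks_total = sum(eks.values())     # 503
savings = kops_total - eks_total  # 57

print(f"kops: ${kops_total}/mo, EKS: ${eks_total}/mo, savings: ${savings}/mo")
```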
2. Slower Kubernetes Updates
kops: upgrade to a new Kubernetes version as soon as it's released.
EKS: wait for AWS to support it (usually a 2-3 month delay).
3. Less Control
Can’t customize control plane. Stuck with AWS defaults.
4. VPC CNI Limitations
Each instance type caps the number of pods it can run, based on how many ENIs (and secondary IPs) it supports:
- m5.large: 29 pods max
- m5.xlarge: 58 pods max
This can be limiting if you run many small pods.
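Those caps come from the VPC CNI formula: max pods = ENIs × (IPs per ENI − 1) + 2, where each ENI's primary IP belongs to the node itself and the +2 accounts for host-network pods. Plugging in the ENI specs AWS publishes for these instance types:

```python
# VPC CNI pod limit: each ENI's primary IP is reserved for the node, and
# 2 is added for host-networking pods (aws-node, kube-proxy).
# ENI and IPs-per-ENI counts come from AWS's instance-type specs.
def max_pods(enis, ips_per_eni):
    return enis * (ips_per_eni - 1) + 2

print(max_pods(3, 10))   # m5.large:  3 ENIs x 10 IPs -> 29
print(max_pods(4, 15))   # m5.xlarge: 4 ENIs x 15 IPs -> 58
```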
Issues We Hit
1. IAM Permissions
IRSA requires a specific IAM setup: an OIDC identity provider for the cluster plus a trust policy on each role. It took time to get right.
2. VPC IP Exhaustion
Pods use VPC IPs. We ran out of IPs in our subnet.
Solution: Use larger subnets or secondary CIDR blocks.
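A back-of-the-envelope check makes the problem obvious. AWS reserves 5 addresses in every subnet, and each node consumes roughly one IP per pod plus its own primary IP (a rough upper bound, since host-network pods don't take extra IPs). Using a hypothetical /24 subnet and m5.large nodes at the 29-pod cap:

```python
import ipaddress

def usable_ips(cidr):
    # AWS reserves 5 addresses per subnet (network, router, DNS,
    # future use, broadcast).
    return ipaddress.ip_network(cidr).num_addresses - 5

def ips_needed(nodes, max_pods_per_node):
    # Upper bound: one primary IP per node plus one per pod.
    return nodes * (1 + max_pods_per_node)

print(usable_ips("10.0.1.0/24"))  # 251
print(ips_needed(10, 29))         # 300 -> a /24 can't even hold 10 nodes
```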
3. Cluster Autoscaler
Had to configure it specifically for EKS:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        - name: cluster-autoscaler
          image: k8s.gcr.io/cluster-autoscaler:v1.13.0
          command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --expander=least-waste
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/production
4. Logging
Had to set up Fluent Bit for logging:
kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluent-bit-quickstart.yaml
Cost Optimization
1. Spot Instances
eksctl create nodegroup \
  --cluster production \
  --name spot-workers \
  --node-type m5.large \
  --nodes 5 \
  --spot
Typically up to 70% cheaper than on-demand, at the cost of possible interruptions.
2. Fargate for Batch Jobs
Run batch jobs on Fargate. Pay only when running.
3. Right-Sizing
Used VPA (Vertical Pod Autoscaler) to right-size pods.
Reduced resource requests by 30%.
The Results
Before (kops):
- Manual control plane management
- Manual upgrades
- Manual etcd backups
- Cost: $560/month
After (EKS):
- Managed control plane
- Automatic upgrades
- Automatic backups
- Cost: $503/month (with spot instances: $350/month)
Would We Do It Again?
Yes. EKS is worth it for:
- Reduced operational burden
- Better AWS integration
- Managed control plane
But it’s not perfect:
- Slower Kubernetes updates
- Less control
- VPC CNI limitations
Alternatives
GKE (Google Kubernetes Engine)
More mature than EKS, with faster access to new Kubernetes versions.
But we’re on AWS, so EKS makes sense.
AKS (Azure Kubernetes Service)
If you’re on Azure.
Self-Managed
If you need full control and have the expertise.
The Verdict
EKS is a solid managed Kubernetes offering. It reduces operational burden and integrates well with AWS.
If you’re on AWS and running Kubernetes, EKS is worth considering.
Questions? Ask away!