Kubernetes Horizontal Pod Autoscaling: Handling Traffic Spikes
A Black Friday traffic spike crashed our services. We had 3 pods running, traffic increased 10x, and everything fell over.
I implemented Horizontal Pod Autoscaling (HPA). Now our services automatically scale from 3 to 30 pods during traffic spikes. No more crashes.
The Black Friday Disaster
Traffic pattern:
- Normal: 500 req/s (3 pods)
- Black Friday: 5000 req/s (still 3 pods)
- Result: 503 errors, angry customers
We manually scaled to 20 pods, but it took 15 minutes. By then, we’d lost sales.
Horizontal Pod Autoscaler
HPA automatically scales pods based on observed metrics. The examples below use the autoscaling/v2 API, which has been GA since Kubernetes 1.23:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: user-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: user-service
  minReplicas: 3
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
HPA compares the observed CPU utilization (averaged across pods, as a percentage of each pod's request) against the 70% target and sets desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization). Above the target it adds pods; below it, it removes them, always staying within the min/max bounds.
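If you just want to experiment before writing YAML, the same policy can be created imperatively with kubectl autoscale (run against the user-service Deployment from the example above):

```bash
# Create an HPA equivalent to the manifest above
kubectl autoscale deployment user-service --cpu-percent=70 --min=3 --max=30

# Inspect what was generated
kubectl get hpa user-service -o yaml
```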
CPU-Based Autoscaling
Simple and effective:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-gateway-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-gateway
  minReplicas: 5
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
```
Targeting 60% CPU utilization leaves headroom for sudden spikes. For example, if 5 pods are averaging 120% of their requested CPU, HPA scales to ceil(5 × 120 / 60) = 10 replicas.
Memory-Based Autoscaling
For memory-intensive services:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: data-processor-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: data-processor
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
Multiple Metrics
Scale on CPU OR memory:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: user-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: user-service
  minReplicas: 3
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
HPA computes a desired replica count for each metric and uses the highest, so a scale-up happens if either metric is over its target, and a scale-down only happens when both metrics allow it.
Custom Metrics
Scale on requests per second:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-gateway-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-gateway
  minReplicas: 5
  maxReplicas: 50
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
```
Pods metrics like this require a custom metrics API implementation (for example, the Prometheus Adapter) on top of the core metrics pipeline.
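To confirm the custom metrics pipeline is actually wired up (this assumes an adapter is installed and already exposes http_requests_per_second; jq is optional, just for pretty-printing), you can query the aggregated API directly:

```bash
# List the custom metrics the adapter is serving
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .

# Query the metric for pods in a namespace (namespace is illustrative)
kubectl get --raw \
  "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests_per_second" | jq .
```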
Setting Up Metrics Server
Install metrics-server:
```bash
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```
Verify:
```bash
kubectl top nodes
kubectl top pods
```
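If kubectl top returns errors, check that the metrics APIService is registered and that metrics-server itself is healthy:

```bash
# The metrics.k8s.io APIService should report Available=True
kubectl get apiservice v1beta1.metrics.k8s.io

# Check the metrics-server deployment and its logs
kubectl -n kube-system get deployment metrics-server
kubectl -n kube-system logs deployment/metrics-server
```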
Resource Requests Required
HPA needs resource requests defined:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: user-service
  template:
    metadata:
      labels:
        app: user-service
    spec:
      containers:
        - name: user-service
          image: user-service:latest
          resources:
            requests:          # HPA computes utilization against these values
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
```
Without requests, HPA can't calculate utilization, because utilization is measured as a percentage of the requested value.
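A quick way to confirm requests are actually set on the live Deployment (names match the example above):

```bash
# Print each container's resource requests
kubectl get deployment user-service \
  -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{"  "}{.resources.requests}{"\n"}{end}'
```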
Scaling Behavior
Control scale-up/down speed:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: user-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: user-service
  minReplicas: 3
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
      selectPolicy: Min
```
- Scale up: aggressive (no stabilization window; add up to 100% more pods or 4 pods every 15 seconds, whichever is larger)
- Scale down: conservative (act on the highest recommendation from the last 5 minutes, removing at most 50% of pods per minute)
Monitoring HPA
Check HPA status:
```bash
kubectl get hpa
```

```
NAME               REFERENCE                 TARGETS   MINPODS   MAXPODS   REPLICAS
user-service-hpa   Deployment/user-service   45%/70%   3         30        5
```
Describe for details:
```bash
kubectl describe hpa user-service-hpa
```
Output:
```
Metrics:
  resource cpu on pods (as a percentage of request):  45% (45m) / 70%
Min replicas:     3
Max replicas:     30
Deployment pods:  5 current / 5 desired
Events:
  Normal  SuccessfulRescale  2m  horizontal-pod-autoscaler  New size: 5; reason: cpu resource utilization (percentage of request) above target
```
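Scaling decisions also show up as events on the HPA object, which is handy when you're reconstructing what happened after the fact:

```bash
# List recent autoscaler events for this HPA, newest last
kubectl get events --field-selector involvedObject.name=user-service-hpa \
  --sort-by=.lastTimestamp
```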
Testing Autoscaling
Generate load:
```bash
# Install hey
go install github.com/rakyll/hey@latest

# Generate load: 5 minutes, 100 concurrent workers
hey -z 5m -c 100 http://api-gateway/users
```
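If you'd rather not install anything locally, a throwaway pod inside the cluster works too (the URL is the same illustrative service endpoint as above):

```bash
# Temporary in-cluster load generator; Ctrl-C stops it and --rm cleans it up
kubectl run load-generator --rm -it --image=busybox:1.36 --restart=Never -- \
  /bin/sh -c "while true; do wget -q -O- http://api-gateway/users; done"
```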
Watch pods scale:
```bash
watch kubectl get pods
```
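Watching the HPA next to the pods makes the cause and effect easier to see:

```bash
# Refreshes every 2 seconds by default
watch kubectl get hpa,pods
```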
Real-World Example
Our API gateway during traffic spike:
Time | Traffic | CPU | Pods | Status
--------|---------|------|------|--------
10:00 | 500/s | 40% | 3 | Normal
10:15 | 2000/s | 85% | 6 | Scaling up
10:20 | 4000/s | 75% | 12 | Scaling up
10:25 | 5000/s | 70% | 15 | Stable
10:45 | 3000/s | 60% | 15 | Stable (cooldown)
11:00 | 1000/s | 45% | 10 | Scaling down
11:15 | 500/s | 35% | 5 | Scaling down
11:30 | 500/s | 40% | 3 | Back to normal
HPA handled the spike automatically!
Cost Optimization
Set appropriate min/max:
```yaml
# Development
minReplicas: 1
maxReplicas: 5

# Staging
minReplicas: 2
maxReplicas: 10

# Production
minReplicas: 5
maxReplicas: 50
```
Don’t over-provision. Let HPA scale as needed.
Combining with Cluster Autoscaler
HPA scales pods. Cluster Autoscaler scales nodes.
Cluster Autoscaler is configured per node group, either through your cloud provider's managed node group settings or via flags on its Deployment. An illustrative excerpt (the cloud provider and node group name are placeholders):

```yaml
# Excerpt from the cluster-autoscaler container spec
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws            # aws, gce, azure, ...
  - --nodes=3:20:my-node-group      # min:max:node-group-name
```
When HPA adds pods and nodes are full, Cluster Autoscaler adds nodes.
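When capacity runs out, the pods HPA created sit in Pending until a new node joins, which is easy to spot (TriggeredScaleUp is the event reason Cluster Autoscaler emits on those pods):

```bash
# Pods waiting for capacity
kubectl get pods --field-selector=status.phase=Pending

# Cluster Autoscaler records TriggeredScaleUp events on pending pods
kubectl get events --sort-by=.lastTimestamp | grep -i scaleup
```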
Prometheus Custom Metrics
Scale on custom metrics from Prometheus:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-gateway-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-gateway
  minReplicas: 5
  maxReplicas: 50
  metrics:
    - type: External
      external:
        metric:
          name: http_requests_per_second
          selector:
            matchLabels:
              service: api-gateway
        target:
          type: AverageValue
          averageValue: "1000"
```
Requires the Prometheus Adapter to expose the metric through the external metrics API.
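As a sketch of how that metric could be produced on the adapter side (assuming requests are counted in a Prometheus counter named http_requests_total with namespace and service labels, which is my assumption, not something from this setup), an adapter rule might look like this:

```yaml
# prometheus-adapter configuration excerpt - illustrative, not a drop-in config
externalRules:
  - seriesQuery: 'http_requests_total{namespace!="",service!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```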
Best Practices
- Set resource requests - Required for HPA
- Conservative scale-down - Avoid flapping
- Aggressive scale-up - Handle spikes quickly
- Monitor HPA events - Understand scaling behavior
- Test under load - Verify HPA works
Common Issues
1. HPA not scaling:
```bash
# Check metrics
kubectl top pods

# Check HPA
kubectl describe hpa user-service-hpa
```
Usually missing resource requests or metrics-server not running.
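The HPA's status conditions usually pinpoint the cause:

```bash
# ScalingActive=False with reason FailedGetResourceMetric usually means
# missing resource requests or a broken metrics pipeline
kubectl describe hpa user-service-hpa | grep -A 5 "Conditions:"
```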
2. Flapping (constant scale up/down):
Increase stabilization window:
```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 600   # 10 minutes
```
3. Slow scale-up:
Reduce stabilization window and increase scale-up rate:
```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
      - type: Percent
        value: 200          # add 200% per period (triples the pod count)
        periodSeconds: 15
```
Vertical Pod Autoscaler
For right-sizing resource requests (only a quick sketch here):
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: user-service-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: user-service
  updatePolicy:
    updateMode: "Auto"
```
VPA adjusts CPU/memory requests. Be careful combining it with HPA: don't let both react to the same CPU/memory metrics for one workload, or they will fight each other.
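One lower-risk pattern is to run VPA in recommendation-only mode and leave replica counts to HPA; updateMode "Off" is a real VPA setting, and the rest mirrors the example above:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: user-service-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: user-service
  updatePolicy:
    updateMode: "Off"   # publish recommendations only, never evict pods
```

Read the recommendations with kubectl describe vpa user-service-vpa and fold them into your requests manually.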
Results
After implementing HPA:
Before:
- Manual scaling during traffic spikes
- 15-minute response time
- Frequent outages during high traffic
- Over-provisioned during low traffic
After:
- Automatic scaling
- 2-minute response time
- No outages
- Cost reduced by 40% (fewer idle pods)
Lessons Learned
- Start conservative - Don’t scale too aggressively
- Monitor closely - Watch HPA behavior
- Test thoroughly - Load test before production
- Set appropriate limits - Don’t let HPA scale infinitely
- Combine with alerts - Know when HPA is scaling
Conclusion
HPA is essential for production Kubernetes. It handles traffic spikes automatically and reduces costs during low traffic.
Key takeaways:
- Use HPA for all production services
- Set resource requests/limits
- Start with CPU-based scaling
- Add custom metrics as needed
- Test under load
Our services now handle 10x traffic spikes without manual intervention. HPA saved us during Black Friday.
If you’re running Kubernetes in production, implement HPA. Your on-call team will thank you.