Last month our AWS bill hit $12,000 and management wasn’t happy. After two weeks of optimization work, we got it down to $4,800. Here’s how we did it.

The Problem

We were running everything on on-demand instances because “we might need to scale quickly.” Spoiler: we never scaled quickly. We just paid 3x more than we needed to.

Strategy 1: Reserved Instances

This was the low-hanging fruit. We analyzed our usage over 3 months and found that we had a baseline of 15 m4.large instances running 24/7. These were perfect candidates for Reserved Instances.

Before: 15 × $0.10/hour × 730 hours = $1,095/month
After: 15 × $0.065/hour × 730 hours = $712/month (1-year RI, partial upfront)

That’s $383/month saved just by committing to instances we were already running, and that was only the m4.large baseline; buying RIs for the rest of our steady-state fleet is how that line item grows to the $2,300/month figure in the breakdown at the end. The ROI was immediate.
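
If you’d rather not eyeball months of usage by hand, Cost Explorer can generate RI purchase recommendations from recent usage. A minimal sketch; the lookback window, term, and payment option below are just the settings we’d pick, not the only options:

#!/bin/bash
# Ask Cost Explorer for EC2 Reserved Instance purchase recommendations.
aws ce get-reservation-purchase-recommendation \
    --service "Amazon Elastic Compute Cloud - Compute" \
    --lookback-period-in-days SIXTY_DAYS \
    --term-in-years ONE_YEAR \
    --payment-option PARTIAL_UPFRONT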

The Gotcha

Reserved Instances are per-region and per-instance-type. We bought RIs for us-east-1 m4.large, then realized half our instances were in us-west-2. Oops. Make sure you know where your instances are before buying.
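
A quick way to avoid repeating our mistake: inventory running instances by type and Availability Zone in every region before you buy. A rough sketch (slow, since it loops over all regions, but you only need to run it once):

#!/bin/bash
# Count running instances per type and AZ, region by region.
for region in $(aws ec2 describe-regions --query 'Regions[].RegionName' --output text); do
    echo "== $region =="
    aws ec2 describe-instances --region "$region" \
        --filters Name=instance-state-name,Values=running \
        --query 'Reservations[].Instances[].[InstanceType,Placement.AvailabilityZone]' \
        --output text | sort | uniq -c
done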

Strategy 2: Right-Sizing

We had a bunch of m4.xlarge instances (4 vCPUs, 16GB RAM) running applications that barely used 1 vCPU and 4GB RAM. Classic over-provisioning.

I wrote a quick script to pull CloudWatch metrics:

#!/bin/bash
# Print the 7-day average CPU utilization for every running instance.
# Note: uses GNU date; on macOS, swap `date -u -d '7 days ago'` for `date -u -v-7d`.

aws ec2 describe-instances \
    --filters Name=instance-state-name,Values=running \
    --query 'Reservations[*].Instances[*].[InstanceId]' \
    --output text | while read -r instance; do
    # Pull hourly CPU averages for the past week, then average them.
    avg=$(aws cloudwatch get-metric-statistics \
        --namespace AWS/EC2 \
        --metric-name CPUUtilization \
        --dimensions Name=InstanceId,Value="$instance" \
        --start-time "$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S)" \
        --end-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
        --period 3600 \
        --statistics Average \
        --query 'Datapoints[*].Average' \
        --output text | awk '{ for (i = 1; i <= NF; i++) { sum += $i; n++ } } END { if (n) printf "%.1f", sum / n }')
    echo "Instance: $instance  7-day avg CPU: ${avg:-N/A}%"
done

Turns out, 60% of our instances were using less than 20% CPU on average. We downsized:

  • 10 m4.xlarge → m4.large (saved $500/month)
  • 5 m4.large → t2.medium (there’s no m4 smaller than large, so these went to burstables; saved $180/month)
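
Each resize itself is just a stop / modify / start cycle, which means a short outage per instance. A rough sketch of one resize (the instance ID and target type below are placeholders):

#!/bin/bash
# Resize one instance; it must be stopped before its type can be changed.
INSTANCE_ID=i-0123456789abcdef0   # placeholder
NEW_TYPE=m4.large                 # placeholder

aws ec2 stop-instances --instance-ids "$INSTANCE_ID"
aws ec2 wait instance-stopped --instance-ids "$INSTANCE_ID"
aws ec2 modify-instance-attribute --instance-id "$INSTANCE_ID" --instance-type Value="$NEW_TYPE"
aws ec2 start-instances --instance-ids "$INSTANCE_ID"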

Strategy 3: Auto-Scaling

We had instances running at full capacity overnight when traffic was basically zero, so we set up scheduled scaling actions to drop to 50% capacity during off-hours (10 PM to 6 AM). The schedule looks like this:

{
  "ScheduledActions": [
    {
      "ScheduledActionName": "scale-down-night",
      "Recurrence": "0 22 * * *",
      "MinSize": 5,
      "MaxSize": 10,
      "DesiredCapacity": 5
    },
    {
      "ScheduledActionName": "scale-up-morning",
      "Recurrence": "0 6 * * *",
      "MinSize": 10,
      "MaxSize": 20,
      "DesiredCapacity": 10
    }
  ]
}
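
Under the hood this is just two scheduled actions on the Auto Scaling group. Roughly (the group name below is a placeholder, and keep in mind the Recurrence cron is evaluated in UTC unless you set a time zone):

# Scale down to half capacity every night at 22:00
aws autoscaling put-scheduled-update-group-action \
    --auto-scaling-group-name my-web-asg \
    --scheduled-action-name scale-down-night \
    --recurrence "0 22 * * *" \
    --min-size 5 --max-size 10 --desired-capacity 5

# Restore full capacity every morning at 06:00
aws autoscaling put-scheduled-update-group-action \
    --auto-scaling-group-name my-web-asg \
    --scheduled-action-name scale-up-morning \
    --recurrence "0 6 * * *" \
    --min-size 10 --max-size 20 --desired-capacity 10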

This saved another $400/month.

Strategy 4: Spot Instances for Non-Critical Workloads

We run batch jobs for data processing. These don’t need to complete immediately, so they’re perfect for Spot Instances.

We moved our batch processing from on-demand m4.large ($0.10/hour) to Spot Instances (averaging around $0.03/hour). That’s roughly 70% savings on compute for batch jobs.

The catch: Spot Instances can be reclaimed with only two minutes’ notice, so make sure your jobs can handle interruption gracefully. AWS surfaces the interruption notice through instance metadata, which makes it straightforward to watch for.
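
A minimal watcher sketch, assuming IMDSv1 is reachable and that your batch framework gives you some way to drain (the two functions called at the end are placeholders):

#!/bin/bash
# Poll instance metadata for a spot interruption notice and drain gracefully.
while true; do
    # The endpoint returns 404 until AWS schedules an interruption for this instance.
    code=$(curl -s -o /dev/null -w '%{http_code}' \
        http://169.254.169.254/latest/meta-data/spot/instance-action)
    if [ "$code" = "200" ]; then
        echo "Spot interruption notice received, draining..."
        stop_taking_new_work   # placeholder: stop pulling new batch work
        checkpoint_job         # placeholder: save progress so another node can resume
        exit 0
    fi
    sleep 5
done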

Strategy 5: Terminate Zombie Instances

Found 8 instances that nobody knew about. They were created for testing 6 months ago and forgotten. Terminated them immediately.

Savings: $600/month

Pro tip: Tag everything. We now have a policy that any instance without proper tags gets terminated after 7 days.
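
Enforcing that policy starts with finding the untagged instances. A sketch that flags running instances missing an "owner" tag (the tag key is just an example; use whatever your tagging policy requires):

#!/bin/bash
# List running instances that have no "owner" tag.
for id in $(aws ec2 describe-instances \
        --filters Name=instance-state-name,Values=running \
        --query 'Reservations[].Instances[].InstanceId' --output text); do
    owner=$(aws ec2 describe-tags \
        --filters "Name=resource-id,Values=$id" "Name=key,Values=owner" \
        --query 'Tags[0].Value' --output text)
    if [ "$owner" = "None" ]; then
        echo "Untagged instance: $id"
    fi
done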

The Results

Before: $12,000/month
After: $4,800/month
Savings: 60%

Breakdown:

  • Reserved Instances: $2,300/month saved
  • Right-sizing: $680/month saved
  • Auto-scaling: $400/month saved
  • Spot instances: $220/month saved
  • Terminated zombies: $600/month saved
  • Other optimizations: $3,000/month saved

Lessons Learned

  1. Monitor Everything: If you’re not tracking it, you can’t optimize it. Set up CloudWatch dashboards.

  2. Start with the Obvious: Reserved Instances and right-sizing are easy wins. Do these first.

  3. Automate: Manual scaling doesn’t work. People forget. Automation doesn’t.

  4. Review Regularly: Set a calendar reminder to review costs monthly. It’s easy for waste to creep back in.

  5. Tag Everything: Seriously. Tags make it possible to track costs by project, team, environment, etc.

Tools We Use

  • AWS Cost Explorer: For analyzing spending patterns
  • CloudWatch: For monitoring resource utilization
  • Custom scripts: For automated right-sizing recommendations
  • Slack bot: Sends daily cost reports to our DevOps channel
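
For anyone curious, the daily report boils down to a single Cost Explorer query for yesterday’s spend per service (a sketch; the Slack webhook part is omitted, and the date commands use GNU syntax):

#!/bin/bash
# Yesterday's spend per service, via Cost Explorer.
START=$(date -u -d 'yesterday' +%Y-%m-%d)
END=$(date -u +%Y-%m-%d)

aws ce get-cost-and-usage \
    --time-period Start=$START,End=$END \
    --granularity DAILY \
    --metrics UnblendedCost \
    --group-by Type=DIMENSION,Key=SERVICE \
    --query 'ResultsByTime[0].Groups[].[Keys[0],Metrics.UnblendedCost.Amount]' \
    --output text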

What’s Next

We’re looking into:

  • Containerization with ECS to improve resource utilization
  • Moving some workloads to Lambda
  • Using S3 lifecycle policies to reduce storage costs

If you’re running on AWS and haven’t optimized costs, you’re probably overpaying. Start with Reserved Instances and right-sizing. You’ll see results immediately.

Anyone else have cost optimization wins to share? I’d love to hear them.