Running Stateful Applications on Kubernetes: StatefulSets Deep Dive
For the past year, we’ve been running all our stateless services on Kubernetes. But we kept our databases on traditional VMs because “you shouldn’t run databases in containers.”
Last quarter, I decided to challenge that assumption. We migrated our PostgreSQL and Redis instances to Kubernetes using StatefulSets. Here’s what I learned.
Why StatefulSets?
Regular Kubernetes Deployments are great for stateless apps, but they fall short for databases:
- Pods get random names - postgres-7d8f9c-xk2p9 changes on every restart
- No stable network identity - IP addresses change
- No ordered deployment - Pods start in random order
- Storage is ephemeral - Data disappears when pod dies
StatefulSets solve all of these:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:10.5
        ports:
        - containerPort: 5432
          name: postgres
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 100Gi
This creates pods named postgres-0, postgres-1, postgres-2 with stable identities.
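A quick way to see the ordered, stable naming in action (just a sanity check, not part of the manifests):

# Pods come up in order: postgres-0, then postgres-1, then postgres-2
kubectl get pods -l app=postgres -w

# Deleting a pod brings it back with the same name (and, as we'll see, the same volume)
kubectl delete pod postgres-1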
Persistent Storage Setup
The biggest challenge was storage. We’re running on AWS, so I used EBS volumes via the AWS EBS CSI driver.
First, install the CSI driver:
kubectl apply -k "github.com/kubernetes-sigs/aws-ebs-csi-driver/deploy/kubernetes/overlays/stable/?ref=master"
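It's worth confirming the driver pods are actually running before creating any volumes (pod names and labels vary by driver version, so I just grep):

# Controller and node pods land in kube-system
kubectl get pods -n kube-system | grep ebs-csi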
Create a StorageClass:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ebs
provisioner: ebs.csi.aws.com
parameters:
  type: gp2
  fsType: ext4
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
The volumeBindingMode: WaitForFirstConsumer is crucial - it ensures the EBS volume is created in the same availability zone as the pod.
Then reference it in the StatefulSet:
volumeClaimTemplates:
- metadata:
    name: data
  spec:
    accessModes: [ "ReadWriteOnce" ]
    storageClassName: fast-ebs
    resources:
      requests:
        storage: 100Gi
Each pod gets its own PersistentVolumeClaim, which creates a dedicated EBS volume.
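PVCs created from volumeClaimTemplates are named <template>-<pod>, so this StatefulSet produces data-postgres-0, data-postgres-1, and data-postgres-2, each bound to its own volume:

# One claim and one EBS-backed PersistentVolume per pod
kubectl get pvc
kubectl get pv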
Networking and Service Discovery
StatefulSets rely on a headless service for stable network identities - you create it yourself and reference it via serviceName:
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  clusterIP: None  # Headless service
  selector:
    app: postgres
  ports:
  - port: 5432
    name: postgres
Now each pod is accessible via DNS:
postgres-0.postgres.default.svc.cluster.local
postgres-1.postgres.default.svc.cluster.local
postgres-2.postgres.default.svc.cluster.local
This is perfect for database replication where you need to address specific instances.
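To confirm the per-pod DNS records resolve, I run a throwaway pod and look one up (a quick check; any image with a working nslookup does the job):

# Each pod gets its own A record under the headless service
kubectl run -it --rm dns-check --image=busybox:1.28 --restart=Never -- \
  nslookup postgres-0.postgres.default.svc.cluster.local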
PostgreSQL Replication Setup
I set up streaming replication with one master and two replicas. The tricky part is initializing replicas from the master.
I used an init container to handle this:
initContainers:
- name: init-postgres
  image: postgres:10.5
  env:
  # pg_basebackup needs to authenticate as the replication user; this assumes
  # it uses the same password stored in postgres-secret
  - name: PGPASSWORD
    valueFrom:
      secretKeyRef:
        name: postgres-secret
        key: password
  command:
  - bash
  - "-c"
  - |
    set -ex
    # If data directory exists, skip initialization
    [[ -d /var/lib/postgresql/data/pgdata ]] && exit 0
    # postgres-0 is the master
    if [[ $HOSTNAME == "postgres-0" ]]; then
      echo "Initializing master"
      exit 0
    fi
    # Replicas: clone from master
    echo "Cloning from master"
    until pg_basebackup -h postgres-0.postgres -D /var/lib/postgresql/data/pgdata -U replication -v -P
    do
      echo "Waiting for master..."
      sleep 5
    done
  volumeMounts:
  - name: data
    mountPath: /var/lib/postgresql/data
The main container then starts with appropriate configuration:
containers:
- name: postgres
  image: postgres:10.5
  env:
  - name: POSTGRES_PASSWORD
    valueFrom:
      secretKeyRef:
        name: postgres-secret
        key: password
  - name: PGDATA
    value: /var/lib/postgresql/data/pgdata
  command:
  - bash
  - "-c"
  - |
    set -ex
    # Master configuration
    if [[ $HOSTNAME == "postgres-0" ]]; then
      echo "Starting as master"
      cat >> /var/lib/postgresql/data/pgdata/postgresql.conf <<EOF
    wal_level = replica
    max_wal_senders = 3
    wal_keep_segments = 8
    EOF
      cat >> /var/lib/postgresql/data/pgdata/pg_hba.conf <<EOF
    host replication replication 0.0.0.0/0 md5
    EOF
    else
      # Replica configuration
      echo "Starting as replica"
      cat > /var/lib/postgresql/data/pgdata/recovery.conf <<EOF
    standby_mode = on
    primary_conninfo = 'host=postgres-0.postgres port=5432 user=replication password=$POSTGRES_PASSWORD'
    trigger_file = '/tmp/promote'
    EOF
    fi
    exec docker-entrypoint.sh postgres
This setup gives us:
- Automatic failover capability (promote a replica by creating /tmp/promote - see the command sketch below)
- Read replicas for scaling reads
- Data redundancy
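Promotion is a one-liner thanks to the trigger_file setting. A sketch of a manual failover (repointing clients and rebuilding the old master are separate steps):

# Touch the trigger file: the replica exits standby mode and starts accepting writes
kubectl exec postgres-1 -- touch /tmp/promote

# Verify it's no longer in recovery
kubectl exec -it postgres-1 -- bash -c \
  'PGPASSWORD=$POSTGRES_PASSWORD psql -U postgres -c "SELECT pg_is_in_recovery();"'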
Redis Cluster
For Redis, I used a different approach - Redis Cluster mode:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis
  replicas: 6
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:5.0-alpine
        command:
        - redis-server
        - --cluster-enabled
        - "yes"
        - --cluster-config-file
        - /data/nodes.conf
        - --cluster-node-timeout
        - "5000"
        - --appendonly
        - "yes"
        ports:
        - containerPort: 6379
          name: client
        - containerPort: 16379
          name: gossip
        volumeMounts:
        - name: data
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: fast-ebs
      resources:
        requests:
          storage: 10Gi
After deploying, initialize the cluster:
kubectl exec -it redis-0 -- redis-cli --cluster create \
redis-0.redis:6379 \
redis-1.redis:6379 \
redis-2.redis:6379 \
redis-3.redis:6379 \
redis-4.redis:6379 \
redis-5.redis:6379 \
--cluster-replicas 1
This creates a 3-master, 3-replica cluster with automatic sharding.
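Before pointing clients at it, I check that all 16384 slots got assigned and the replica pairing looks right:

# Should show cluster_state:ok and cluster_slots_assigned:16384
kubectl exec -it redis-0 -- redis-cli cluster info

# Lists the three masters and which master each replica follows
kubectl exec -it redis-0 -- redis-cli cluster nodes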
Backup Strategy
Running databases in Kubernetes doesn’t mean you can skip backups. I set up automated backups using CronJobs.
PostgreSQL backup:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: postgres-backup
spec:
  schedule: "0 2 * * *"  # 2 AM daily
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: postgres:10.5
            command:
            - bash
            - "-c"
            - |
              BACKUP_FILE="/backup/postgres-$(date +%Y%m%d-%H%M%S).sql.gz"
              pg_dump -h postgres-0.postgres -U postgres | gzip > $BACKUP_FILE
              # Upload to S3 (assumes the AWS CLI is available in the image)
              aws s3 cp $BACKUP_FILE s3://my-backups/postgres/
              # Keep only last 7 days locally
              find /backup -name "postgres-*.sql.gz" -mtime +7 -delete
            env:
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
            volumeMounts:
            - name: backup
              mountPath: /backup
          volumes:
          - name: backup
            persistentVolumeClaim:
              claimName: backup-pvc
          restartPolicy: OnFailure
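Backups are only useful if the restore path works, so I script that too. A minimal sketch, assuming you've pulled a dump from S3 (the filename here is just an example of the pattern the backup job produces):

# Replay a dump against the master
aws s3 cp s3://my-backups/postgres/postgres-20190101-020000.sql.gz .
gunzip -c postgres-20190101-020000.sql.gz | \
  kubectl exec -i postgres-0 -- bash -c 'PGPASSWORD=$POSTGRES_PASSWORD psql -U postgres'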
Monitoring and Alerts
I use Prometheus to monitor the databases. For PostgreSQL, I deployed the postgres_exporter:
- name: exporter
  image: wrouesnel/postgres_exporter:latest
  env:
  - name: DATA_SOURCE_NAME
    value: "postgresql://postgres:password@localhost:5432/postgres?sslmode=disable"
  ports:
  - containerPort: 9187
    name: metrics
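To confirm the exporter is exposing metrics before wiring it into Prometheus, a quick port-forward works (assuming the exporter runs as a sidecar in the postgres pods):

# Forward the metrics port and check that pg_up is reported
kubectl port-forward postgres-0 9187:9187 &
curl -s localhost:9187/metrics | grep ^pg_up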
Key metrics I monitor:
- Connection count
- Replication lag
- Disk usage
- Query performance
Alerting rules:
groups:
- name: postgres
  rules:
  - alert: PostgresDown
    expr: pg_up == 0
    for: 1m
    annotations:
      summary: "PostgreSQL is down"
  - alert: ReplicationLag
    expr: pg_replication_lag > 10
    for: 5m
    annotations:
      summary: "Replication lag is {{ $value }} seconds"
Lessons Learned
After three months in production:
What worked well:
- Stable identities - No more IP address chasing
- Automated provisioning - New environments spin up in minutes
- Resource limits - Better resource utilization than VMs
- Backup automation - CronJobs make this trivial
What was challenging:
- Initial setup complexity - Took 2 weeks to get right
- Storage performance - EBS IOPS limits required tuning
- Disaster recovery - Restoring from backup is slower than VMs
- Debugging - Logs are scattered across pods
What I’d do differently:
- Use an operator - something like Zalando's postgres-operator handles much of this for you
- Test failover thoroughly - We had issues during first real failover
- Monitor storage more closely - We hit IOPS limits unexpectedly
- Document runbooks - Recovery procedures are different from VMs
Should You Run Databases on Kubernetes?
Honestly, it depends:
Yes, if:
- You need rapid provisioning of database instances
- You want consistent deployment across environments
- Your team is comfortable with Kubernetes
- You have good monitoring and backup strategies
No, if:
- You need maximum performance (bare metal is still faster)
- You have a small team without Kubernetes expertise
- You’re running massive databases (multi-TB)
- You can’t afford any downtime during learning curve
For us, it’s been worth it. The operational benefits outweigh the complexity. But we spent significant time getting it right, and we still keep critical production data on managed RDS as a safety net.
Conclusion
StatefulSets make running databases on Kubernetes viable, but not trivial. You need to understand storage, networking, and database replication deeply.
If you’re considering this, start small. Run a non-critical database first, test failover scenarios thoroughly, and have a rollback plan. Don’t migrate your production database on a Friday afternoon (I learned this the hard way).
The future is probably Kubernetes operators that handle all this complexity for you. But understanding StatefulSets is still valuable - it’s the foundation everything else builds on.