We lost database data when a pod restarted. Turns out we were using emptyDir volumes. Data gone, customers angry, lesson learned.

I implemented proper persistent volumes. Now our databases survive pod restarts, node failures, and cluster upgrades. Zero data loss in 6 months.

The Problem

Our first attempt at running PostgreSQL in Kubernetes:

apiVersion: v1
kind: Pod
metadata:
  name: postgres
spec:
  containers:
  - name: postgres
    image: postgres:10
    volumeMounts:
    - name: data
      mountPath: /var/lib/postgresql/data
  volumes:
  - name: data
    emptyDir: {}  # BAD! Data lost on pod restart

Pod restarted → data gone → disaster.

Persistent Volumes (PV)

Cluster-level storage resource:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual
  hostPath:
    path: /mnt/data/postgres

Access modes:

  • ReadWriteOnce (RWO): Single node read-write
  • ReadOnlyMany (ROX): Multiple nodes read-only
  • ReadWriteMany (RWX): Multiple nodes read-write

Persistent Volume Claims (PVC)

Request for storage:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: manual

Kubernetes binds PVC to matching PV.
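Binding is normally automatic, matched on capacity, access modes, and StorageClass. If you need to pin a claim to one specific PV, the claim can name it explicitly. A sketch, reusing the `postgres-pv` defined above:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
  volumeName: postgres-pv  # bind to this exact PV, skipping the matching step
  resources:
    requests:
      storage: 10Gi
```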

Using PVC in Pod

apiVersion: v1
kind: Pod
metadata:
  name: postgres
spec:
  containers:
  - name: postgres
    image: postgres:10
    env:
    - name: POSTGRES_PASSWORD
      value: password  # demo only - use a Secret in production
    volumeMounts:
    - name: postgres-storage
      mountPath: /var/lib/postgresql/data
  volumes:
  - name: postgres-storage
    persistentVolumeClaim:
      claimName: postgres-pvc

Data persists across pod restarts!

StorageClass

Dynamic provisioning:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  fsType: ext4
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

PVC with StorageClass:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 20Gi

PV created automatically!

StatefulSet with PVC

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:10
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: postgres-storage
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: postgres-storage
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 20Gi

Each pod gets its own PVC!
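The StatefulSet's serviceName points at a headless Service, which must exist for pods to get stable DNS names (postgres-0.postgres, postgres-1.postgres, ...). A minimal sketch:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  clusterIP: None  # headless - gives each pod a stable DNS name
  selector:
    app: postgres
  ports:
  - port: 5432
```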

NFS Storage

Shared storage across nodes:

NFS Server Setup:

# On the NFS server
sudo apt-get install nfs-kernel-server
sudo mkdir -p /mnt/nfs_share
sudo chown nobody:nogroup /mnt/nfs_share
echo "/mnt/nfs_share *(rw,sync,no_subtree_check,no_root_squash)" | sudo tee -a /etc/exports
sudo exportfs -a
sudo systemctl restart nfs-kernel-server

# On every Kubernetes node (the NFS clients) - needed to mount NFS volumes
sudo apt-get install nfs-common

PV with NFS:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: 192.168.1.100
    path: /mnt/nfs_share
  persistentVolumeReclaimPolicy: Retain

Multiple pods can mount RWX!
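Pods still consume the NFS PV through a claim. A sketch of a matching RWX claim; the empty storageClassName keeps a default provisioner from intercepting it, so it binds to the statically provisioned PV above:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""  # match the static PV, not a StorageClass
  resources:
    requests:
      storage: 100Gi
```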

Reclaim Policies

Retain: Keep the PV and its data after the PVC is deleted (you must clean up and re-bind manually)

persistentVolumeReclaimPolicy: Retain

Delete: Delete the PV and the underlying storage along with the claim

persistentVolumeReclaimPolicy: Delete

Recycle: Basic scrub (rm -rf) before reuse - deprecated in favor of dynamic provisioning

Expanding Volumes

Enable expansion:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable
provisioner: kubernetes.io/aws-ebs
allowVolumeExpansion: true

Expand the PVC (only the storage request can grow, and the StorageClass must allow expansion):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: expandable
  resources:
    requests:
      storage: 50Gi  # Increased from 20Gi

Or patch it in place:

kubectl patch pvc postgres-pvc -p '{"spec":{"resources":{"requests":{"storage":"50Gi"}}}}'

Backup and Restore

Backup PV data:

# Create a backup pod - note that plain `kubectl run` does not attach volumes;
# mount the source PVC and a backup destination via a pod spec or --overrides
kubectl run backup --rm -i --tty --image=ubuntu -- bash

# Inside the pod (with /mnt/data and /mnt/backup mounted)
apt-get update && apt-get install -y rsync
rsync -av /mnt/data/ /mnt/backup/

Or use Velero:

velero backup create postgres-backup --include-namespaces=default
velero restore create --from-backup postgres-backup

Monitoring Storage

Check PV/PVC status:

kubectl get pv
kubectl get pvc
kubectl describe pv postgres-pv
kubectl describe pvc postgres-pvc

Monitor usage:

kubectl exec -it postgres-0 -- df -h /var/lib/postgresql/data

Prometheus metrics:

kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes
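That ratio works well as an alert. A sketch of a PrometheusRule, assuming the Prometheus Operator is installed (the name and 85% threshold are illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pvc-usage
spec:
  groups:
  - name: storage
    rules:
    - alert: PVCAlmostFull
      # fires when a volume has been over 85% full for 10 minutes
      expr: kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes > 0.85
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "PVC {{ $labels.persistentvolumeclaim }} is over 85% full"
```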

Production PostgreSQL Example

Complete setup:

apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-config
data:
  POSTGRES_DB: production
  POSTGRES_USER: admin
---
apiVersion: v1
kind: Secret
metadata:
  name: postgres-secret
type: Opaque
data:
  POSTGRES_PASSWORD: cGFzc3dvcmQ=  # base64 encoded
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: postgres-storage
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  fsType: ext4
allowVolumeExpansion: true
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:10.4
        ports:
        - containerPort: 5432
          name: postgres
        envFrom:
        - configMapRef:
            name: postgres-config
        - secretRef:
            name: postgres-secret
        volumeMounts:
        - name: postgres-storage
          mountPath: /var/lib/postgresql/data
          subPath: postgres
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
  volumeClaimTemplates:
  - metadata:
      name: postgres-storage
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: postgres-storage
      resources:
        requests:
          storage: 20Gi
---
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  selector:
    app: postgres
  ports:
  - port: 5432
    targetPort: 5432
  clusterIP: None  # Headless service
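Secret values are base64 encoded, not encrypted. To produce the value used above (the -n matters: a trailing newline changes the encoding):

```shell
# Encode a password for a Kubernetes Secret
echo -n 'password' | base64
# Decode to verify
echo 'cGFzc3dvcmQ=' | base64 --decode
```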

Common Issues

1. PVC stuck in Pending:

kubectl describe pvc postgres-pvc
# Check: No PV matches, or StorageClass missing

2. Pod can’t mount volume:

kubectl describe pod postgres-0
# Check: Volume already mounted on another node (RWO)

3. Out of disk space:

kubectl exec postgres-0 -- df -h
# Expand PVC or clean up data

Results

Before:

  • emptyDir volumes
  • Data lost on pod restart
  • Manual backups
  • No persistence

After:

  • Persistent volumes
  • Data survives restarts
  • Automated backups
  • 100% data retention

Lessons Learned

  1. Never use emptyDir for data - Always use PV/PVC
  2. Use StorageClass - Dynamic provisioning is easier
  3. Set reclaim policy carefully - Retain for production
  4. Monitor disk usage - Set up alerts
  5. Test backups - Regularly restore from backup

Conclusion

Persistent volumes are essential for stateful applications in Kubernetes. Don’t learn this lesson the hard way.

Key takeaways:

  1. Use PV/PVC for persistent data
  2. StorageClass for dynamic provisioning
  3. StatefulSet for stateful apps
  4. Monitor storage usage
  5. Test backup and restore

Your data is precious. Protect it with proper persistent volumes.