We lost all our Redis data when the server crashed. Session data, cache, everything gone. Users were logged out, and pages were slow while the cache rebuilt.

I learned about Redis persistence the hard way. Now we use a hybrid AOF + RDB approach. Last crash? Zero data loss.

The Crash

Friday, 4 PM: Server power failure
Friday, 4:15 PM: Server back online
Friday, 4:16 PM: Redis starts with empty dataset
Friday, 4:17 PM: All users logged out, cache cold

We were running Redis with persistence disabled. All data lived in memory, nothing on disk.

Redis Persistence Options

Redis offers two persistence methods:

  1. RDB (Redis Database) - Point-in-time snapshots
  2. AOF (Append Only File) - Log of every write operation

RDB Snapshots

RDB saves the dataset to disk at configured intervals.

Default config (redis.conf):

save 900 1      # Save after 900 seconds if at least 1 key changed
save 300 10     # Save after 300 seconds if at least 10 keys changed
save 60 10000   # Save after 60 seconds if at least 10000 keys changed
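
To make the rules concrete, here is a minimal sketch of how the three save lines combine: a snapshot is due as soon as any one rule matches. The snapshot_due helper is hypothetical (it is not part of Redis), and it simplifies the real check down to the wording of the comments above.

```shell
# snapshot_due ELAPSED_SECONDS CHANGED_KEYS
# Hypothetical helper: prints "yes" if any of the three save rules
# from the config above would fire, otherwise "no".
snapshot_due() {
    elapsed=$1
    changes=$2
    # save 900 1
    if [ "$elapsed" -ge 900 ] && [ "$changes" -ge 1 ]; then echo yes; return 0; fi
    # save 300 10
    if [ "$elapsed" -ge 300 ] && [ "$changes" -ge 10 ]; then echo yes; return 0; fi
    # save 60 10000
    if [ "$elapsed" -ge 60 ] && [ "$changes" -ge 10000 ]; then echo yes; return 0; fi
    echo no
}
```

For example, snapshot_due 120 50 prints no: two minutes in, 50 changed keys still exist only in memory. That gap is exactly where a crash loses data with RDB alone.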

Manual snapshot:

redis-cli BGSAVE

This creates a dump.rdb file.

RDB Pros

  • Compact - Single file, easy to backup
  • Fast recovery - Loading RDB is faster than replaying AOF
  • Good for backups - Copy dump.rdb to backup location
  • Minimal performance impact - Redis forks a child to write the snapshot while the parent keeps serving clients

RDB Cons

  • Data loss - Can lose data between snapshots
  • Fork can be slow - On large datasets, fork takes time
  • Not real-time - Minutes of data loss possible

AOF (Append Only File)

AOF logs every write operation.

Enable in redis.conf:

appendonly yes
appendfilename "appendonly.aof"

AOF Sync Policies

Three options:

1. appendfsync always - Sync after every write

appendfsync always
  • Safest (no data loss)
  • Slowest (disk I/O for every write)

2. appendfsync everysec - Sync every second (default)

appendfsync everysec
  • Good balance
  • Can lose up to about 1 second of data
  • Recommended for most cases

3. appendfsync no - Let OS decide when to sync

appendfsync no
  • Fastest
  • Can lose more data (on Linux the kernel typically flushes about every 30 seconds)
  • Not recommended

AOF Rewrite

AOF file grows over time. Redis can rewrite it:

redis-cli BGREWRITEAOF

Or automatic:

auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb

Redis triggers a rewrite when the AOF has grown by 100% (doubled) relative to its size after the last rewrite and is at least 64 MB.
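
The two settings work together, which is easy to misread. The sketch below roughly mirrors the trigger as I understand it: the file must be larger than the minimum size, and its growth over the base size (the size after the last rewrite) must reach the configured percentage. The aof_rewrite_due helper and its argument order are assumptions made for illustration, not a Redis command.

```shell
# aof_rewrite_due BASE_BYTES CURRENT_BYTES [PERCENTAGE] [MIN_BYTES]
# Hypothetical helper: prints "yes" if an automatic rewrite would be due.
# Defaults match the config above: 100 percent and 64mb (64 * 1024 * 1024 bytes).
aof_rewrite_due() {
    base=$1
    current=$2
    pct=${3:-100}
    min=${4:-67108864}
    # Guard against division by zero when no rewrite has happened yet.
    [ "$base" -gt 0 ] || base=1
    # Growth of the file relative to the base size, in percent.
    growth=$(( current * 100 / base - 100 ))
    if [ "$current" -gt "$min" ] && [ "$growth" -ge "$pct" ]; then
        echo yes
    else
        echo no
    fi
}
```

So aof_rewrite_due 50000000 100000000 prints yes (doubled and above 64 MB), while aof_rewrite_due 512 1024 prints no: the file doubled, but it is far below the minimum size.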

AOF Pros

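  • Durable - With appendfsync everysec, can lose at most about 1 second of data
  • Append-only - A crash can at worst leave a truncated last entry, which redis-check-aof can repair
  • Readable - AOF is a log of commands in the Redis protocol format, so it can be inspected and even edited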
  • Automatic rewrite - Keeps file size manageable

AOF Cons

  • Larger files - AOF bigger than RDB
  • Slower recovery - Replaying AOF takes longer
  • Slightly slower - More disk I/O than RDB

RDB + AOF Hybrid

Best of both worlds:

# Enable both
save 900 1
save 300 10
save 60 10000

appendonly yes
appendfsync everysec

On restart:

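  1. If AOF is enabled, Redis loads the AOF (the more complete record)
  2. If AOF is disabled, Redis loads the RDB snapshot

Do not count on an automatic fallback to RDB when AOF is enabled but the file is missing. To turn on AOF for an instance that already holds data, run redis-cli CONFIG SET appendonly yes on the live server so Redis builds the AOF from the current dataset, then update redis.conf.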

Our Configuration

Production redis.conf:

# RDB snapshots
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir /var/lib/redis

# AOF
appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes

# Memory
maxmemory 2gb
maxmemory-policy allkeys-lru

Testing Persistence

Simulate crash:

# Write some data
redis-cli SET test "hello"
redis-cli SET user:1 "john"

# Kill Redis (simulate crash)
kill -9 $(pgrep redis-server)

# Start Redis
redis-server /etc/redis/redis.conf

# Check data
redis-cli GET test
# Output: "hello"

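Data survived! Note that kill -9 simulates a process crash, not a power failure: writes already handed to the OS still reach disk, so this test cannot show the up-to-1-second loss window of appendfsync everysec.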

Backup Strategy

Daily backups:

#!/bin/bash
# backup-redis.sh
set -euo pipefail

DATE=$(date +%Y%m%d)
BACKUP_DIR=/backups/redis
mkdir -p "$BACKUP_DIR"

# Record the last successful save time before triggering a new snapshot
LAST_SAVE=$(redis-cli LASTSAVE)

# Trigger RDB snapshot
redis-cli BGSAVE

# Wait for snapshot to complete (LASTSAVE changes once BGSAVE succeeds)
while [ "$(redis-cli LASTSAVE)" -eq "$LAST_SAVE" ]; do
    sleep 1
done

# Copy RDB file
cp /var/lib/redis/dump.rdb "$BACKUP_DIR/dump-$DATE.rdb"

# Copy AOF file
cp /var/lib/redis/appendonly.aof "$BACKUP_DIR/appendonly-$DATE.aof"

# Keep last 7 days
find "$BACKUP_DIR" -name "dump-*.rdb" -mtime +7 -delete
find "$BACKUP_DIR" -name "appendonly-*.aof" -mtime +7 -delete

Cron job:

0 2 * * * /usr/local/bin/backup-redis.sh
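
A backup is only useful if the copied file is a real snapshot. As a cheap sanity check, RDB files begin with the magic string REDIS followed by a version number, so a script can at least confirm that much. The is_rdb_file helper below is hypothetical and only a sketch: it catches empty or obviously wrong files, not subtle corruption.

```shell
# is_rdb_file PATH
# Hypothetical helper: succeeds if PATH is non-empty and starts with
# the "REDIS" magic string that opens every RDB file.
is_rdb_file() {
    # -s is true only for an existing file with size greater than zero.
    [ -s "$1" ] || return 1
    # Compare the first five bytes against the magic string.
    [ "$(head -c 5 "$1")" = "REDIS" ]
}
```

In the backup script this could run right after the copy, for example is_rdb_file "$BACKUP_DIR/dump-$DATE.rdb" || echo "backup looks invalid". For a deeper check, Redis also ships a redis-check-rdb tool.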

Monitoring Persistence

Check last save time:

redis-cli LASTSAVE

Check if save is in progress:

redis-cli INFO persistence

Output:

# Persistence
loading:0
rdb_changes_since_last_save:42
rdb_bgsave_in_progress:0
rdb_last_save_time:1481385600
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:1
aof_enabled:1
aof_rewrite_in_progress:0
aof_last_rewrite_time_sec:-1
aof_current_size:1024
aof_base_size:512
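
Reading that output by eye does not scale, so here is a minimal sketch of a check that could run from cron or a monitoring agent. Both function names are hypothetical. The fields used (rdb_last_bgsave_status and aof_enabled) come from the output above, and the tr call is there because redis-cli INFO ends lines with a carriage return.

```shell
# persistence_field NAME
# Hypothetical helper: reads INFO persistence output on stdin
# and prints the value of the field called NAME.
persistence_field() {
    tr -d '\r' | awk -F: -v key="$1" '$1 == key { print $2 }'
}

# check_persistence
# Hypothetical helper: reads INFO persistence output on stdin, prints OK
# if the last BGSAVE succeeded and AOF is enabled, otherwise prints a
# warning and returns a non-zero status.
check_persistence() {
    info=$(cat)
    status=$(printf '%s\n' "$info" | persistence_field rdb_last_bgsave_status)
    aof=$(printf '%s\n' "$info" | persistence_field aof_enabled)
    if [ "$status" = "ok" ] && [ "$aof" = "1" ]; then
        echo "OK"
    else
        echo "WARN: bgsave_status=${status:-missing} aof_enabled=${aof:-missing}"
        return 1
    fi
}
```

Usage would look like redis-cli INFO persistence | check_persistence, with the exit status deciding whether to alert.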

Recovery from Backup

If Redis won’t start:

# Stop Redis
systemctl stop redis

# Restore from backup
cp /backups/redis/dump-20161210.rdb /var/lib/redis/dump.rdb
cp /backups/redis/appendonly-20161210.aof /var/lib/redis/appendonly.aof

# Fix permissions
chown redis:redis /var/lib/redis/*

# Start Redis
systemctl start redis

AOF Corruption

If AOF is corrupted:

# Check AOF
redis-check-aof appendonly.aof

# Fix AOF (removes corrupted part)
redis-check-aof --fix appendonly.aof

Performance Impact

Measured on our server (2GB dataset):

Config            Write Ops/sec   Save Time
No persistence    85,000          N/A
RDB only          82,000          2.3s
AOF (everysec)    78,000          N/A
RDB + AOF         76,000          2.5s

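AOF (everysec) reduces write throughput by about 8% compared to no persistence (78,000 vs 85,000 ops/sec), and RDB + AOF by about 11%, but it is worth it for durability.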
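
The percentages follow directly from the measurements: the drop is the difference from the no-persistence baseline divided by that baseline. A small sketch of the arithmetic, using a hypothetical throughput_drop helper:

```shell
# throughput_drop BASELINE_OPS MEASURED_OPS
# Hypothetical helper: prints the throughput reduction relative to the
# baseline, as a percentage with one decimal place.
throughput_drop() {
    awk -v base="$1" -v measured="$2" \
        'BEGIN { printf "%.1f\n", (base - measured) * 100 / base }'
}
```

With our numbers, throughput_drop 85000 82000 gives 3.5 for RDB only, throughput_drop 85000 78000 gives 8.2 for AOF, and throughput_drop 85000 76000 gives 10.6 for RDB + AOF.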

When to Use What

RDB only:

  • Cache that can be rebuilt
  • Data loss acceptable
  • Fast recovery needed

AOF only:

  • Critical data
  • Can’t afford data loss
  • Recovery time not critical

RDB + AOF (recommended):

  • Production systems
  • Best balance of safety and performance
  • Our choice

Lessons Learned

  1. Always enable persistence - Unless data is truly disposable
  2. Use AOF for critical data - Can’t afford to lose sessions
  3. Test recovery - Simulate crashes, verify data survives
  4. Monitor persistence - Check LASTSAVE, INFO persistence
  5. Backup regularly - Copy RDB/AOF files offsite

Results

Before:

  • No persistence
  • Lost all data on crash
  • Users logged out
  • Cache rebuild took 30 minutes

After:

  • RDB + AOF enabled
  • Zero data loss on crash
  • Users stay logged in
  • Instant recovery

Conclusion

Redis persistence is essential for production. Don’t learn this lesson the hard way like we did.

Key takeaways:

  1. Enable persistence (RDB + AOF)
  2. Use appendfsync everysec for balance
  3. Backup RDB and AOF files
  4. Test recovery procedures
  5. Monitor persistence status

Redis is fast, but speed means nothing if you lose your data. Configure persistence properly.