Our single Redis instance was the bottleneck: 100K requests/sec, 90% CPU, and response times degrading. We needed to scale.

I set up Redis Cluster with six nodes. Now we handle 1M+ requests/sec with automatic failover and zero downtime when a node dies.

The Problem

Single Redis instance:

  • 100K requests/sec (maxed out)
  • 90% CPU usage
  • 16GB memory limit
  • Single point of failure
  • No horizontal scaling

We hit the wall.

Redis Cluster Overview

Features:

  • Sharding: Data split across nodes
  • Replication: Each master has replicas
  • Automatic failover: Replica promotes to master
  • No single point of failure

Minimum for high availability: 6 nodes (3 masters + 3 replicas)

Installing Redis

Redis 5.0.5 (the redis-cli --cluster commands below need Redis 5+; older releases shipped redis-trib.rb instead):

wget http://download.redis.io/releases/redis-5.0.5.tar.gz
tar xzf redis-5.0.5.tar.gz
cd redis-5.0.5
make
sudo make install

Cluster Configuration

Create 6 config files:

redis-7000.conf:

port 7000
cluster-enabled yes
cluster-config-file nodes-7000.conf
cluster-node-timeout 5000
appendonly yes
dir /var/lib/redis/7000

Repeat for ports 7001-7005.
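Rather than hand-editing six files, you can generate them. A minimal sketch in Python (assumes the /etc/redis and /var/lib/redis paths used above, and enough privileges to write there):

import os

# Template matching the config shown above; only the port varies
TEMPLATE = """port {port}
cluster-enabled yes
cluster-config-file nodes-{port}.conf
cluster-node-timeout 5000
appendonly yes
dir /var/lib/redis/{port}
"""

for port in range(7000, 7006):
    os.makedirs(f"/var/lib/redis/{port}", exist_ok=True)
    with open(f"/etc/redis/redis-{port}.conf", "w") as f:
        f.write(TEMPLATE.format(port=port))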

Starting Nodes

redis-server /etc/redis/redis-7000.conf &
redis-server /etc/redis/redis-7001.conf &
redis-server /etc/redis/redis-7002.conf &
redis-server /etc/redis/redis-7003.conf &
redis-server /etc/redis/redis-7004.conf &
redis-server /etc/redis/redis-7005.conf &
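Before wiring the nodes into a cluster, confirm they all respond. A quick sketch using the redis-py package (an assumption; any client works):

import redis

# PING each node; a connection error here means that node never started
for port in range(7000, 7006):
    assert redis.Redis(host="127.0.0.1", port=port).ping(), f"node {port} is down"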

Creating Cluster

redis-cli --cluster create \
  127.0.0.1:7000 \
  127.0.0.1:7001 \
  127.0.0.1:7002 \
  127.0.0.1:7003 \
  127.0.0.1:7004 \
  127.0.0.1:7005 \
  --cluster-replicas 1

Output:

>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 127.0.0.1:7004 to 127.0.0.1:7000
Adding replica 127.0.0.1:7005 to 127.0.0.1:7001
Adding replica 127.0.0.1:7003 to 127.0.0.1:7002

3 masters, 3 replicas!

Hash Slots

16384 slots total, divided among masters:

  • Master 1: slots 0-5460
  • Master 2: slots 5461-10922
  • Master 3: slots 10923-16383

Key hashing:

HASH_SLOT = CRC16(key) mod 16384
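You can reproduce this mapping in a few lines. A sketch of the CRC16 variant Redis uses (CRC-16/XMODEM: polynomial 0x1021, initial value 0, no reflection):

def crc16(data: bytes) -> int:
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    return crc16(key.encode()) % 16384

print(hash_slot("user:1000"))  # compare with the slot in the redirect below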

Connecting to Cluster

redis-cli -c -p 7000

-c enables cluster mode: the client follows MOVED redirects automatically.

127.0.0.1:7000> SET user:1000 "John"
-> Redirected to slot [11143] located at 127.0.0.1:7002
OK

127.0.0.1:7002> GET user:1000
"John"

Python Client

from rediscluster import RedisCluster  # pip install redis-py-cluster

startup_nodes = [
    {"host": "127.0.0.1", "port": "7000"},
    {"host": "127.0.0.1", "port": "7001"},
    {"host": "127.0.0.1", "port": "7002"},
]

rc = RedisCluster(startup_nodes=startup_nodes, decode_responses=True)

rc.set("user:1000", "John")
print(rc.get("user:1000"))  # John

Client handles redirects automatically!

Hash Tags

Force keys to same slot:

# Different slots
rc.set("user:1000", "John")
rc.set("orders:1000", "Order1")

# Same slot (using hash tag)
rc.set("user:{1000}", "John")
rc.set("orders:{1000}", "Order1")

{1000} is the hash tag: only the substring inside the braces is hashed, so both keys map to the same slot.

This enables multi-key operations, which would otherwise fail with a CROSSSLOT error:

rc.mget("user:{1000}", "orders:{1000}")
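The slot computation respects this rule: when a key contains a non-empty {...} section, only the part inside the first pair of braces is hashed. Extending the hash_slot sketch from the Hash Slots section:

def hash_slot_tagged(key: str) -> int:
    # Hash only the substring inside the first non-empty {...}, if any
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384  # crc16 defined in the earlier sketch

# Both keys hash the string "1000", so they land on the same slot
assert hash_slot_tagged("user:{1000}") == hash_slot_tagged("orders:{1000}")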

Replication

Each master has a replica:

Master 7000 -> Replica 7004
Master 7001 -> Replica 7005
Master 7002 -> Replica 7003

Replicas sync from masters automatically.
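By default a replica refuses reads for its master's slots and answers with a MOVED redirect. The READONLY command opts a connection into replica reads; a sketch with plain redis-py (7003 replicates master 7002, which owns the slot for user:1000):

import redis

# Connect straight to a replica and enable reads on this connection
replica = redis.Redis(host="127.0.0.1", port=7003, decode_responses=True)
replica.execute_command("READONLY")
print(replica.get("user:1000"))  # served by the replica, possibly slightly stale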

Automatic Failover

Simulate master failure:

redis-cli -p 7000 DEBUG SEGFAULT

Cluster detects failure and promotes replica:

redis-cli -p 7001 CLUSTER NODES

Output:

7004... master - 0 1554710400000 7 connected 0-5460
7000... master,fail - 1554710395000 1 disconnected

Replica 7004 promoted to master!
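One way to watch a failover from the application side: keep writing while you kill the master. A few calls may fail during the promotion window (roughly cluster-node-timeout plus election time), then traffic resumes. A sketch using the Python client from earlier:

import time
from rediscluster import RedisCluster

rc = RedisCluster(startup_nodes=[{"host": "127.0.0.1", "port": "7001"}],
                  decode_responses=True)

# Kill a master in another terminal; expect a brief burst of errors,
# then successful writes against the promoted replica
for i in range(100_000):
    try:
        rc.set(f"probe:{i}", "x")
    except Exception as exc:
        print(f"write {i} failed: {exc}")
        time.sleep(0.5)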

Adding Nodes

Add new master:

redis-server /etc/redis/redis-7006.conf &

redis-cli --cluster add-node 127.0.0.1:7006 127.0.0.1:7000

Rebalance slots onto the new master (by default rebalance skips empty masters, so ask for them explicitly):

redis-cli --cluster rebalance 127.0.0.1:7000 \
  --cluster-use-empty-masters

Add replica:

redis-server /etc/redis/redis-7007.conf &

redis-cli --cluster add-node 127.0.0.1:7007 127.0.0.1:7000 \
  --cluster-slave \
  --cluster-master-id <master-node-id>

Removing Nodes

Remove replica:

redis-cli --cluster del-node 127.0.0.1:7000 <node-id>

Remove master (resharding required):

# Reshard slots to other masters
redis-cli --cluster reshard 127.0.0.1:7000

# Then remove
redis-cli --cluster del-node 127.0.0.1:7000 <node-id>

Monitoring

Cluster info:

redis-cli -p 7000 CLUSTER INFO

Output:

cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3

Node info:

redis-cli -p 7000 CLUSTER NODES
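For alerting rather than eyeballing, a small script can parse CLUSTER INFO and exit non-zero when the cluster is degraded. A sketch using plain redis-py (field names as in the output above):

import redis

node = redis.Redis(host="127.0.0.1", port=7000, decode_responses=True)
raw = node.execute_command("CLUSTER INFO")
# Older clients return the raw text; newer redis-py versions pre-parse it
info = raw if isinstance(raw, dict) else dict(
    line.split(":", 1) for line in raw.splitlines() if ":" in line
)

# cluster_state flips to "fail" when a slot range has no reachable master
if info["cluster_state"] != "ok":
    raise SystemExit(f"cluster degraded: state={info['cluster_state']}")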

Production Setup

6 servers (3 masters + 3 replicas):

Server 1 (Master):

# /etc/redis/redis.conf
port 6379
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
maxmemory 8gb
maxmemory-policy allkeys-lru
# Listening on all interfaces with protected-mode off: lock this down
# at the firewall so only app servers and cluster peers can connect
bind 0.0.0.0
protected-mode no
requirepass your_password
# Required so replicas can authenticate with their master
masterauth your_password

Servers 2-6:

# Same config on every server; redis-cli assigns master and replica
# roles when the cluster is created

Create cluster:

redis-cli --cluster create \
  server1:6379 \
  server2:6379 \
  server3:6379 \
  server4:6379 \
  server5:6379 \
  server6:6379 \
  --cluster-replicas 1 \
  -a your_password

Client Configuration

from rediscluster import RedisCluster

startup_nodes = [
    {"host": "server1", "port": "6379"},
    {"host": "server2", "port": "6379"},
    {"host": "server3", "port": "6379"},
]

rc = RedisCluster(
    startup_nodes=startup_nodes,
    decode_responses=True,
    password="your_password",
    skip_full_coverage_check=True,
    max_connections=50,
    max_connections_per_node=True,  # the 50-connection cap applies per node
)

Monitoring with Prometheus

Redis exporter (run one per node; the scrape config below expects an exporter on each of server1-3):

docker run -d \
  --name redis-exporter \
  -p 9121:9121 \
  -e REDIS_PASSWORD=your_password \
  oliver006/redis_exporter \
  --redis.addr=redis://server1:6379

Prometheus config:

scrape_configs:
  - job_name: 'redis'
    static_configs:
      - targets:
        - server1:9121
        - server2:9121
        - server3:9121

Backup Strategy

RDB snapshots (save <seconds> <changes>: snapshot when at least that many keys changed within the window):

save 900 1
save 300 10
save 60 10000

AOF persistence (everysec fsyncs once per second, so a crash loses at most about one second of writes):

appendonly yes
appendfsync everysec

Backup script:

#!/bin/bash
# Snapshot each master, wait for the save to finish, then copy it off.
# The port list covers the current masters; revisit after failovers.
for port in 7000 7001 7002; do
  redis-cli -p "$port" BGSAVE
  while redis-cli -p "$port" INFO persistence | grep -q 'rdb_bgsave_in_progress:1'; do
    sleep 1
  done
  cp "/var/lib/redis/$port/dump.rdb" "/backup/redis-$port-$(date +%Y%m%d).rdb"
done

Performance Tuning

Kernel settings (apply with sysctl -p):

# /etc/sysctl.conf
vm.overcommit_memory = 1
net.core.somaxconn = 65535

Redis config:

tcp-backlog 511
timeout 0
tcp-keepalive 300
maxclients 10000

Results

Before (single instance):

  • 100K requests/sec
  • 90% CPU
  • 16GB memory limit
  • Single point of failure

After (cluster):

  • 1M+ requests/sec (10x)
  • 30% CPU per node
  • 48GB total memory (3x16GB)
  • Automatic failover

Lessons Learned

  1. Plan capacity - 3 masters minimum
  2. Use hash tags - For multi-key operations
  3. Monitor closely - Watch for slot migrations
  4. Test failover - Before production
  5. Backup regularly - RDB + AOF

Conclusion

Redis Cluster provides horizontal scaling and high availability, both essential for high-traffic applications.

Key takeaways:

  1. Sharding across multiple masters
  2. Replication for high availability
  3. Automatic failover
  4. Hash tags for multi-key ops
  5. Monitor and backup

Scale Redis properly. Your application will thank you.