Redis Cluster: High Availability and Horizontal Scaling
Our single Redis instance was the bottleneck. 100K requests/sec, 90% CPU, response time degrading. We needed to scale.
I set up Redis Cluster with 6 nodes. Now we handle 1M+ requests/sec with automatic failover and zero downtime during node failures.
The Problem
Single Redis instance:
- 100K requests/sec (maxed out)
- 90% CPU usage
- 16GB memory limit
- Single point of failure
- No horizontal scaling
We hit the wall.
Redis Cluster Overview
Features:
- Sharding: Data split across nodes
- Replication: Each master has replicas
- Automatic failover: Replica promotes to master
- No single point of failure
Minimum: 6 nodes (3 masters + 3 replicas)
Installing Redis
Redis 5.0.5:
wget http://download.redis.io/releases/redis-5.0.5.tar.gz
tar xzf redis-5.0.5.tar.gz
cd redis-5.0.5
make
sudo make install
Cluster Configuration
Create 6 config files:
redis-7000.conf:
port 7000
cluster-enabled yes
cluster-config-file nodes-7000.conf
cluster-node-timeout 5000
appendonly yes
dir /var/lib/redis/7000
Repeat for ports 7001-7005.
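To avoid hand-editing six near-identical files, you can generate them. A minimal Python sketch (assumes write access to /etc/redis and /var/lib/redis; the values match the config above, and it also creates the data directories that the dir setting points at):
# generate_configs.py: render one config per node and create its data dir
from pathlib import Path

TEMPLATE = """port {port}
cluster-enabled yes
cluster-config-file nodes-{port}.conf
cluster-node-timeout 5000
appendonly yes
dir /var/lib/redis/{port}
"""

for port in range(7000, 7006):  # ports 7000-7005
    Path(f"/var/lib/redis/{port}").mkdir(parents=True, exist_ok=True)
    Path(f"/etc/redis/redis-{port}.conf").write_text(TEMPLATE.format(port=port))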
Starting Nodes
redis-server /etc/redis/redis-7000.conf &
redis-server /etc/redis/redis-7001.conf &
redis-server /etc/redis/redis-7002.conf &
redis-server /etc/redis/redis-7003.conf &
redis-server /etc/redis/redis-7004.conf &
redis-server /etc/redis/redis-7005.conf &
Creating the Cluster
redis-cli --cluster create \
127.0.0.1:7000 \
127.0.0.1:7001 \
127.0.0.1:7002 \
127.0.0.1:7003 \
127.0.0.1:7004 \
127.0.0.1:7005 \
--cluster-replicas 1
The --cluster-replicas 1 flag pairs each master with one replica. Output:
>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 127.0.0.1:7004 to 127.0.0.1:7000
Adding replica 127.0.0.1:7005 to 127.0.0.1:7001
Adding replica 127.0.0.1:7003 to 127.0.0.1:7002
3 masters, 3 replicas!
Hash Slots
16384 slots total, divided among masters:
- Master[0]: slots 0-5460
- Master[1]: slots 5461-10922
- Master[2]: slots 10923-16383
Key hashing:
HASH_SLOT = CRC16(key) mod 16384
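You can reproduce the slot calculation client-side. A minimal sketch (Redis uses the CRC16-CCITT/XModem variant; the hash-tag exception is covered in the Hash Tags section below):
def crc16(data: bytes) -> int:
    # CRC16-CCITT (XModem): polynomial 0x1021, initial value 0
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else crc << 1
            crc &= 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    # If the key has a non-empty {...} section, only that part is hashed
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

print(hash_slot("user:1000"))  # 11143, the same slot as the redirect below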
Connecting to the Cluster
redis-cli -c -p 7000
-c enables cluster mode, so the client follows MOVED redirects automatically.
127.0.0.1:7000> SET user:1000 "John"
-> Redirected to slot [11143] located at 127.0.0.1:7002
OK
127.0.0.1:7002> GET user:1000
"John"
Python Client
# pip install redis-py-cluster
from rediscluster import RedisCluster
startup_nodes = [
{"host": "127.0.0.1", "port": "7000"},
{"host": "127.0.0.1", "port": "7001"},
{"host": "127.0.0.1", "port": "7002"},
]
rc = RedisCluster(startup_nodes=startup_nodes, decode_responses=True)
rc.set("user:1000", "John")
print(rc.get("user:1000")) # John
Client handles redirects automatically!
Hash Tags
Force keys to same slot:
# Different slots
rc.set("user:1000", "John")
rc.set("orders:1000", "Order1")
# Same slot (using hash tag)
rc.set("user:{1000}", "John")
rc.set("orders:{1000}", "Order1")
{1000} is the hash tag: only the part inside the braces is hashed, so both keys map to the same slot.
This enables multi-key operations:
rc.mget("user:{1000}", "orders:{1000}")
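Reusing the hash_slot() sketch from the Hash Slots section, you can check this locally:
# Only the "1000" inside the braces is hashed, so both keys share a slot
assert hash_slot("user:{1000}") == hash_slot("orders:{1000}")
Without a common tag, a multi-key command whose keys land on different slots is rejected with a CROSSSLOT error.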
Replication
Each master has a replica:
Master 7000 -> Replica 7004
Master 7001 -> Replica 7005
Master 7002 -> Replica 7003
Replicas sync from their masters automatically (Redis replication is asynchronous).
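A quick way to confirm a replica is attached (a sketch, assuming the redis-py package is installed):
import redis

# Ask a master for its replication state
master = redis.Redis(host="127.0.0.1", port=7000, decode_responses=True)
repl = master.info("replication")
print(repl["role"], repl["connected_slaves"])  # expect: master 1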
Automatic Failover
Simulate master failure:
redis-cli -p 7000 DEBUG SEGFAULT
The cluster detects the failure and promotes the replica. Verify from a surviving node:
redis-cli -p 7001 CLUSTER NODES
Output:
7004... master - 0 1554710400000 7 connected 0-5460
7000... master,fail - 1554710395000 1 disconnected
Replica 7004 promoted to master!
Adding Nodes
Add new master:
redis-server /etc/redis/redis-7006.conf &
redis-cli --cluster add-node 127.0.0.1:7006 127.0.0.1:7000
Rebalance slots (by default, rebalance skips masters that own no slots, so include --cluster-use-empty-masters):
redis-cli --cluster rebalance 127.0.0.1:7000 --cluster-use-empty-masters
Add replica:
redis-server /etc/redis/redis-7007.conf &
redis-cli --cluster add-node 127.0.0.1:7007 127.0.0.1:7000 \
--cluster-slave \
--cluster-master-id <master-node-id>
Removing Nodes
Remove replica:
redis-cli --cluster del-node 127.0.0.1:7000 <node-id>
Remove master (its slots must be moved away first):
# Reshard slots to other masters (interactive: prompts for slot count, target, and sources)
redis-cli --cluster reshard 127.0.0.1:7000
# Then remove
redis-cli --cluster del-node 127.0.0.1:7000 <node-id>
Monitoring
Cluster info:
redis-cli -p 7000 CLUSTER INFO
Output:
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
Node info:
redis-cli -p 7000 CLUSTER NODES
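The same checks work from code. A minimal probe (a sketch, assuming redis-py, which parses the CLUSTER INFO reply into a dict of strings):
import redis

node = redis.Redis(host="127.0.0.1", port=7000, decode_responses=True)
info = node.execute_command("CLUSTER INFO")
# Alert if the cluster is degraded or any hash slot is unassigned
if info["cluster_state"] != "ok" or int(info["cluster_slots_assigned"]) != 16384:
    raise RuntimeError("cluster unhealthy: " + info["cluster_state"])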
Production Setup
6 servers (3 masters + 3 replicas):
Server 1 (Master):
# /etc/redis/redis.conf
port 6379
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
maxmemory 8gb
maxmemory-policy allkeys-lru
bind 0.0.0.0
protected-mode no
requirepass your_password
# With requirepass set, replicas need masterauth to sync from their masters:
masterauth your_password
Servers 2-6:
# Same config on each server; the --cluster-replicas option below decides which nodes become replicas.
Create cluster:
redis-cli -a your_password --cluster create \
server1:6379 \
server2:6379 \
server3:6379 \
server4:6379 \
server5:6379 \
server6:6379 \
--cluster-replicas 1
Client Configuration
from rediscluster import RedisCluster
startup_nodes = [
{"host": "server1", "port": "6379"},
{"host": "server2", "port": "6379"},
{"host": "server3", "port": "6379"},
]
rc = RedisCluster(
startup_nodes=startup_nodes,
decode_responses=True,
password="your_password",
skip_full_coverage_check=True,
max_connections=50,
max_connections_per_node=True  # limit applies per node, not across the whole pool
)
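Example usage: a read-through cache helper on top of rc. A sketch; fetch_user_from_db is a hypothetical stand-in for your database call:
import json

def get_user(user_id):
    key = f"user:{{{user_id}}}"  # hash tag keeps a user's keys on one slot
    cached = rc.get(key)
    if cached is not None:
        return json.loads(cached)
    user = fetch_user_from_db(user_id)  # hypothetical DB lookup
    rc.set(key, json.dumps(user), ex=3600)  # cache for one hour
    return user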
Monitoring with Prometheus
Redis exporter:
docker run -d \
--name redis-exporter \
-p 9121:9121 \
oliver006/redis_exporter \
--redis.addr=redis://server1:6379
Prometheus config:
scrape_configs:
  - job_name: 'redis'
    static_configs:
      - targets:
          - server1:9121
          - server2:9121
          - server3:9121
Backup Strategy
RDB snapshots:
# Snapshot if >=1 change in 900s, >=10 changes in 300s, or >=10000 in 60s:
save 900 1
save 300 10
save 60 10000
AOF persistence:
appendonly yes
appendfsync everysec
Backup script:
#!/bin/bash
# Snapshot each master, then archive the dump file.
for port in 7000 7001 7002; do
  redis-cli -p $port BGSAVE
  # BGSAVE runs in the background; a fixed sleep is crude. In production,
  # poll LASTSAVE until its timestamp changes before copying.
  sleep 60
  cp /var/lib/redis/$port/dump.rdb /backup/redis-$port-$(date +%Y%m%d).rdb
done
Performance Tuning
Kernel settings (apply with sysctl -p after editing):
# /etc/sysctl.conf
vm.overcommit_memory = 1
net.core.somaxconn = 65535
Redis config:
tcp-backlog 511
timeout 0
tcp-keepalive 300
maxclients 10000
Results
Before (single instance):
- 100K requests/sec
- 90% CPU
- 16GB memory limit
- Single point of failure
After (cluster):
- 1M+ requests/sec (10x)
- 30% CPU per node
- 48GB total memory (3x16GB)
- Automatic failover
Lessons Learned
- Plan capacity - 3 masters minimum
- Use hash tags - For multi-key operations
- Monitor closely - Watch for slot migrations
- Test failover - Before production
- Backup regularly - RDB + AOF
Conclusion
Redis Cluster provides horizontal scaling and high availability. Essential for high-traffic applications.
Key takeaways:
- Sharding across multiple masters
- Replication for high availability
- Automatic failover
- Hash tags for multi-key ops
- Monitor and backup
Scale Redis properly. Your application will thank you.