
Running Redis in production

2014-11-11




Overview


Redis is an excellent key/value cache that is used across many of Shokunin's customers. While redis is a great piece of software, it is often difficult to find information about actually running it in production from an operational perspective. This article discusses the steps that ops teams should take before running redis in a production environment.


OS Tuning


Receive Packet Steering (RPS) / CPU Preferences

Redis is a mostly single-threaded application. To ensure that redis is not running on the same CPUs as those handling network traffic, it is highly recommended that RPS be enabled.

To enable RPS on CPUs 0-1:

echo '3' > /sys/class/net/eth1/queues/rx-0/rps_cpus
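
The value written is a hexadecimal CPU bitmask (0x3 = binary 11 = CPUs 0 and 1). If the NIC exposes multiple receive queues, the mask needs to be applied to each one; a minimal sketch, assuming eth1 is the interface carrying redis traffic:

# apply the CPU mask to every receive queue on eth1
for q in /sys/class/net/eth1/queues/rx-*/rps_cpus; do
  echo '3' > "$q"
done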

Redhat has a detailed guide on RPS.

To set the CPU affinity for redis to CPUs 2-8

# config is set to write pid to /var/run/redis.pid
$ taskset -pc 2-8 `cat /var/run/redis.pid`
pid 8946's current affinity list: 0-8
pid 8946's new affinity list: 2-8

Below is an example of the performance boost from a stock compile on a temporary host:

RPS Status   Get Operations/second   Set Operations/second
Off          761.27                  777.60
On           833.89                  858.74


Tuning the kernel network stack

To ensure that redis can handle a large number of connections in a high performance environment, tuning the following kernel parameters is recommended.

vm.swappiness=0                       # turn off swapping
net.ipv4.tcp_sack=1                   # enable selective acknowledgements
net.ipv4.tcp_timestamps=1             # needed for selective acknowledgements
net.ipv4.tcp_window_scaling=1         # scale the network window
net.ipv4.tcp_congestion_control=cubic # better congestion algorithm
net.ipv4.tcp_syncookies=1             # enable SYN cookies
net.ipv4.tcp_tw_recycle=1             # recycle sockets quickly
net.ipv4.tcp_max_syn_backlog=NUMBER   # backlog setting
net.core.somaxconn=NUMBER             # up the number of connections per port
net.core.rmem_max=NUMBER              # up the receive buffer size
net.core.wmem_max=NUMBER              # up the send buffer size for all connections
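
Values set with sysctl -w alone do not survive a reboot, so one common approach is to persist them in a sysctl configuration file. A minimal sketch, assuming a /etc/sysctl.d layout and placeholder values that you would tune for your workload:

# /etc/sysctl.d/90-redis.conf (hypothetical filename)
vm.swappiness = 0
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535

# load the new values without rebooting
sysctl -p /etc/sysctl.d/90-redis.conf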


Tuning the kernel memory

Under heavy load we noticed occasional blips in performance due to memory allocation. It turns out this was a known issue with transparent hugepages.

echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled
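
This only affects the running kernel; you can verify it took effect (the active value is shown in brackets), and one common approach to persisting it is to re-run the echo from an init script such as /etc/rc.local:

$ cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]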


Set file descriptor limits for the redis user

If you have not set the correct number of file descriptors for the redis user, you may see the following log lines:

[7842] 13 Nov 07:24:14.514 # You requested maxclients of 10000 requiring at least 10032 max file descriptors.
[7842] 13 Nov 07:24:14.514 # Redis can't set maximum open files to 10032 because of OS error: Operation not permitted.
[7842] 13 Nov 07:24:14.514 # Current maximum open files is 1024. maxclients has been reduced to 4064 to compensate for low ulimit. If you need higher maxclients increase 'ulimit -n'.
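
One common way to raise the limit is via pam_limits; a minimal sketch, assuming redis runs as the redis user and needs the 10032 descriptors mentioned in the log above:

# /etc/security/limits.d/redis.conf (hypothetical filename)
redis soft nofile 10032
redis hard nofile 10032

Depending on how redis is launched (init script vs. login session), you may instead need a ulimit -n call in the init script before redis starts.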

Redis Tuning


Disable saving redis to disk in redis.conf

Redis will periodically attempt to persist its data to disk. Although redis forks to do this, it can still slow everything down.
Comment out the lines that start with save:

#save 900 1
#save 300 10
#save 60 10000

If you need to persist the data, run a slave and let it handle persistence, as this causes less of a slowdown on the master.
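
A minimal sketch of that setup, assuming a hypothetical master at redis-master.example.com: keep the save lines enabled only on the slave and point it at the master in redis.conf:

# slave's redis.conf only
slaveof redis-master.example.com 6379
save 900 1
save 300 10
save 60 10000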

Set tcp-backlog in redis.conf

Newer versions of redis ship with their own listen backlog set to 511, and you will need to set this higher if you have many connections.

# TCP listen() backlog.
# In high requests-per-second environments you need a high backlog to avoid
# slow client connection issues. Note that the Linux kernel will silently
# truncate it to the value of /proc/sys/net/core/somaxconn, so make sure to
# raise both somaxconn and tcp_max_syn_backlog as well.
tcp-backlog 65536
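
Because the kernel silently caps the listen backlog at net.core.somaxconn, it is worth verifying both kernel values after applying the sysctl settings above; the output shown assumes they were raised to 65536:

$ sysctl net.core.somaxconn net.ipv4.tcp_max_syn_backlog
net.core.somaxconn = 65536
net.ipv4.tcp_max_syn_backlog = 65536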

Set slave configs

# serve stale data if the sync is not complete
slave-serve-stale-data yes
# stop yourself from accidentally writing to the slave
slave-read-only yes

Set maxclients

The default is 10000 and if you have many connections you may need to go higher.

# Once the limit is reached Redis will close all the new connections sending
# an error 'max number of clients reached'.
maxclients 10000
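
To see how close you are to the limit, check the current connection count reported by INFO clients (the number shown below is illustrative):

$ redis-cli INFO clients |grep connected_clients
connected_clients:42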

Memory usage

By default redis is allowed to use all available memory on the box. We like to set maxmemory to 80% of the system memory using a Facter fact. When running several instances of redis on a single machine, this should be tuned down accordingly.

This setting can be changed on a running process.

# memory size in bytes
maxmemory 1288490188
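
For example, a minimal sketch of adjusting maxmemory on a live instance, using the same value as above:

$ redis-cli CONFIG SET maxmemory 1288490188
OK
$ redis-cli CONFIG GET maxmemory
1) "maxmemory"
2) "1288490188"

Note that CONFIG SET does not write the change back to redis.conf, so update the file as well if you want it to survive a restart.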

Multiple instances

Due to the mostly single-threaded nature of redis, it is often beneficial to run more than one instance per box. In this case, be sure to set the CPU affinity separately for each instance.

Using Twemproxy/Nutcracker in front of several instances is a common way to spread the keys across instances using consistent hashing.
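
A minimal nutcracker.yml sketch, with a hypothetical pool name, ports, and two local redis instances:

# nutcracker.yml (hypothetical values)
redis_pool:
  listen: 127.0.0.1:22121
  hash: fnv1a_64
  distribution: ketama
  redis: true
  servers:
   - 127.0.0.1:6379:1
   - 127.0.0.1:6380:1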


Redis Monitoring


Availability

The redis server will respond to the PING command when running properly

$ redis-cli -h redis.example.com -p 6379 PING
PONG

Memory usage

Since we generally recommend setting the maxmemory size, it is possible to calculate the percentage of memory in use and alert based on the result.

$ redis-cli INFO |grep used_memory:
used_memory:424992
$ redis-cli config get maxmemory
1) "maxmemory"
2) "10000000"

Uptime

Alert if uptime is less than you expect

$ redis-cli INFO |grep uptime_in_seconds
uptime_in_seconds:86514

Redis statistics


Cache hit rate

This information can be calculated from the INFO command

$ redis-cli INFO stats |grep keyspace
keyspace_hits:1920
keyspace_misses:930
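
The hit rate is keyspace_hits / (keyspace_hits + keyspace_misses); with the numbers above that is 1920 / (1920 + 930) ≈ 67%.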

Key eviction and expiration

Eviction occurs when redis has reached its maxmemory limit and maxmemory-policy in redis.conf is set to something other than noeviction.

$ redis-cli INFO stats |grep evicted_keys
evicted_keys:11582

Keys in redis can be set with a time to live (TTL), which is generally a good practice.

$ redis-cli SET mykey myvalue EX 600

It is a good idea to keep an eye on the expirations to make sure redis is performing as expected

$ redis-cli INFO stats |grep expired_keys
expired_keys:15436

Key Space

We also recommend graphing the size of the keyspace as a quick drop or spike in the number of keys is a good indicator of issues.

$ redis-cli INFO keyspace
# Keyspace
db0:keys=1075,expires=1075,avg_ttl=2110
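
DBSIZE is a lighter-weight way to get the key count for the currently selected database, which is convenient for graphing (the count shown matches the keyspace example above):

$ redis-cli DBSIZE
(integer) 1075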

Workload statistics

The final two stats that we recommend graphing indicate the workload placed on the redis server.

$ redis-cli INFO stats |egrep "^total_"
total_connections_received:70725
total_commands_processed:70723

Conclusion


Redis is extremely useful in many types of environments, but it can be a little daunting at first. By following these best practices, it can serve as a rock-solid caching layer that speeds up your application.

