Large Keys and Hot Keys
"Large Key" and "Hot Key" are important concepts in caching, especially when dealing with distributed caches or caching systems handling substantial data. They both present challenges that can significantly impact cache performance and efficiency.
Definitions: What Are Large Keys and Hot Keys?
Large Key
A large key refers to a cache key (the identifier used to retrieve a value from the cache) whose associated value is significantly larger than the average value size in the cache. The "large" designation is relative to the overall cache size and the typical value size, and is usually judged by the size of the value and the number of members in the key, for example:
- The data in the key itself is too large: a String type key whose value is 5 MB.
- Too many members in the key: a ZSET type key with 10,000 members.
- The data in the key's members is too large: a HASH type key with only 1,000 members, but whose members' values total 100 MB.
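The three criteria above can be expressed as a simple classifier. This is an illustrative sketch: the thresholds mirror the examples in the list and should be tuned per workload, and `is_large_key` is a hypothetical helper, not a Redis API.

```python
# Illustrative thresholds -- tune for your workload; these mirror the
# examples above (string value size, member count, total member size).
STRING_MAX_BYTES = 5 * 1024 * 1024        # 5 MB for a String value
MAX_MEMBERS = 10_000                      # member count for collection types
COLLECTION_MAX_BYTES = 100 * 1024 * 1024  # 100 MB total for collection values

def is_large_key(key_type, value_bytes=0, member_count=0):
    """Flag a key as 'large' by any of the three criteria above."""
    if key_type == "string":
        return value_bytes > STRING_MAX_BYTES
    # list/set/zset/hash: too many members, or members too big in total
    return member_count > MAX_MEMBERS or value_bytes > COLLECTION_MAX_BYTES

print(is_large_key("string", value_bytes=6 * 1024 * 1024))  # → True
print(is_large_key("zset", member_count=500))               # → False
```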
Hot Key
A hot key refers to a cache key that is accessed very frequently compared to other keys in the cache. This high access rate can create a performance bottleneck.
It is usually determined by how frequently the key is requested, for example:
- QPS is concentrated on a specific key: the total QPS (queries per second) of the Redis instance is 10,000, while a single key receives 7,000 accesses per second.
- Bandwidth usage is concentrated on a specific key: a large number of HGETALL requests are sent per second for a HASH key with thousands of members and a total size of 1 MB.
- CPU time is concentrated on a specific key: a large number of ZRANGE requests are sent per second for a ZSET key with tens of thousands of members.
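The first criterion, request volume concentrating on one key, can be sketched as a share-of-traffic check over a window of access logs. `find_hot_keys` and the 50% threshold here are illustrative, not a standard API.

```python
from collections import Counter

def find_hot_keys(access_log, window_seconds, share_threshold=0.5):
    """Return keys whose share of total requests in the window exceeds
    the threshold (e.g. 7,000 of 10,000 QPS concentrated on one key),
    mapped to their per-key QPS."""
    counts = Counter(access_log)
    total = len(access_log)
    return {
        key: count / window_seconds          # per-key QPS
        for key, count in counts.items()
        if count / total >= share_threshold
    }

# 7,000 hits on "sku:42" out of 10,000 requests in one second
log = ["sku:42"] * 7000 + [f"k{i}" for i in range(3000)]
print(find_hot_keys(log, window_seconds=1))  # → {'sku:42': 7000.0}
```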
Reasons for Large Keys and Hot Keys
Improper use of Redis, insufficient business planning, accumulation of stale data, and sudden traffic surges can all produce large and hot keys, for example:
Big Key Causes
- Insufficient planning and design before the service launched, so members were not split across keys reasonably, leaving some keys with too many members;
- Stale data is not cleaned up regularly, so the members of a HASH type key keep growing;
- A bug on the consumer side of a LIST type key caused its members to only be added and never removed.
Hot Key Causes
- Unexpected traffic surges, such as the sudden emergence of a hot-selling product, a spike in traffic to breaking news, a flood of likes during an event run by a streamer in a live broadcast room, or a battle between multiple guilds in a game region involving a large number of players.
Problems Caused by Large Keys and Hot Keys
Problems with Big Keys
- Client commands take longer to execute; in extreme cases latency can soar to the level of seconds.
- When Redis memory reaches the limit defined by the maxmemory parameter, operations may block, important keys may be evicted, or memory may even overflow.
- In a cluster architecture, the memory usage of a certain data shard far exceeds that of other data shards, making it impossible to balance the memory resources of the data shards.
- Read requests for large keys can saturate the bandwidth of the Redis instance, slowing the instance itself and easily affecting related services.
- Deleting a large key can easily block the master for a long time, which may cause replication to break or trigger a master-slave switchover.
- Because large keys exist, routine platform-side operations such as daily changes, fault replacements, and cluster changes may increase business-side request latency and affect cluster stability.
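The deletion-blocking problem above is commonly mitigated by draining the key incrementally instead of issuing one blocking DEL. The sketch below assumes a redis-py-style client exposing `hscan`/`hdel`, and includes a tiny in-memory stub so it runs without a server; on Redis 4.0+, the built-in `UNLINK` command, which reclaims memory in a background thread, is often the simpler fix.

```python
def delete_large_hash(client, key, batch=100):
    """Delete a big HASH in small HDEL batches instead of one blocking DEL.
    Assumes a redis-py-style client with hscan(key, cursor, count=...)
    and hdel(key, *fields)."""
    cursor = 0
    while True:
        cursor, fields = client.hscan(key, cursor, count=batch)
        if fields:
            client.hdel(key, *fields)
        if cursor == 0:
            return

# Minimal in-memory stand-in for the two calls used above, so the
# sketch runs standalone (a real hscan returns a field->value mapping,
# which would also work with the loop above).
class FakeRedis:
    def __init__(self, data):
        self.hashes = {k: dict(v) for k, v in data.items()}
    def hscan(self, key, cursor, count=100):
        fields = list(self.hashes.get(key, {}))[:count]
        done = len(self.hashes.get(key, {})) <= count
        return (0 if done else 1), fields
    def hdel(self, key, *fields):
        for f in fields:
            self.hashes[key].pop(f, None)

r = FakeRedis({"big": {f"f{i}": i for i in range(250)}})
delete_large_hash(r, "big", batch=100)
print(r.hashes["big"])  # → {}
```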
Problems with Hot Keys
- They consume a lot of CPU, affecting other requests and degrading overall performance.
- Under a cluster architecture, access skew occurs: one data shard is accessed heavily while others sit idle, which may exhaust that shard's connections and cause new connection requests to be rejected.
- In a rush sale or flash sale scenario, the number of requests for the inventory key corresponding to the product may be too large, exceeding the processing capacity of Redis, resulting in overselling.
- If the number of requests for a hot key exceeds what Redis can tolerate, cache breakdown is likely: a large number of requests go directly to the backend storage layer, causing a surge in storage access or even downtime, and affecting other businesses.
How to Identify Large and Hot Keys
Typically, the first symptom is that the time the business spends accessing the cache increases, and the cachecloud monitoring panel shows load or latency concentrated on one or a few instances. From there, initiate a hot key or big key analysis and determine which case it is from the results. The business side can also estimate likely hot keys from experience, or use open source tools (for example, redis-cli --bigkeys) for analysis.
Big Key and Hot Key Solutions
Big Key Mitigation Strategies
For large keys, the platform recommends splitting the data into multiple small keys to reduce the impact of large-key reads and writes on traffic and load.
- When the value is a collection type such as LIST/SET, shard based on the estimated data size and assign elements to different shards by calculation.
- When the value is a string, splitting is difficult; use serialization and compression algorithms to keep the value within a reasonable size. Note that serialization and deserialization cost extra time.
- When the value is a string and is still large after compression, split it: divide the value into parts, record the key of each part, and use operations such as multi-get (MGET) to read them back together.
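The split-and-record approach in the last bullet can be sketched as follows. The part-key naming scheme and 1 MB chunk size are illustrative; an in-memory dict stands in for the cache so the sketch runs standalone.

```python
CHUNK = 1024 * 1024  # 1 MB per part (illustrative)

def split_value(key, value: bytes, chunk=CHUNK):
    """Split one big string value into part-keys plus an ordered index."""
    parts = {f"{key}:part:{i}": value[i * chunk:(i + 1) * chunk]
             for i in range((len(value) + chunk - 1) // chunk)}
    return list(parts), parts  # (index of part keys, part payloads)

def join_value(index, store):
    """Reassemble by reading every recorded part key in order
    (the equivalent of a single MGET over the index)."""
    return b"".join(store[k] for k in index)

blob = b"x" * (2 * CHUNK + 5)
index, parts = split_value("report:2024", blob)
assert join_value(index, parts) == blob
print(len(index))  # → 3
```

In Redis terms, each part would be stored under its own key and the index kept alongside it, so a read is one MGET plus a concatenation.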
Hot Key Mitigation Strategies
- Distribute hot key traffic to multiple nodes:
- Change the cluster read mode to read from both master and slave, so replicas share the load caused by heavy read requests.
- Manually copy the key to multiple keys with different names, computing each slot (CRC16 of the key, mod 16384) to ensure they land on different nodes, and spread requests across the different key names. (The effect is affected by later cluster scale-out or scale-in.)
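The slot math in the bullet above can be checked directly. Redis Cluster uses the CRC16-CCITT (XModem) variant, and the slot is CRC16(key) mod 16384; candidate replica names can then be verified to hash to different slots before use. (If a key name contains a `{...}` hash tag, Redis hashes only the tag's content; the plain names below avoid that.) The `hot:item:99` naming is illustrative.

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XModem), the variant Redis Cluster uses for slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def slot(key: str) -> int:
    """Cluster slot for a key without a {...} hash tag."""
    return crc16(key.encode()) % 16384

# Generate replica key names and inspect which slots they land on;
# keep only names whose slots map to different nodes.
base = "hot:item:99"
replicas = [f"{base}:{i}" for i in range(5)]
print({k: slot(k) for k in replicas})
```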
- Multi-level cache key
- The cachecloud proxy layer can enable a read cache: over a 10-second statistics window, read requests averaging more than 5,000 QPS are selected and the top 100 are cached; the cache is refreshed every two seconds (so data updates can lag by up to 2 seconds). For details, see the Cache document.
- The business-side service can enable a local cache for hot keys, and can design its own hit-statistics method and cache duration.
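The business-side local cache in the last bullet can be sketched as a minimal in-process TTL cache: repeated reads of a hot key are served locally, and only expired entries fall through to Redis (represented here by a generic `loader` callback). The class name and 2-second TTL are illustrative; a production cache would also need size bounds and eviction.

```python
import time

class LocalTTLCache:
    """Tiny in-process cache for hot keys; entries expire after ttl seconds."""
    def __init__(self, ttl=2.0):
        self.ttl = ttl
        self._store = {}  # key -> (expires_at, value)

    def get(self, key, loader):
        now = time.monotonic()
        hit = self._store.get(key)
        if hit and hit[0] > now:
            return hit[1]                 # served locally, no Redis trip
        value = loader(key)               # fall through to Redis/backend
        self._store[key] = (now + self.ttl, value)
        return value

calls = []
cache = LocalTTLCache(ttl=2.0)
load = lambda k: calls.append(k) or f"value-for-{k}"
print(cache.get("hot", load))  # → value-for-hot (loaded from backend)
print(cache.get("hot", load))  # → value-for-hot (served from local cache)
print(len(calls))              # → 1
```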