The strengths and weaknesses of using Redis as the JuiceFS metadata engine

Changjian Gao 2022.07.22

About author:

Changjian Gao is a technical expert at Juicedata with 10+ years of experience in the technology industry. He has worked as an architect at Zhihu, Jike and Xiaohongshu, specializing in technical research spanning distributed systems, big data, and AI.

Overview

Redis is an open-source, in-memory, key-value data store, initially released in 2009. It is the most popular key-value store today according to the DB-Engines Ranking. Because of its fast performance, good user experience, active developer community and well-developed ecosystem, Redis was the first metadata engine supported by JuiceFS when JuiceFS was open sourced in 2021, and it has since become one of the most widely used metadata engines in the JuiceFS community.

As a database with a low entry barrier, Redis is particularly suitable for getting started quickly with JuiceFS. However, the shortcomings of the traditional Redis architecture may surface when the data scale grows, or when JuiceFS is deployed in a production environment with high demands on service availability and data reliability. The rest of this article goes through the advantages and disadvantages of different Redis architectures, to help readers better understand what using Redis as the JuiceFS metadata engine involves.

Strengths and weaknesses of using Redis as the metadata engine

Redis data resides in memory, which enables fast responses to file system metadata requests. JuiceFS also takes full advantage of the Lua scripting features provided by Redis, which greatly improves the performance of core metadata operations such as lookup. Compared to MySQL, TiKV, and etcd, Redis handles requests 2-4 times faster on average (see Metadata Engines Benchmark). Thus, Redis is particularly suitable for application scenarios that require extreme metadata request performance.

The other side of keeping everything in memory is that the storage capacity of a single Redis instance is bounded by the memory of its host. Based on our experience, the metadata for one inode in the file system takes about 300 bytes of memory. That means storing 100 million inodes needs about 32 GiB of memory. Note that, in practice, it is not recommended to deploy Redis on an instance with more than 64 GiB of memory: on the one hand, the bigger the memory, the longer failure recovery takes; on the other hand, Redis executes commands on a single thread and cannot utilize multiple cores, which may waste CPU resources on large-memory machines.
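The sizing rule above can be turned into a small back-of-the-envelope calculator. This is only a sketch: the 300-byte figure is the rule of thumb quoted in the text, the function names are made up for illustration, and allocator overhead or fragmentation inside Redis is not modeled. Raw arithmetic for 100 million inodes gives roughly 28 GiB, which is in the same range as the rounded 32 GiB figure quoted above.

```python
# Rough memory sizing for JuiceFS metadata in Redis.
# Assumption: ~300 bytes per inode, the rule of thumb from the text.
BYTES_PER_INODE = 300
GIB = 1024 ** 3


def estimate_memory_gib(inodes: int, bytes_per_inode: int = BYTES_PER_INODE) -> float:
    """Return the approximate memory (in GiB) needed for `inodes` inodes."""
    return inodes * bytes_per_inode / GIB


def max_inodes(memory_gib: int, bytes_per_inode: int = BYTES_PER_INODE) -> int:
    """Return how many inodes fit into `memory_gib` GiB under the same rule."""
    return (memory_gib * GIB) // bytes_per_inode


# 100 million inodes: about 27.9 GiB of raw metadata.
print(f"{estimate_memory_gib(100_000_000):.1f} GiB")

# Upper bound for a 64 GiB instance, ignoring any other memory use.
print(max_inodes(64))
```

Under these assumptions, a single 64 GiB instance tops out at a little over 200 million inodes, which gives a feel for when a single Redis instance stops being enough.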

Issues and challenges of Redis’ standalone architecture

When only one Redis instance is deployed, that instance becomes a Single Point of Failure (SPOF) for the whole system: once the metadata engine fails, no file system operation can proceed. Therefore, it is generally recommended to deploy multiple replicas together with Redis Sentinel to ensure service availability. With multiple replicas, several instances are active at the same time, and the data in these instances is nearly the same (why it is only NEARLY the same is explained below). Redis Sentinel constantly checks whether the Redis master and replica instances are working as expected. If a master crashes unexpectedly, Sentinel starts a failover process (generally completed within seconds), in which a replica is promoted to master, to keep the service available.

By default, Redis uses asynchronous replication, which adds little overhead to the master. However, if the Redis master crashes, data loss may occur because some writes have not yet been synchronized to the replica instance. Although Redis supports synchronous replication (through the WAIT command), it can only reduce the probability of data loss and will not turn Redis into a CP system.
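To make the data-loss window concrete, here is a toy in-process model of asynchronous replication. It is not Redis code and does not talk to a Redis server: the `Primary` and `Replica` classes, the `failover` helper, and the key names are all hypothetical and exist only to illustrate the idea that a write can be acknowledged to the client before the replica has received it.

```python
class Replica:
    """Holds whatever data the primary has shipped so far."""

    def __init__(self):
        self.data = {}


class Primary:
    """Acknowledges writes immediately and replicates them later."""

    def __init__(self, replica: Replica):
        self.data = {}
        self.replica = replica
        self.backlog = []  # writes acknowledged but not yet replicated

    def set(self, key: str, value: str) -> str:
        self.data[key] = value
        self.backlog.append((key, value))
        return "OK"  # the client sees success before the replica has the write

    def replicate(self) -> int:
        """Ship the pending writes to the replica; return how many were sent."""
        sent = len(self.backlog)
        for key, value in self.backlog:
            self.replica.data[key] = value
        self.backlog.clear()
        return sent


def failover(primary: Primary) -> dict:
    """Model a primary crash: the replica is promoted with what it has."""
    return primary.replica.data


replica = Replica()
primary = Primary(replica)

primary.set("inode:1", "file-a")
primary.replicate()               # the first write reaches the replica
primary.set("inode:2", "file-b")  # acknowledged, but still in the backlog

promoted = failover(primary)      # crash before the next replication round
print(sorted(promoted))           # only inode:1 survives the failover
```

In this model, waiting for the replica before answering the client (the role WAIT plays in real Redis) would amount to calling `replicate()` inside `set()`. That shrinks the window but, as the text notes, does not by itself give the guarantees of a CP system.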

Therefore, how to ensure data reliability while ensuring service availability is a big challenge when using Redis, especially when the data is the metadata of the file system.

Moving forward: using Redis Cluster as the metadata engine

If you are familiar with Redis, you will probably ask why not use Redis Cluster as the metadata engine. The story behind this starts with the metadata design of the JuiceFS file system. JuiceFS metadata includes various types, e.g., file, directory, attribute, file lock, etc., and each type corresponds to a different key in Redis. Some metadata operations involve several metadata types, meaning that multiple Redis keys need to be modified together. To ensure the atomicity of file system metadata operations, JuiceFS uses Redis transactions to modify several keys at once and thereby guarantee metadata consistency.

However, Redis transactions have limitations in Redis Cluster. Specifically, the key space of a Redis Cluster is divided into 16384 hash slots, and every key is assigned to a slot by a hashing function. Redis Cluster supports multi-key operations only as long as all of the keys involved in a single transaction belong to the same hash slot; in other words, it does not support transactions that span hash slots. As a result, Redis Cluster used naively as the JuiceFS metadata engine cannot ensure the atomicity of metadata operations, which is not acceptable for a file system.

Fortunately, the hash tags feature in Redis allows keys with the same hash tag to be allocated to the same hash slot. JuiceFS uses exactly this feature to ensure that all the keys of one file system are stored in the same hash slot, thereby ensuring the atomicity of metadata operations. This support was released in JuiceFS v1.0.0 Beta3. Using Redis Cluster with JuiceFS is not very different from using standalone Redis, so all the commands can be used directly. Compared to standalone Redis, Redis Cluster allows the metadata of different JuiceFS file systems to be stored in one cluster instead of operating several Redis instances, which can remarkably reduce operation costs. However, because each file system lives in a single hash slot, Redis Cluster may not distribute the data evenly when the file systems vary in size. If one file system is particularly large, it can only be scaled vertically (scaling up).

Although the transaction limitation of Redis Cluster can be overcome, the data reliability issues mentioned above still exist. The essential reason that Redis Cluster cannot provide a strong consistency guarantee is that it relies on the same asynchronous replication as the standalone architecture, so acknowledged writes can still be lost during a failover.
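The slot assignment and the effect of hash tags can be sketched in a few lines. Redis Cluster maps a key to `CRC16(key) mod 16384`, and when the key contains a non-empty `{...}` section, only the text between the first `{` and the next `}` is hashed. The key names below (`{1}inode:100` and so on) are hypothetical and are not meant to reproduce the exact key layout JuiceFS uses; the sketch relies on Python's `binascii.crc_hqx`, which implements the same CRC16 (XMODEM) variant when started from 0.

```python
import binascii

HASH_SLOTS = 16384


def hash_slot(key: str) -> int:
    """Compute the Redis Cluster hash slot of `key`, honoring hash tags."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag between '{' and '}' replaces the full key.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % HASH_SLOTS


# Hypothetical keys of one file system, all sharing the hash tag "1".
tagged = ["{1}inode:100", "{1}dir:100", "{1}lock:100"]
print({key: hash_slot(key) for key in tagged})

# The same names without a tag are hashed in full and will generally
# be spread over different slots.
untagged = ["inode:100", "dir:100", "lock:100"]
print({key: hash_slot(key) for key in untagged})
```

Because every tagged key hashes only the tag, all of them land in the slot of the string `1`, so a transaction touching any combination of them stays within one slot. The flip side, as described above, is that a whole file system is pinned to a single slot and therefore to a single shard.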

A step further: using Redis with a strong consistency guarantee as the metadata engine

Whether it is standalone Redis or Redis Cluster, there is always a risk of data loss due to the lack of strong consistency guarantees. The Redis company (formerly Redis Labs) has been attempting to solve this issue, which is how the open-source project RedisRaft came about, with the mission of making up for this weakness of Redis (at the cost of some performance, of course). However, RedisRaft has not yet reached GA and is still in the development stage.

The open-source community is not alone in trying to solve this problem; public cloud providers are also working hard to develop and promote their own services. In August 2021, AWS released Amazon MemoryDB for Redis (hereinafter referred to as MemoryDB), another Redis-compatible, fully managed database service after Amazon ElastiCache for Redis. As its name suggests, MemoryDB can be used as a primary database rather than a cache layer, and it offers microsecond read and single-digit millisecond write performance. Most importantly, MemoryDB provides a strong consistency guarantee, implemented by persisting writes to a distributed transaction log so that successfully written data will not be lost. Meanwhile, MemoryDB keeps many practical features of Amazon ElastiCache for Redis, such as fast failover and automatic database backup (implemented on top of the transaction log).

Other public cloud providers generally offer Redis-compatible cloud services similar to MemoryDB, such as ApsaraDB for Redis Enhanced Edition (Tair) in Alibaba Cloud and GaussDB (for Redis) in Huawei Cloud. This matters to JuiceFS because these cloud services provide a strong consistency guarantee for data, which greatly improves the reliability of JuiceFS metadata.