JuiceFS 24 Q&As for beginners

2022-09-06

Zhi Jian

Introduction

JuiceFS is innovative software, and many users have questions when they use it for the first time. To help beginners understand JuiceFS and get started quickly, we have collected the 24 most frequently asked questions and their answers.

Q1: What can JuiceFS do?

JuiceFS is a high-performance distributed file system designed for the cloud-native environment, released under Apache License 2.0. JuiceFS is fully POSIX compatible. Most object storage can be used to store file data in JuiceFS, and users can access it just like a local disk but with massive capacity. In addition, JuiceFS clients on multiple hosts can read and write files simultaneously, even if these hosts are on different platforms or regions.
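Because a mounted JuiceFS file system behaves like a local directory, ordinary file APIs work on it unchanged. The following is a minimal Python sketch of that idea, assuming a file system is already mounted at `/tmp/jfs` (the mount point used in Q4). The helper name `write_and_read_back` is made up for this example, and nothing in the code is specific to JuiceFS.

```python
import os
import tempfile


def write_and_read_back(mount_dir: str, name: str = "hello.txt") -> str:
    """Write a small file under mount_dir with plain POSIX calls and read it back."""
    tmp_path = os.path.join(mount_dir, name + ".tmp")
    final_path = os.path.join(mount_dir, name)
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write("hello from JuiceFS\n")
        f.flush()
        os.fsync(f.fileno())  # ask the file system to persist the data
    os.rename(tmp_path, final_path)  # rename within the same directory
    with open(final_path, encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    # Assumption: /tmp/jfs is a JuiceFS mount point. Fall back to a temporary
    # local directory so the sketch still runs where nothing is mounted.
    target = "/tmp/jfs" if os.path.ismount("/tmp/jfs") else tempfile.mkdtemp()
    print(write_and_read_back(target), end="")
```

The same function runs against a local directory or a JuiceFS mount point without modification, which is what POSIX compatibility means in practice.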

Q2: How is the performance of JuiceFS?

JuiceFS is a distributed file system. Metadata latency is determined by 1 to 2 network round trips between the mount point and the metadata service (generally 1-3 ms), and data latency depends on the object storage latency (generally 20-100 ms). The throughput of sequential reads and writes can reach up to 2800 MiB/s (see Benchmark with fio), depending on the network bandwidth and how well the data compresses.
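As a rough worked example, the latency figures above can be combined into an estimate for a read that misses every cache. The Python sketch below assumes that such a read costs one metadata lookup followed by one object fetch and that the two simply add up. That is a simplification for illustration, not a statement about how the client schedules requests, and the function name is hypothetical.

```python
def cold_read_latency_ms(metadata_ms: float, object_ms: float) -> float:
    """Rough latency of an uncached read: one metadata lookup, then one object fetch."""
    return metadata_ms + object_ms


# Typical ranges quoted above: metadata 1-3 ms, object storage 20-100 ms.
best = cold_read_latency_ms(metadata_ms=1.0, object_ms=20.0)
worst = cold_read_latency_ms(metadata_ms=3.0, object_ms=100.0)
print(f"uncached read: roughly {best:.0f}-{worst:.0f} ms")
```

Under these assumptions the object storage dominates the total, which is why the cache has such a large effect.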

JuiceFS has a built-in multi-level cache (invalidated automatically). Once the cache is warmed up, latency and throughput can be very close to a local file system (although the use of FUSE may bring a small amount of overhead).

Q3: Are there any prerequisites for running JuiceFS?

Before starting JuiceFS, you need to prepare a metadata engine and an object storage service. The metadata engine stores metadata such as file names, sizes, and modification times, while the object storage stores file content.

JuiceFS currently supports Redis, TiKV, MySQL, PostgreSQL, and other databases as the metadata engine. Please refer to How to Set up Metadata Engine for the full list of supported databases and detailed configuration parameters.

JuiceFS supports most object storage services, such as AWS S3, Huawei Cloud OBS, and Tencent Cloud COS. To facilitate testing, a local disk can also be used as the object storage. Please refer to How to Set up Object Storage for the full list of supported object storage services and detailed configuration parameters.

Q4: Steps for using JuiceFS

Using JuiceFS is easy and takes only two steps: formatting a file system and mounting it locally. The following example uses Redis as the metadata engine and AWS S3 as the object storage.

# Step 1: format a file system
juicefs format \
   --storage s3 \
   --bucket https://myjfs.s3.us-west-1.amazonaws.com \
   --access-key xxxx \
   --secret-key xxxx \
   redis://localhost:6379/1 \
   test1
# Step 2: mount the file system in the background to the directory /tmp/jfs
juicefs mount -d redis://localhost:6379/1 /tmp/jfs

Q5: Is it possible to get started with JuiceFS quickly without Redis locally and object storage?

Absolutely! A JuiceFS service requires only two components, the metadata engine and the object storage, and you can choose the simplest option for each. For example, you can use SQLite, an embedded database, as the metadata engine and the local disk as the object storage. Local disk is the default storage in JuiceFS and is used when the `--bucket` option is left blank during formatting. The default storage path is `/var/jfs` for the root user and `~/.juicefs/local` for ordinary users. In this way, you can try JuiceFS with nothing but the JuiceFS binary and no external components.

# Step 1: use SQLite as the metadata engine to format the file system named test1
juicefs format sqlite3://myjfs.db test1
# Step 2: mount the file system in the background to the directory /tmp/jfs
juicefs mount -d sqlite3://myjfs.db /tmp/jfs

Q6: Can I mount JuiceFS with a user other than `root`?

Of course! Any user can mount JuiceFS with the `juicefs` command. The default cache directory is `$HOME/.juicefs/cache` on macOS and `/var/jfsCache` on Linux. Please make sure that the user has write permission to this directory, or switch to another directory that the user can write to. Please refer to Read cache in client for more information.

Q7: How compatible is JuiceFS with POSIX?

The POSIX compatibility of JuiceFS has been tested with pjdfstest and LTP. The results show that it passed all the test cases in pjdfstest and most of the test cases in LTP.

Q8: What other ways does JuiceFS support to access data besides mount?

JuiceFS supports the following methods in addition to mounting:

Kubernetes CSI Driver: JuiceFS can be used as the storage layer in Kubernetes clusters via Kubernetes CSI Driver (refer to Use JuiceFS on Kubernetes for details).
Hadoop Java SDK: It is convenient to use the HDFS interface-compatible Java SDK to access JuiceFS in the Hadoop ecosystem (refer to Use JuiceFS on Hadoop Ecosystem for details).
S3 Gateway: JuiceFS can be accessed via S3 REST API (refer to Deploy JuiceFS S3 Gateway for more information).
Docker Volume Plugin: there is an easy way to use JuiceFS on Docker (refer to Use JuiceFS on Docker for details).
WebDAV Gateway: JuiceFS can be accessed via the WebDAV protocol.

Q9: Does JuiceFS support Redis Sentinel or Cluster as the metadata engine?

Yes! You can check the documentation Redis Best Practices for more information.

Q10: How to evaluate the JuiceFS performance?

After mounting JuiceFS to a local directory, run the command `juicefs bench`, which performs read and write tests for large and small files in that directory. For example:

# /tmp/jfs is the local directory that JuiceFS is mounted at.
$ juicefs bench /tmp/jfs
Cleaning kernel cache, may ask for root privilege...
Password:
 Write big blocks count: 1024 / 1024 [==============================================================]  done
  Read big blocks count: 1024 / 1024 [==============================================================]  done
Write small blocks count: 100 / 100 [==============================================================]  done
Read small blocks count: 100 / 100 [==============================================================]  done
 Stat small files count: 100 / 100 [==============================================================]  done
Benchmark finished!
BlockSize: 1 MiB, BigFileSize: 1024 MiB, SmallFileSize: 128 KiB, SmallFileCount: 100, NumThreads: 1
+------------------+-----------------+--------------+
|       ITEM       |      VALUE      |     COST     |
+------------------+-----------------+--------------+
|   Write big file |   1236.96 MiB/s |  0.83 s/file |
|    Read big file |   2962.88 MiB/s |  0.35 s/file |
| Write small file |  2277.4 files/s | 0.44 ms/file |
|  Read small file |  2753.0 files/s | 0.36 ms/file |
|        Stat file | 16603.3 files/s | 0.06 ms/file |
+------------------+-----------------+--------------+

The `juicefs bench` command also serves as a quick sanity check after mounting to confirm that the JuiceFS service is running as expected. Please refer to the Performance Evaluation Guide if you have more questions about it.
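The VALUE and COST columns in the table describe the same measurement from two angles. The short Python check below uses only the figures printed in the sample output. It suggests that, with `NumThreads: 1`, COST is the file size divided by the throughput for big files and the reciprocal of the file rate for small files. This reading is inferred from the sample numbers rather than taken from the JuiceFS source code, and the helper names are made up for the example.

```python
BIG_FILE_MIB = 1024  # BigFileSize reported by the sample run


def seconds_per_big_file(throughput_mib_s: float) -> float:
    """Time to process one big file at the given throughput."""
    return round(BIG_FILE_MIB / throughput_mib_s, 2)


def ms_per_small_file(files_per_second: float) -> float:
    """Average time per file, in milliseconds, at the given file rate."""
    return round(1000 / files_per_second, 2)


print(seconds_per_big_file(1236.96))  # write big file
print(seconds_per_big_file(2962.88))  # read big file
print(ms_per_small_file(2277.4))      # write small file
print(ms_per_small_file(2753.0))      # read small file
print(ms_per_small_file(16603.3))     # stat file
```

The printed values match the COST column of the table (0.83, 0.35, 0.44, 0.36, and 0.06), so either column can be used when comparing runs made with the same settings.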

Q11: How to evaluate compatibility and performance of object storage?

Object storage is a key component in JuiceFS. The correctness and performance of the object storage will directly impact the performance of the JuiceFS service. Thus, the object storage should be the first thing to check to make sure JuiceFS works smoothly. For convenience, JuiceFS provides the built-in command `juicefs objbench`, which enables a quick evaluation of the object storage correctness and performance. Here is an example:

$ juicefs objbench --storage minio http://127.0.0.1:9000/testbucket --access-key admin --secret-key admin123
Start Functional Testing ...
+----------+---------------------+-------------+
| CATEGORY |         TEST        |    RESULT   |
+----------+---------------------+-------------+
|    basic |     create a bucket |        pass |
|    basic |       put an object |        pass |
|    basic |       get an object |        pass |
|    basic |       get non-exist |        pass |
|    basic |  get partial object |        pass |
|    basic |      head an object |        pass |
|    basic |    delete an object |        pass |
|    basic |    delete non-exist |        pass |
|    basic |        list objects |        pass |
|     sync |    put a big object |        pass |
|     sync | put an empty object |        pass |
|     sync |    multipart upload |        pass |
|     sync |  change owner/group | not support |
|     sync |   change permission | not support |
|     sync |        change mtime | not support |
+----------+---------------------+-------------+

Start Performance Testing ...
put small objects count: 100 / 100 [==============================================================]  done
get small objects count: 100 / 100 [==============================================================]  done
   upload objects count: 256 / 256 [==============================================================]  done
 download objects count: 256 / 256 [==============================================================]  done
     list objects count: 100 / 100 [==============================================================]  done
     head objects count: 100 / 100 [==============================================================]  done
   delete objects count: 100 / 100 [==============================================================]  done
Benchmark finished! block-size: 4096 KiB, big-object-size: 1024 MiB, small-object-size: 128 KiB, small-objects: 100, NumThreads: 4
+--------------------+--------------------+-----------------+
|        ITEM        |        VALUE       |       COST      |
+--------------------+--------------------+-----------------+
|     upload objects |        67.12 MiB/s | 59.59 ms/object |
|   download objects |       106.86 MiB/s | 37.43 ms/object |
|  put small objects |    508.2 objects/s |  1.97 ms/object |
|  get small objects |    728.0 objects/s |  1.37 ms/object |
|       list objects | 46890.01 objects/s |      2.13 ms/op |
|       head objects |   2861.2 objects/s |  0.35 ms/object |
|     delete objects |   2295.1 objects/s |  0.44 ms/object |
| change permissions |        not support |     not support |
| change owner/group |        not support |     not support |
|       update mtime |        not support |     not support |
+--------------------+--------------------+-----------------+

Q12: What to do if there is a `Resource busy – try ‘diskutil unmount’` error when unmounting?

The error indicates that a file or directory at the mount point is in use, so the mount point cannot be unmounted. If this happens, check whether a terminal has its current working directory inside the JuiceFS mount point, or whether any application is processing files there. If so, exit the terminal or application and then try to unmount the file system again with the command `juicefs umount`.

Q13: How to destroy a file system?

You can use the command `juicefs destroy` to destroy the file system. This command will delete all relevant data in the metadata engine and the object storage. Please refer to How to Destroy a File System for more information.

Q14: Where can I find the JuiceFS log?

Logs are written to a log file only when JuiceFS is mounted in the background. A foreground mount and other foreground commands print their logs directly to the terminal.

The log file on macOS is `/Users/$User/.juicefs/juicefs.log` by default.

The log file on Linux is `/var/log/juicefs.log` by default.

Q15: Why can't I find the original files stored in JuiceFS in the object storage?

When using JuiceFS, the file will be split into Chunks, Slices, and Blocks, and stored in the object storage. Hence, you may notice that the original files stored in JuiceFS cannot be found in the browser of the object storage; instead, there is only a directory named chunks and a bunch of directories and files indexed by numbers in the bucket. Don’t panic! That’s exactly what makes JuiceFS a high-performance file system. For details, please refer to How Does JuiceFS Store Files.
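To see why a bucket holds many numbered objects rather than the original files, it helps to count the pieces. The Python sketch below assumes 64 MiB per Chunk and 4 MiB per Block (the default Block size mentioned in Q16). It ignores Slices by assuming that each Chunk was written sequentially in one go. The function name `count_blocks` is made up for this example.

```python
from typing import List

MIB = 1024 * 1024
CHUNK_SIZE = 64 * MIB  # assumed Chunk size
BLOCK_SIZE = 4 * MIB   # default Block size


def count_blocks(file_size: int) -> List[int]:
    """Return the number of Blocks in each Chunk of a sequentially written file."""
    blocks_per_chunk = []
    offset = 0
    while offset < file_size:
        chunk_len = min(CHUNK_SIZE, file_size - offset)
        # Ceiling division: a partially filled last Block is still one object.
        blocks_per_chunk.append(-(-chunk_len // BLOCK_SIZE))
        offset += chunk_len
    return blocks_per_chunk


layout = count_blocks(160 * MIB)  # a 160 MiB file
print(layout, "->", sum(layout), "objects")
```

Under these assumptions a single 160 MiB file becomes 40 objects in the bucket, none of which carries the original file name.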

Q16: What is the basic principle of “random write” in JuiceFS?

JuiceFS does not store original files in object storage. Instead, it splits each file into data blocks (Blocks) of a certain size (4 MiB by default), uploads the Blocks to the object storage, and stores the Block IDs in the metadata engine. Logically, a random write overwrites the original content. In practice, JuiceFS marks the metadata of the overwritten Blocks as old data, uploads only the new Blocks generated by the random write, and records the metadata of the new Blocks in the metadata engine.

When reading the data that has been overwritten, it can be read from the new Blocks uploaded during random writes according to the latest metadata; meanwhile, the old Blocks may be automatically deleted by garbage collection tasks running in the background. In this way, the complexity of the random writes will be shifted to the complexity of reads.
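The idea can be illustrated with a toy model in Python. This is a deliberately simplified sketch, not the JuiceFS implementation: it works on whole writes instead of Chunks, Slices, and Blocks, and the class `ToyFile` and its methods are hypothetical names. It shows that a write never modifies an existing object, that a read resolves overlaps by applying the newest metadata last, and that fully hidden objects can be garbage collected.

```python
from typing import Dict, List, Tuple


class ToyFile:
    """Toy model: data pieces are immutable objects; metadata records their order."""

    def __init__(self) -> None:
        self.objects: Dict[int, bytes] = {}        # stand-in for object storage
        self.metadata: List[Tuple[int, int]] = []  # (offset, object id), oldest first
        self._next_id = 1

    def write(self, offset: int, data: bytes) -> None:
        """Upload the data as a new object and append a metadata record."""
        obj_id = self._next_id
        self._next_id += 1
        self.objects[obj_id] = data
        self.metadata.append((offset, obj_id))

    def read(self) -> bytes:
        """Rebuild the content by applying records from oldest to newest."""
        size = max((off + len(self.objects[i]) for off, i in self.metadata), default=0)
        buf = bytearray(size)
        for off, obj_id in self.metadata:
            data = self.objects[obj_id]
            buf[off:off + len(data)] = data
        return bytes(buf)

    def collect_garbage(self) -> List[int]:
        """Delete objects that are completely hidden by newer writes."""
        owner = [0] * len(self.read())  # which object id supplies each byte
        for off, obj_id in self.metadata:
            n = len(self.objects[obj_id])
            owner[off:off + n] = [obj_id] * n
        live = set(owner)
        dead = [i for _, i in self.metadata if i not in live]
        self.metadata = [(off, i) for off, i in self.metadata if i in live]
        for i in dead:
            del self.objects[i]
        return dead


f = ToyFile()
f.write(0, b"aaaaaaaa")
f.write(2, b"BB")        # random write: a new object, nothing is modified in place
print(f.read())
f.write(0, b"cccccccc")  # hides both earlier objects
print(f.collect_garbage(), f.read())
```

In this model each write is cheap because it only appends, while `read` has to merge every record, which mirrors the shift of complexity from writes to reads described above.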

The above is just a rough introduction to how JuiceFS implements random write. The actual read and write processes in JuiceFS are considerably more complex, and you can learn more from JuiceFS Internals and An introduction to the workflow of processing read and write.

Q17: Why does the object storage usage not change or change very little even if I delete the files at the mount point?

The first reason could be that the trash feature in JuiceFS is enabled. To guarantee data security, the trash is enabled by default in JuiceFS. Deleted files are moved to the trash rather than actually deleted, so the object storage usage does not change. The trash retention time can be specified by `juicefs format` and modified by `juicefs config`. Please refer to Trash for more information.

The second reason could be that JuiceFS deletes data in the object storage asynchronously, which may cause the object storage usage to change slowly. If you need to clean up the deleted data in the object storage immediately, run the command `juicefs gc`.

Q18: Why is the size displayed at the mount point different from the object storage usage?

As explained in Q16 (the basic principle of “random write” in JuiceFS), the object storage usage is equal to or larger than the actual file size in most cases, especially after random writes generate many file fragments within a short period of time. These fragments keep occupying space in the object storage until compaction and reclamation are triggered. There is no need to worry about the fragments taking up space indefinitely, since each read and write process checks the number of fragments and triggers compaction when necessary. Alternatively, you can trigger compaction and reclamation manually with the command `juicefs gc --compact --delete`.

In addition, if the compression feature is enabled in JuiceFS (disabled by default), the size of the objects stored in the object storage may be smaller than the actual file size (depending on the compression ratio of different file types).

If the above factors can be ruled out, please check the storage class of the object storage being used. Cloud service providers may set a minimum billable size for some storage classes. For example, S3 Standard-IA and S3 One Zone-IA storage have a minimum billable object size of 128 KB, so an object smaller than 128 KB is billed as 128 KB.
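The effect of a minimum billable size is easy to quantify. The Python sketch below is a worked example under stated assumptions: the 128 KB minimum quoted above is treated as 128 * 1024 bytes, the workload of 1000 objects of 4 KB each is invented for illustration, and the function name is hypothetical.

```python
from typing import Iterable

MIN_BILLABLE = 128 * 1024  # minimum billable object size, in bytes


def billable_bytes(object_sizes: Iterable[int], minimum: int = MIN_BILLABLE) -> int:
    """Sum object sizes, rounding each object up to the minimum billable size."""
    return sum(max(size, minimum) for size in object_sizes)


sizes = [4 * 1024] * 1000  # 1000 objects of 4 KB each
stored = sum(sizes)
billed = billable_bytes(sizes)
print(stored // 1024, "KB stored,", billed // 1024, "KB billed")
```

In this example 4000 KB of stored data is billed as 128000 KB, so a workload made of many small objects can show a large gap between the size at the mount point and the usage reported by the provider.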

Q19: Does JuiceFS Gateway support advanced features such as multi-user management?

The built-in `gateway` subcommand of JuiceFS provides only basic S3 gateway functions and does not support features such as multi-user management. If you need these advanced features, please refer to this repo, which uses JuiceFS as a MinIO gateway backend and supports the complete set of MinIO gateway features.

Q20: What is the difference between JuiceFS and XXX?

You can find the answer in the documentation Comparing with Others, which describes in detail the differences between JuiceFS and Alluxio, CephFS, S3FS, and S3QL.

Q21: Does JuiceFS support using a directory in object storage as the value of the `--bucket` option?

The feature is not supported as of the release of JuiceFS v1.0.0.

Q22: Does JuiceFS support accessing the data that already exists in the object storage?

The feature is not supported as of the release of JuiceFS v1.0.0.

Q23: Does JuiceFS support distributed caching currently?

The feature is not supported as of the release of JuiceFS v1.0.0.

Q24: Is there an SDK currently available for JuiceFS?

As of the release of JuiceFS v1.0.0, two SDKs are available. One is the Java SDK, which is highly compatible with the HDFS interface and is officially maintained by JuiceFS. The other is the Python SDK, which is maintained by community users.
