Configuration

The Hadoop SDK is one of many ways to use JuiceFS, so most configuration items have the same meaning as their JuiceFS Client counterparts. See Command Reference to learn more.

Core configurations

Item	Default Value	Description
`fs.jfs.impl`	`com.juicefs.JuiceFileSystem`	Specify the implementation of the storage type `jfs://`.
`fs.AbstractFileSystem.jfs.impl`	`com.juicefs.JuiceFS`	Specify the implementation of `AbstractFileSystem` for MapReduce.
`juicefs.token`		Credential for the JuiceFS volume; obtain it from the settings page of the JuiceFS web console.
`juicefs.bucket`		Optionally provide the name or endpoint of the bucket, to override the value configured in the JuiceFS web console.
`juicefs.accesskey`		Access Key for object store (omit if client node can access object storage without credentials).
`juicefs.secretkey`		Secret Key for object store (omit if client node can access object storage without credentials).
`juicefs.console-url`		JuiceFS Web Console address, only needed in on-premises environments.
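
As a minimal sketch, the core items might be set in `core-site.xml` (inside the `<configuration>` element) as shown below. The two `impl` values are the defaults from the table above; `YOUR_VOLUME_TOKEN` is a placeholder, not a real credential, and must be replaced with the token from the web console.

```xml
<!-- core-site.xml fragment: illustrative sketch only -->
<property>
  <name>fs.jfs.impl</name>
  <value>com.juicefs.JuiceFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.jfs.impl</name>
  <value>com.juicefs.JuiceFS</value>
</property>
<!-- Placeholder value: replace with the token of your own volume -->
<property>
  <name>juicefs.token</name>
  <value>YOUR_VOLUME_TOKEN</value>
</property>
```

`juicefs.accesskey` and `juicefs.secretkey` are left out here, which matches the case where the client node can access object storage without credentials.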

Data replication configurations

Read Data replication to learn more.

Item	Default Value	Description
`juicefs.bucket2`		Optionally provide the name or endpoint of the secondary bucket used for data replication, to override the value configured in the JuiceFS web console.
`juicefs.accesskey2`		Access Key for the secondary object store (omit if client node can access object storage without credentials).
`juicefs.secretkey2`		Secret Key for the secondary object store (omit if client node can access object storage without credentials).

Cache configurations

Read Cache to learn more.

Item	Default Value	Description
`juicefs.cache-dir`	`memory`	Local cache directory, defaults to process memory. Multiple directories can be specified, separated by `:`, or matched with the wildcard `*`. When using local directories, create them in advance and give them `0777` permission so that components can share cache data. Same meaning as `--cache-dir`.
`juicefs.cache-size`	100	Cache capacity in MiB. The default is small because the Hadoop SDK uses memory as the default cache location. Same meaning as `--cache-size`.
`juicefs.cache-replica`	1	Number of nodes that a Block can be scheduled on. Hadoop applications support data locality scheduling by checking data blocks' `BlockLocation` attribute, so setting a higher replica will allow blocks to be put on more nodes, hence increasing compute task concurrency. Block size is controlled by `juicefs.block.size` configuration.
`juicefs.cache-group`		Cache group name for distributed cache. Nodes within the same group share cache, disabled by default. Recommended for applications like Spark where perfect data locality isn't available.
`juicefs.no-sharing`	false	When inside a cache group, only fetch cache data from others, but never share its own cache. Use this option on ephemeral mount points (like Kubernetes Pod).
`juicefs.cache-full-block`	true	Cache full-sized data blocks. Disable this when you need to frequently access the same set of small files, or when disk throughput is lower than object storage throughput. This option is the opposite of `--cache-partial-only`.
`juicefs.memory-size`	300	Maximum memory for the read/write buffer in MiB, same meaning as `--buffer-size`.
`juicefs.auto-create-cache-dir`	true	Whether to create cache directories automatically. When set to false, non-existent cache directories will be ignored, effectively disabling cache.
`juicefs.free-space`	0.2	Minimum free space ratio. When free disk space falls below this ratio, cache is cleared to free disk space. The default is 20%. Same meaning as `--free-space-ratio`.
`juicefs.metacache`	true	Enable metadata cache.
`juicefs.discover-nodes-url`		Specify the node discovery API; the node list is refreshed every 10 minutes. The node list also serves as a whitelist for the cache group: only nodes in this list can join the cache group. Use this to prevent clients outside the computing cluster from joining the cache group and hindering its performance (read cache group troubleshooting for more). Supported values: All nodes: `all` (this mode disables auto discovery, so data locality scheduling isn't available because there's no way to generate `BlockLocation`); YARN: `yarn`; Spark Standalone: `http://spark-master:web-ui-port/json/`; Spark ThriftServer: `http://thrift-server:4040/api/v1/applications/`; Presto: `http://coordinator:discovery-uri-port/v1/service/presto/`; File system: `jfs://{VOLUME}/etc/nodes` (you need to create this file manually and write the node hostnames into it, one per line). For Kerberos clusters, only the "All nodes" and "File system" configurations are supported.
`juicefs.hflush-delay`	0	Delay hflush operations (in ms) so that data writes are consolidated. This results in fewer object storage PUT requests and higher overall throughput. Typically used to improve HBase WAL performance.
`juicefs.write-group-cache`	false	Build distributed cache for newly written blocks. Same meaning as `--fill-group-cache`.
`juicefs.cache-priority`	0	The priority of the cache block (added in v5.0.14). The available values are 0, 1, 2, and 3. The larger the number, the higher the priority. When cache is evicted, data with lower priority is evicted first.
`juicefs.entry-cache`	0.0	File entry cache timeout in seconds.
`juicefs.dir-entry-cache`	0.0	Directory entry cache timeout in seconds.
`juicefs.attr-cache`	0.0	File attribute cache timeout in seconds.
`juicefs.block.size`	`dfs.blocksize` or 128MB	Logical block size for Hadoop SDK, controls task data sizes for applications like Spark.
`juicefs.cache-group-size`	4 * `juicefs.block.size`	JuiceFS Client performs readahead and prefetch, so for files smaller than this size, the client tries to schedule all of their data blocks onto a single node to maximize cache utilization.
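
To illustrate how the cache items combine, here is a hedged sketch of a disk-backed, distributed cache setup on a YARN cluster. The directory paths, the cache size, and the group name `example-group` are hypothetical values chosen for illustration; only the property names and the `yarn` discovery mode come from the table above.

```xml
<!-- core-site.xml fragment: illustrative sketch, values are assumptions -->

<!-- Two hypothetical local directories, separated by ":" -->
<property>
  <name>juicefs.cache-dir</name>
  <value>/data1/jfscache:/data2/jfscache</value>
</property>
<!-- Cache capacity in MiB (102400 MiB = 100 GiB), an arbitrary example -->
<property>
  <name>juicefs.cache-size</name>
  <value>102400</value>
</property>
<!-- Hypothetical group name; nodes with the same name share cache -->
<property>
  <name>juicefs.cache-group</name>
  <value>example-group</value>
</property>
<!-- Discover cache group members through YARN -->
<property>
  <name>juicefs.discover-nodes-url</name>
  <value>yarn</value>
</property>
```

With local directories, remember that they should exist in advance with `0777` permission, unless `juicefs.auto-create-cache-dir` is left at its default of `true`.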

Object storage configurations

Item	Default Value	Description
`juicefs.bucket`		Specify the bucket name of object store.
`juicefs.prefetch`	1	Prefetch N blocks in parallel, same as `--prefetch`.
`juicefs.max-uploads`	50	Maximum number of concurrent object uploads.
`juicefs.upload-limit`	0	Upload speed limit for a single process, in bytes/s.
`juicefs.max-downloads`	50	Maximum number of concurrent object downloads.
`juicefs.download-limit`	0	Download speed limit for a single process, in bytes/s.
`juicefs.get-timeout`	5	Timeout in seconds for downloading an object.
`juicefs.put-timeout`	60	Timeout in seconds for uploading an object.
`juicefs.max-readahead`	0	Maximum memory size in MiB for readahead (read the relevant sections in Cache to learn about readahead). The default `0` means the actual maximum readahead is 20% of `juicefs.memory-size`. Set this to a small integer (like `1`) to reduce read amplification.
`juicefs.external`	false	Use the external domain to access the object store.
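
As an example of tuning these items, the sketch below lowers upload concurrency, caps upload bandwidth, and reduces readahead. The specific numbers are arbitrary assumptions for illustration, not recommendations.

```xml
<!-- core-site.xml fragment: illustrative sketch, values are assumptions -->

<!-- Fewer concurrent uploads than the default of 50 -->
<property>
  <name>juicefs.max-uploads</name>
  <value>20</value>
</property>
<!-- Per-process upload limit in bytes/s: 10485760 = 10 * 1024 * 1024 -->
<property>
  <name>juicefs.upload-limit</name>
  <value>10485760</value>
</property>
<!-- Cap readahead at 1 MiB to reduce read amplification -->
<property>
  <name>juicefs.max-readahead</name>
  <value>1</value>
</property>
```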

Security configurations

Item	Default Value	Description
`juicefs.server-principal`		After enabling Kerberos, you need to specify the principal of the JuiceFS metadata service. Refer to "Using Kerberos".

Other configurations

Item	Default Value	Description
`juicefs.access-log`		The file path of the file system access log (e.g. `/tmp/juicefs.access.log`). Read and write permission is required for all Hadoop components that use JuiceFS. The log file rotates at 300 MiB, and the last 7 files are retained.
`juicefs.debug`	false	Enable DEBUG level log.
`juicefs.superuser`	`hdfs`	Specify the superuser name, to tell the JuiceFS Hadoop SDK which user is the superuser.
`juicefs.supergroup`	`hdfs`	Specify the supergroup name. All users within this group are considered superusers.
`juicefs.rsaPrivKeyPath`		The file path of RSA Private Key for data encryption.
`juicefs.rsaPassphrase`		The passphrase of RSA Key for data encryption.
`juicefs.file.checksum`	false	Enable checksums when copying data via Hadoop DistCp.
`juicefs.grouping`		Specify the location of the group file to configure user groups and user mapping information, e.g. `jfs://myjfs/etc/group`. The file format is: `<groupname>:<username1>,<username2>`
`juicefs.conf-dir`		Specify the directory for file system config files. You can find it on the mount machine under `/root/.juicefs`. Files are named `{VOLUME}.conf`.
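
A short sketch of how a few of these items might look together. The access log path and the group file location reuse the examples given in the table; the group and user names in the comment are hypothetical.

```xml
<!-- core-site.xml fragment: illustrative sketch -->

<!-- Must be readable and writable by every Hadoop component using JuiceFS -->
<property>
  <name>juicefs.access-log</name>
  <value>/tmp/juicefs.access.log</value>
</property>
<!--
  Group file location. Each line of the file follows the format
  <groupname>:<username1>,<username2>, for example (hypothetical names):
  analysts:alice,bob
-->
<property>
  <name>juicefs.grouping</name>
  <value>jfs://myjfs/etc/group</value>
</property>
```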

Configure multiple JuiceFS file systems

When using multiple JuiceFS volumes, any of the items above can be set for a single file system by placing the file system name `VOL_NAME` in the middle of the configuration item, for example:

core-site.xml
<property>
  <name>juicefs.{VOL_NAME}.debug</name>
  <value>false</value>
</property>
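
Extending the same pattern, a hedged sketch for two volumes could look like this. The volume names `vol1` and `vol2`, the token placeholders, and the cache directory are all hypothetical.

```xml
<!-- core-site.xml fragment: illustrative sketch with hypothetical volumes -->

<!-- Each volume gets its own credential (placeholder values) -->
<property>
  <name>juicefs.vol1.token</name>
  <value>TOKEN_FOR_VOL1</value>
</property>
<property>
  <name>juicefs.vol2.token</name>
  <value>TOKEN_FOR_VOL2</value>
</property>
<!-- Only vol1 uses a disk cache directory; vol2 keeps the default -->
<property>
  <name>juicefs.vol1.cache-dir</name>
  <value>/data1/jfscache</value>
</property>
```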
