What is JuiceFS?

JuiceFS is a POSIX-compatible shared filesystem specifically designed to work in the cloud.

  • It is designed to run in the cloud, so you can take advantage of low-cost object storage services to store your data economically.
  • It is a POSIX-compatible filesystem, so you can access your data as seamlessly as you access local files.
  • It is a shared filesystem so you can share your files across multiple machines.

Core Features

  • POSIX-compatibility: JuiceFS works like a local filesystem. Existing applications can work with it without any changes.
  • Strong Consistency: All confirmed changes made to your data are visible on every machine immediately.
  • Outstanding Performance: The latency can be as low as a few microseconds, and the throughput can be scaled almost without limit by increasing the number of clients.
  • High Availability: The Raft implementation ensures the availability of the system.
  • Scalable Architecture: JuiceFS is designed to serve petabyte-scale data and billions of files while keeping the maintenance cost low.
  • Replication Across Clouds and Regions: Your data can be replicated automatically to different regions, or even to different clouds, similar to RAID-1.
  • Data Security: All your data is stored in your own account, not ours. There is no way for us to access your data. Also, all connections are fully encrypted.
  • Economy: Your data is stored in the object storage service you choose, in a compressed format that is designed for such services. The cost of JuiceFS can be as low as one tenth of the cost of NFS.

JuiceFS is designed to solve the problem of big data storage. It is suitable for data storage, computation, data sharing and backup in areas like big data analytics, machine learning, data mining, and bioinformatics. It’s also a scalable replacement for NAS.

POSIX compatibility provides a seamless migration experience. You can easily replace your existing solutions (such as local disk, NFS, or HDFS) with JuiceFS at zero cost, and you no longer need to maintain the system yourself. Your team can focus exclusively on product development, free of the hassle of managing a large-scale storage system.
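As a minimal sketch of why migration needs no code changes, the Python snippet below uses only ordinary file operations from the standard library. The mount point /jfs and the environment variable JFS_ROOT are hypothetical names chosen for illustration. When no such directory exists, the sketch falls back to a temporary local directory, so the same code runs against either kind of storage.

```python
import os
import tempfile
from pathlib import Path


def data_root() -> Path:
    """Return the directory the application stores its data in.

    JFS_ROOT and the default /jfs are hypothetical; substitute the path
    where your filesystem is actually mounted.
    """
    candidate = Path(os.environ.get("JFS_ROOT", "/jfs"))
    if candidate.is_dir() and os.access(candidate, os.W_OK):
        return candidate
    # Fallback so the sketch runs anywhere: a local temporary directory.
    return Path(tempfile.mkdtemp(prefix="jfs-demo-"))


def save_and_load(root: Path, name: str, text: str) -> str:
    """Write a text file under root and read it back with plain file APIs."""
    target = root / "reports" / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target.read_text(encoding="utf-8")


if __name__ == "__main__":
    print(save_and_load(data_root(), "hello.txt", "stored with plain file calls"))
```

The only thing that changes between a local disk and a mounted shared filesystem in this sketch is the value returned by data_root; the read and write logic stays the same.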

Architecture

There are two major parts in JuiceFS:

  • JuiceFS Metadata Service manages the metadata of your filesystem. We implemented Raft to provide high availability and strong consistency. It is also carefully designed and optimized to provide excellent performance and stability.
  • JuiceFS Client is the program that runs on your server. It coordinates between the metadata service and the object storage service, and it provides the filesystem interface on your system.

Additionally, we provide a script for some common operations you might need when managing files in JuiceFS.

_images/architecture-en.png

Note: Metadata includes the file name, file size, permission group, creation timestamp, update timestamp, and directory hierarchy.
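To make the note concrete, here is a rough sketch of how such metadata fields appear to an application through the standard stat interface. It uses only Python's standard library against a temporary local file. The helper name describe is made up for this example, and the sketch says nothing about how the metadata service stores these values internally.

```python
import os
import stat
import tempfile


def describe(path: str) -> dict:
    """Collect the metadata fields an application sees via stat()."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "parent": os.path.dirname(path),      # position in the directory hierarchy
        "size_bytes": info.st_size,
        "mode": stat.filemode(info.st_mode),  # permission string such as "-rw-------"
        "uid": info.st_uid,
        "gid": info.st_gid,
        "modified": info.st_mtime,            # last content update
        "changed": info.st_ctime,             # last inode change on Unix, creation time on Windows
    }


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as handle:
        handle.write("hello")
    for field, value in describe(path).items():
        print(f"{field}: {value}")
    os.remove(path)
```

None of these fields contain file contents, which is the distinction the note draws between metadata and the data kept in object storage.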

The diagram below illustrates how JuiceFS works when it is mounted to your machines.

_images/vm-architecture-en.png

When your application issues an I/O system call (open, read, write, etc.) to access your data, the call is forwarded by the kernel's VFS and FUSE module to the JuiceFS client, and the client fulfills the request by communicating with the metadata service and the object storage service.
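The calls named above can be illustrated with Python's thin wrappers around the corresponding system calls. This is only a sketch: it runs against a temporary local directory, and the function name roundtrip is hypothetical. On a path where JuiceFS is mounted, these would be the calls that the kernel hands to the client; nothing in the code is specific to JuiceFS.

```python
import os
import tempfile


def roundtrip(directory: str, payload: bytes) -> bytes:
    """Issue open, write, fsync, lseek, read, and close on one file."""
    path = os.path.join(directory, "sample.bin")
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)  # open(2)
    try:
        written = 0
        while written < len(payload):        # write(2) may write only part
            written += os.write(fd, payload[written:])
        os.fsync(fd)                         # ask for the data to be persisted
        os.lseek(fd, 0, os.SEEK_SET)         # rewind before reading
        chunks = []
        while True:
            chunk = os.read(fd, 4096)        # read(2); empty bytes means EOF
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)                         # close(2)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        print(roundtrip(workdir, b"hello from plain system calls"))
```

Because the application only ever talks to the kernel through these calls, it does not need to know whether a local disk or a FUSE-based client serves them.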

Use Cases

JuiceFS is designed for massive data storage. It can replace many distributed and network filesystems, especially in the following scenarios:

  • Big Data Analytics: JuiceFS has the same interface as local files, so your application is not bound to external APIs. It also works seamlessly with popular distributed computation frameworks such as Apache Spark, Hadoop, and Hive. Its storage space expands without limit, and you do not need to maintain the service yourself. High concurrency and high throughput let JuiceFS meet the performance needs of data analytics.
  • Shared Workspace: JuiceFS does not have any VPC limitation, so you can mount it on any machine. There is no limitation on concurrent reads and writes. The POSIX API is compatible with all your existing data streams and scripts.
  • Shared Volume Storage Between Container Clusters: JuiceFS satisfies the need for persistent storage of container volumes, and it is independent of the life cycle of containers. Strong consistency assures the correctness of your data. JuiceFS also makes it easier for you to create stateless services.
  • Backup: POSIX is the friendliest and most familiar interface for engineers, so managing backups is as easy as managing files on local disks. The storage space can be expanded seamlessly to any size you need. Replication across regions and clouds helps you build your global infrastructure. Its flexible requirements allow JuiceFS to blend into your existing architecture without compromises. Snapshots can be used to recover and validate your data.

For more details, please visit Use Cases.

Privacy

Your files stored in JuiceFS are serialized in a specially designed format and kept in your object storage account. You have full control over the visibility of your data. The only information that goes through our servers is the metadata of your files, which includes only the file names, the directory hierarchy, and other basic information.