File Systems & Storage
Unified Distributed Storage
★ 4.4
Cloud-Native File System
★ 4.3
pip install cephN/A — CLI binary, see juicefs.compip install cephN/A — CLI binary, see juicefs.comPython data engineers in on-premise or private cloud environments use Ceph's S3-compatible RADOS Gateway as a drop-in replacement for AWS S3 — boto3 and awswrangler work unchanged by pointing them at the Ceph endpoint URL. CephFS is mounted as a shared file system that multiple Python pipeline worker nodes read from and write to simultaneously.
Python data engineers use JuiceFS to mount cloud object storage as a local POSIX file system — enabling Python pipeline code that reads and writes local files to work seamlessly with S3 or GCS as the backing store without using boto3 or cloud-specific SDKs. PySpark jobs on JuiceFS benefit from its Hadoop-compatible interface and local cache for repeated dataset reads.
Individual Tool Pages