File Systems & Storage
Alluxio: Memory-Centric Storage System (★ 4.2)
LizardFS: Fault-Tolerant Distributed File System (★ 3.7)
Install (Alluxio): pip install alluxio
Install (LizardFS): N/A (system package, install via package manager)
Python data engineers use Alluxio to accelerate PySpark pipelines that repeatedly read the same S3 or HDFS data. Once the S3 data is mounted into Alluxio, subsequent Spark reads are served from Alluxio's in-memory cache instead of object storage. For iterative ML training or repeated dashboard queries, this can cut read latency from seconds to milliseconds.
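As a rough illustration of the mounting idea, the sketch below rewrites S3 URIs to their `alluxio://` equivalents so that a pipeline reads through the cache. The master address, bucket name, mount path, and the `to_alluxio_uri` helper are all hypothetical; only the `alluxio://host:port/path` URI form and the default master port 19998 come from Alluxio itself.

```python
# Hypothetical values for illustration; adjust to your own cluster.
ALLUXIO_MASTER = "alluxio-master:19998"  # 19998 is Alluxio's default master RPC port
MOUNTS = {"s3://example-bucket/events": "/mnt/events"}  # S3 prefix -> Alluxio mount path


def to_alluxio_uri(s3_uri, mounts=MOUNTS, master=ALLUXIO_MASTER):
    """Return the alluxio:// URI for an S3 URI under a known mount.

    URIs that fall outside every mounted prefix are returned unchanged,
    so the caller falls back to reading object storage directly.
    """
    for prefix, mount_path in mounts.items():
        base = prefix.rstrip("/")
        # Match the prefix itself or anything below it, but not
        # sibling prefixes such as "s3://example-bucket/events-old".
        if s3_uri == base or s3_uri.startswith(base + "/"):
            suffix = s3_uri[len(base):]
            return f"alluxio://{master}{mount_path.rstrip('/')}{suffix}"
    return s3_uri
```

In a PySpark job, the result could be passed to a reader such as `spark.read.parquet(to_alluxio_uri(path))`, assuming the Alluxio client jar is on the Spark classpath.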
Python data engineers in on-premise environments use LizardFS as a shared POSIX file system mounted across pipeline worker nodes. Python scripts write output files to the LizardFS mount, and those files are immediately visible to all other nodes in the cluster. This enables simple file-based handoff patterns in which workers write outputs that other workers consume, with no message queue needed for coordination.
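One way to sketch this handoff pattern is a write-then-rename convention, so that consumers on other nodes never pick up a half-written file. Nothing here is LizardFS-specific: the code uses only standard POSIX file operations and assumes that a rename within one directory is atomic on the mount. The `publish` and `ready_files` helpers and the `.tmp-` prefix are hypothetical names chosen for illustration.

```python
import os
import tempfile
from pathlib import Path

TMP_PREFIX = ".tmp-"  # hypothetical marker for files still being written


def publish(out_dir, name, data: bytes) -> Path:
    """Write data to a temp file in out_dir, then rename it into place."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the same directory so the rename stays
    # within one file system.
    fd, tmp = tempfile.mkstemp(dir=out_dir, prefix=TMP_PREFIX)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to storage before publishing
        final = out_dir / name
        os.replace(tmp, final)  # consumers see either no file or the full file
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return final


def ready_files(out_dir):
    """List completed outputs, skipping files that are still being written."""
    return sorted(
        p for p in Path(out_dir).iterdir()
        if p.is_file() and not p.name.startswith(TMP_PREFIX)
    )
```

A producer would call `publish("/mnt/lizardfs/stage1", "part-0.csv", payload)` while consumers on other nodes poll `ready_files` on the same directory; the mount path shown is an assumption.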
Individual Tool Pages