File Systems & Storage
Scalable Network File System
★ 4.0
Hadoop Distributed File System
★ 4.4
N/A — system package, install via package managerpip install hdfsN/A — system package, install via package managerpip install hdfsPython data engineers in HPC and on-premise environments use GlusterFS as a shared storage layer accessible by multiple pipeline worker nodes simultaneously. Python jobs write output files to a GlusterFS mount point, and other nodes in the cluster can immediately read those files without data movement — simplifying distributed batch processing without object storage dependencies.
Python data engineers interact with HDFS using `pyarrow.fs.HadoopFileSystem` or the `hdfs` Python client. PySpark accesses HDFS transparently via `spark.read.parquet('hdfs:///path/')` — the cluster configuration points Spark to the NameNode. Python scripts that manage file operations (listing, deleting, moving files) use the `subprocess` module to call `hdfs dfs` commands or the WebHDFS REST API.
Individual Tool Pages