File Systems & Storage
Hadoop Distributed File System
★ 4.4
Cloud-Backed File System
★ 3.8
pip install hdfs
pip install s3ql
Python data engineers interact with HDFS using `pyarrow.fs.HadoopFileSystem` or the `hdfs` Python client. PySpark reads HDFS paths directly, as in `spark.read.parquet('hdfs:///path/')`; because the URI names no host, Spark resolves the NameNode from the cluster's Hadoop configuration (`fs.defaultFS`). Python scripts that manage files (listing, deleting, moving) either shell out to `hdfs dfs` commands with the `subprocess` module or call the WebHDFS REST API over HTTP.
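The two file-management routes mentioned above can be sketched as follows. This is a minimal illustration, not a complete client: the helper names (`hdfs_dfs_command`, `run_hdfs_dfs`, `webhdfs_url`) are hypothetical, the host `namenode` and port `9870` are assumptions about the cluster, and `run_hdfs_dfs` only works on a machine where the `hdfs` CLI is installed and configured.

```python
import subprocess
from urllib.parse import urlencode


def hdfs_dfs_command(*args):
    """Build the argument list for an `hdfs dfs` invocation (hypothetical helper)."""
    return ["hdfs", "dfs", *args]


def run_hdfs_dfs(*args):
    """Run `hdfs dfs <args>` and return its stdout.

    Assumes the `hdfs` CLI is on PATH and already points at the cluster.
    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(
        hdfs_dfs_command(*args),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS REST URL of the form /webhdfs/v1/<path>?op=<OP>.

    `host` and `port` are assumptions about where the NameNode's HTTP
    endpoint listens; `path` must be an absolute HDFS path.
    """
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"


if __name__ == "__main__":
    # Show what would be executed or requested, without touching a cluster.
    print(hdfs_dfs_command("-ls", "/data"))
    print(webhdfs_url("namenode", 9870, "/data", "LISTSTATUS"))
```

On a configured cluster node, `run_hdfs_dfs("-ls", "/data")` would return the directory listing as text, while the URL built by `webhdfs_url` can be fetched with any HTTP client. The REST route avoids starting a JVM for every call, which tends to matter for scripts that issue many small operations.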
Python data engineers use S3QL to mount cloud object storage as an encrypted local file system. Pipeline output is written to the mounted S3QL volume with standard Python file I/O (`open()`, `write()`), with no cloud SDK code. S3QL encrypts data on the client before upload, so the storage provider holds only ciphertext; this makes it useful for sensitive pipeline outputs that need a stronger encryption posture than default S3 server-side encryption (SSE), where encryption happens after the data reaches S3.
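Writing to a mounted volume with plain file I/O might look like the sketch below. The function name `write_pipeline_output` and the `require_mount` flag are hypothetical, and the example assumes the S3QL file system has already been mounted (for instance with `mount.s3ql`) at whatever directory is passed in. Nothing in the code is specific to S3QL, which is the point: it is ordinary local file handling.

```python
import csv
import os
from pathlib import Path


def write_pipeline_output(mount_point, relative_name, rows, require_mount=False):
    """Write rows as CSV under a mounted volume using only standard file I/O.

    mount_point   -- directory where the volume is assumed to be mounted
    relative_name -- output path relative to the mount point
    rows          -- iterable of row sequences
    require_mount -- if True, refuse to write unless mount_point is a real
                     mount, guarding against silently writing to local disk
    """
    mount_point = Path(mount_point)
    if require_mount and not os.path.ismount(mount_point):
        raise RuntimeError(f"{mount_point} is not a mounted file system")

    target = mount_point / relative_name
    target.parent.mkdir(parents=True, exist_ok=True)

    with open(target, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
        handle.flush()
        os.fsync(handle.fileno())  # ask the OS to push the data to the file system
    return target


if __name__ == "__main__":
    import tempfile

    # A temporary directory stands in for the mount point in this demo.
    with tempfile.TemporaryDirectory() as fake_mount:
        path = write_pipeline_output(fake_mount, "daily/output.csv", [["id", "value"], [1, 2]])
        print(path.read_text(encoding="utf-8"))
```

In a real pipeline, passing `require_mount=True` is a cheap safeguard: if the volume failed to mount, the write raises an error instead of leaving unencrypted output on the local disk.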
Individual Tool Pages