File System Tools & Datasets for Python Data Engineering

Discover 8 tools tagged with File System for Python data engineering.

Distributed and cloud file systems provide the storage layer for large-scale data engineering pipelines. Tools tagged file-system include HDFS, Amazon S3, Google Cloud Storage, and Azure Data Lake Storage, accessed from Python using fsspec, s3fs, and cloud SDK clients. These systems store raw data, processed datasets, and pipeline checkpoints.
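As a rough illustration of how Python code usually addresses these systems, the sketch below splits a storage URL into its scheme, container, and path using only the standard library. The scheme prefixes (`hdfs`, `s3`, `gs`, `abfs`) are the ones fsspec-style URLs commonly use. The lookup table, the `describe_location` helper, and the example bucket and host names are hypothetical, introduced only for this example. The code does not connect to any storage service.

```python
from urllib.parse import urlsplit

# Hypothetical lookup table for illustration only:
# URL scheme -> storage system named in the paragraph above.
STORAGE_BY_SCHEME = {
    "hdfs": "HDFS",
    "s3": "Amazon S3",
    "gs": "Google Cloud Storage",
    "abfs": "Azure Data Lake Storage",
}


def describe_location(url):
    """Split a storage URL into system name, container (bucket or host), and path."""
    parts = urlsplit(url)
    system = STORAGE_BY_SCHEME.get(parts.scheme)
    if system is None:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    return {
        "system": system,
        "container": parts.netloc,        # bucket name, or host:port for HDFS
        "path": parts.path.lstrip("/"),   # object key or file path
    }


if __name__ == "__main__":
    # Example URLs are made up; no network access happens here.
    for url in ("s3://raw-bucket/events/part-0.parquet",
                "hdfs://namenode:8020/checkpoints/run-1"):
        print(describe_location(url))
```

In a real pipeline a library such as fsspec would typically take the same URL and return a file-like object, so code that reads raw data or writes checkpoints can stay the same when the storage backend changes.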

Tools (8)

Featured

HDFS

Hadoop Distributed File System

A distributed file system designed to run on commodity hardware as part of the Apache Hadoop ecosystem. HDFS provides high-throughput access to application data and is the foundation for storing massive datasets in Hadoop-based data platforms.

Free

◆4.4

Alluxio

Memory-Centric Storage System

A memory-centric distributed storage system that acts as a caching layer between compute frameworks and storage systems. Alluxio accelerates data access by serving hot data from memory, bridging the gap between compute and storage.

Freemium

◆4.2

Ceph

Unified Distributed Storage

A unified, distributed storage system providing object, block, and file storage in a single platform. Ceph is designed for performance, reliability, and scalability, and is widely used in cloud infrastructure and data center environments.

Free

◆4.4

JuiceFS

Cloud-Native File System

A high-performance, cloud-native file system backed by object storage. JuiceFS provides a POSIX-compatible interface on top of object stores such as S3, so cloud storage can be mounted as a local file system for data processing workloads.

Freemium

◆4.3

GlusterFS

Scalable Network File System

A scalable, distributed network file system suitable for data-intensive tasks such as cloud storage and media streaming. GlusterFS aggregates disk storage from multiple servers into a single global namespace for large-scale data access.

Free

◆4.0

SeaweedFS

Simple Distributed File System

A simple and highly scalable distributed file system designed for fast, efficient storage and retrieval of billions of files. SeaweedFS supports S3 API compatibility, erasure coding, and FUSE mounting for flexible data access.

Free

◆4.2

S3QL

Cloud-Backed File System

A file system that stores all its data online using storage services such as Google Cloud Storage, Amazon S3, or OpenStack Swift. S3QL provides a standard POSIX file system interface with features like deduplication, compression, and encryption.

Free

◆3.8

LizardFS

Fault-Tolerant Distributed File System

A software-defined storage solution that is distributed, parallel, scalable, fault-tolerant, and geo-redundant. LizardFS provides a highly available file system with automatic data replication and self-healing capabilities.

Free

◆3.7