Discover 7 tools tagged with Data Lake for Python data engineering.
Data lake tools manage the storage, cataloguing, versioning, and governance of large repositories of raw data in cloud object stores such as Amazon S3 and Google Cloud Storage (GCS). Python data engineers use open table formats such as Apache Iceberg, Delta Lake, and Apache Hudi to add ACID transactions, schema evolution, and time-travel queries to data lake tables.
Incremental Data Processing Framework
An open-source framework that manages storage for real-time data processing on top of data lakes. Apache Hudi provides record-level insert, update, and delete capabilities along with change streams, enabling incremental data pipelines on large-scale datasets.
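To make the idea of record-level upserts and change streams concrete, here is a minimal in-memory sketch. It is a toy illustration only and does not use Hudi's API: the `ToyTable` class, its methods, and the sample records are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """In-memory stand-in for a keyed table (hypothetical; not Hudi's API)."""
    rows: dict = field(default_factory=dict)     # record key -> latest row
    changes: list = field(default_factory=list)  # (commit, op, key, row)
    commit: int = 0

    def upsert(self, records, key="id"):
        """Insert new keys and overwrite existing ones in a single commit."""
        self.commit += 1
        for rec in records:
            op = "update" if rec[key] in self.rows else "insert"
            self.rows[rec[key]] = rec
            self.changes.append((self.commit, op, rec[key], rec))
        return self.commit

    def delete(self, keys):
        """Remove records by key in a single commit."""
        self.commit += 1
        for k in keys:
            if self.rows.pop(k, None) is not None:
                self.changes.append((self.commit, "delete", k, None))
        return self.commit

    def changes_since(self, commit):
        """Return only the changes recorded after the given commit."""
        return [c for c in self.changes if c[0] > commit]


table = ToyTable()
c1 = table.upsert([{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Rome"}])
c2 = table.upsert([{"id": 2, "city": "Milan"}, {"id": 3, "city": "Lima"}])
c3 = table.delete([1])

# A downstream job that last ran at commit c1 reads only what changed since.
incremental = table.changes_since(c1)
```

A downstream pipeline that processes only `incremental` avoids rescanning the whole table, which is the point of an incremental framework.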
Unified Metadata Management
An open-source, unified metadata management platform for data lakes, data warehouses, and external catalogs. Gravitino provides a single point of access for managing metadata across diverse data sources, simplifying governance and discovery.
Git-Like Data Lake Versioning
An open-source platform that delivers resilience and manageability to object-storage-based data lakes. lakeFS provides git-like branching, merging, and versioning for data, enabling safe experimentation and CI/CD workflows for data pipelines.
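The branch-and-merge workflow described above can be sketched with a small in-memory model. This is a toy illustration under simplifying assumptions, not the lakeFS API: `ToyRepo`, its methods, and the object paths are hypothetical, and the merge is a naive "source wins" update that ignores deletes and conflicts.

```python
class ToyRepo:
    """In-memory stand-in for a versioned object store (hypothetical; not lakeFS)."""

    def __init__(self):
        # branch name -> {object path: content}
        self.branches = {"main": {}}

    def create_branch(self, name, source="main"):
        # Copy only the path listing, so the new branch starts as a
        # snapshot of the source and later diverges independently.
        self.branches[name] = dict(self.branches[source])

    def put(self, branch, path, content):
        self.branches[branch][path] = content

    def merge(self, source, dest="main"):
        # Naive merge: objects from the source branch overwrite the destination.
        self.branches[dest].update(self.branches[source])


repo = ToyRepo()
repo.put("main", "sales/2024.parquet", "v1")

# Experiment on an isolated branch; main is untouched until the merge.
repo.create_branch("experiment")
repo.put("experiment", "sales/2024.parquet", "v2")
main_before_merge = dict(repo.branches["main"])

repo.merge("experiment")
main_after_merge = dict(repo.branches["main"])
```

Because writes land on the `experiment` branch first, a failed pipeline run can be discarded without ever touching `main`.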
Transactional Data Lake Catalog
A transactional catalog for data lakes with git-like semantics. Nessie works with Apache Iceberg tables to provide multi-table transactions, branching, tagging, and time-travel queries across your data lake.
Data Lake Bronze Layer Gateway
A gateway to a data lake's bronze layer that handles raw data ingestion and landing. FlightPath provides a managed entry point for data flowing into your data lake, ensuring consistent formatting and quality at the ingestion stage.