Discover 7 tools tagged with Data Lake for Python data engineering.
Incremental Data Processing Framework
An open-source framework that manages storage for real-time data processing on top of data lakes. Hudi provides record-level inserts, updates, and deletes along with change streams, enabling incremental data pipelines on large-scale datasets.
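To make the idea of record-level writes plus a change stream concrete, here is a minimal in-memory sketch in plain Python. It does not use Hudi's API; the `ToyTable` class and its methods are hypothetical names invented for illustration, and a simple integer counter is assumed to stand in for commit times.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """In-memory stand-in for a table with record-level writes (not Hudi's API)."""

    # key -> (commit_time, payload); a payload of None marks a deleted record
    rows: dict = field(default_factory=dict)
    commit_time: int = 0

    def upsert(self, records: dict) -> int:
        """Insert new keys or overwrite existing ones in a single commit."""
        self.commit_time += 1
        for key, payload in records.items():
            self.rows[key] = (self.commit_time, payload)
        return self.commit_time

    def delete(self, keys) -> int:
        """Record deletions as tombstones so downstream readers can see them."""
        self.commit_time += 1
        for key in keys:
            self.rows[key] = (self.commit_time, None)
        return self.commit_time

    def snapshot(self) -> dict:
        """Current state of the table, with deleted records hidden."""
        return {k: p for k, (_, p) in self.rows.items() if p is not None}

    def changes_since(self, commit_time: int) -> dict:
        """Incremental read: only records touched after the given commit."""
        return {k: p for k, (t, p) in self.rows.items() if t > commit_time}


table = ToyTable()
c1 = table.upsert({"u1": {"city": "Oslo"}, "u2": {"city": "Lima"}})
c2 = table.upsert({"u2": {"city": "Quito"}})
c3 = table.delete(["u1"])

# A downstream job that last ran at c1 only needs the records changed since then.
changes = table.changes_since(c1)
```

An incremental pipeline built this way processes only `changes` rather than rescanning the whole table, which is the property the description above refers to.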
Unified Metadata Management
An open-source, unified metadata management platform for data lakes, data warehouses, and external catalogs. Gravitino provides a single point of access for managing metadata across diverse data sources, simplifying governance and discovery.
Git-Like Data Lake Versioning
An open-source platform that delivers resilience and manageability to object-storage-based data lakes. lakeFS provides git-like branching, merging, and versioning for data, enabling safe experimentation and CI/CD workflows for data pipelines.
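The branching model can be illustrated with a toy sketch in plain Python. This is not the lakeFS API or client; `ToyRepo` and its methods are hypothetical, and the merge shown simply lets the source branch win, whereas a real system would detect conflicts.

```python
class ToyRepo:
    """Toy model of branch pointers over immutable commits (not the lakeFS API)."""

    def __init__(self):
        self.commits = {0: {}}        # commit id -> {path: object version}
        self.branches = {"main": 0}   # branch name -> commit id
        self._next_id = 1

    def create_branch(self, name, source):
        # No data is copied: the new branch points at the source's commit.
        self.branches[name] = self.branches[source]

    def commit(self, branch, changes):
        # Build a new immutable snapshot from the branch head plus the changes.
        snapshot = dict(self.commits[self.branches[branch]])
        snapshot.update(changes)
        commit_id = self._next_id
        self._next_id += 1
        self.commits[commit_id] = snapshot
        self.branches[branch] = commit_id
        return commit_id

    def read(self, branch):
        return self.commits[self.branches[branch]]

    def merge(self, source, dest):
        # Simplified merge: paths from the source overwrite those in dest.
        return self.commit(dest, self.read(source))


repo = ToyRepo()
repo.commit("main", {"sales.parquet": "v1"})
repo.create_branch("experiment", "main")
repo.commit("experiment", {"sales.parquet": "v2", "report.parquet": "v1"})

main_before = dict(repo.read("main"))   # main is untouched by the experiment
repo.merge("experiment", "main")
main_after = repo.read("main")
```

Work on `experiment` stays isolated from `main` until the merge, which is what makes experimentation and CI/CD-style checks on data safe to run.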
Transactional Data Lake Catalog
A transactional catalog for data lakes with git-like semantics. Nessie works with Apache Iceberg tables to provide multi-table transactions, branching, tagging, and time-travel queries across your data lake.
Data Lake Bronze Layer Gateway
A gateway that handles raw data ingestion and landing in a data lake's bronze layer. FlightPath provides a managed entry point for data flowing into your data lake, ensuring consistent formatting and quality at the ingestion stage.