Project Nessie

Transactional Data Lake Catalog

About Project Nessie

A transactional catalog for data lakes with git-like semantics. Nessie works with Apache Iceberg tables to provide multi-table transactions, branching, tagging, and time-travel queries across your data lake.

Key Features

1. Git-inspired catalog for Iceberg and Delta Lake tables on object storage
2. Branching and tagging enable safe multi-table schema experiments
3. ACID commits group changes to multiple Iceberg tables into one atomic transaction
4. SQL catalog integration: Spark, Flink, and Dremio read Nessie as a catalog
5. REST API and Python client for programmatic catalog management

How Python Data Engineers Use Project Nessie

Python data engineers configure PySpark to use Project Nessie as the Iceberg catalog, which enables table branching within Spark jobs. An engineer creates a Nessie branch, runs a PySpark transformation that modifies multiple Iceberg tables, validates the results, then merges the branch to main. The result is an atomic multi-table update with full rollback capability: if validation fails, the branch is simply not merged and main is unchanged.
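
The branch, transform, validate, merge workflow described above can be sketched roughly as follows. This is a minimal illustration, not an official recipe: the server URI, warehouse path, catalog name, and the helper names `nessie_spark_conf`, `branch_workflow_sql`, and `run_workflow` are all assumptions made for the example. It also assumes the Iceberg Spark runtime and the Nessie Spark SQL extensions jars are already on the Spark classpath.

```python
from typing import Callable, Dict, List


def nessie_spark_conf(
    uri: str = "http://localhost:19120/api/v1",         # assumed local Nessie server
    ref: str = "main",                                  # branch the session starts on
    warehouse: str = "s3a://example-bucket/warehouse",  # hypothetical warehouse path
    catalog: str = "nessie",                            # catalog name is our choice
) -> Dict[str, str]:
    """Spark settings that register Nessie as an Iceberg catalog."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.extensions": ",".join([
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
            "org.projectnessie.spark.extensions.NessieSparkSessionExtensions",
        ]),
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.nessie.NessieCatalog",
        f"{prefix}.uri": uri,
        f"{prefix}.ref": ref,
        f"{prefix}.warehouse": warehouse,
    }


def branch_workflow_sql(
    branch: str,
    transforms: List[str],
    catalog: str = "nessie",
    base: str = "main",
) -> List[str]:
    """SQL statements for: create branch, switch to it, transform, merge."""
    return [
        f"CREATE BRANCH IF NOT EXISTS {branch} IN {catalog} FROM {base}",
        f"USE REFERENCE {branch} IN {catalog}",
        *transforms,
        f"MERGE BRANCH {branch} INTO {base} IN {catalog}",
    ]


def run_workflow(spark, statements: List[str], validate: Callable) -> bool:
    """Run everything except the final merge, then merge only if validation passes."""
    *work, merge = statements
    for stmt in work:
        spark.sql(stmt)
    if validate(spark):
        spark.sql(merge)
        return True
    # Validation failed: the branch is left unmerged, so the base branch is untouched.
    return False
```

With PySpark installed, the settings could be applied by looping over `nessie_spark_conf().items()` and passing each pair to `SparkSession.builder.config(key, value)` before calling `getOrCreate()`. The table names in `transforms` would need the catalog prefix, for example `nessie.db.orders`.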

Frequently Asked Questions

What is Project Nessie used for?

Project Nessie is used as a transactional catalog for data lakes. It gives Apache Iceberg tables git-like branching and tagging, multi-table transactions, and time-travel queries.

Is Project Nessie free to use?

Yes. Project Nessie is free to use; it is open-source software released under the Apache License 2.0.

What category does Project Nessie belong to?

Project Nessie is listed under the Data Lake Management category on Python Data Engineering.

Similar Data Lake Management Tools

3 tools

Tool	Description	Pricing	Rating
Apache Gravitino (new)	Unified Metadata Management	Free	★ 4.0
Apache HBase	Distributed Column-Family Store	Free	★ 4.2
Titan	Scalable Graph Database	Free	★ 3.6