Data Lake Management
Data Lakehouse Platform
★ 3.9
Transactional Data Lake Catalog
★ 4.3
Install (Ilum): N/A (web application)
Install (Project Nessie): pip install pynessie
Python data engineers use Ilum to submit and manage PySpark jobs without managing Spark cluster infrastructure directly. Ilum's REST API lets Python orchestration tools like Airflow trigger Spark jobs programmatically as pipeline steps. It is used in organisations that need a self-hosted alternative to managed services like AWS EMR or Databricks, providing a control plane for Spark workloads running on Kubernetes or bare metal.
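As a rough illustration of triggering a Spark job over REST from an orchestration step, the sketch below builds and sends a JSON POST using only the Python standard library. The endpoint path (`/jobs`), the payload field names, and the example URLs are hypothetical placeholders, not Ilum's documented API; consult the Ilum API reference for the real routes and schema.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_submit_request(
    base_url: str, job_name: str, main_file: str, args: List[str]
) -> urllib.request.Request:
    """Build (but do not send) a job-submission request.

    ASSUMPTION: the "/jobs" path and the payload keys below are
    hypothetical and only illustrate the shape of such a call.
    """
    payload = {"name": job_name, "mainFile": main_file, "args": args}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(req: urllib.request.Request, timeout: float = 30.0) -> Dict[str, Any]:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def trigger_spark_job() -> Dict[str, Any]:
    """A plain callable that an orchestrator task could invoke."""
    req = build_submit_request(
        "http://ilum.example/api",      # hypothetical base URL
        "nightly-aggregation",          # hypothetical job name
        "s3://bucket/jobs/nightly.py",  # hypothetical PySpark script
        ["--date", "2024-01-01"],
    )
    return submit(req)
```

In Airflow, a function like `trigger_spark_job` could be passed as the `python_callable` of a `PythonOperator`, so the Spark job runs as one step of a DAG.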
Python data engineers configure PySpark to use Project Nessie as the Iceberg catalog, which enables table branching within Spark jobs. An engineer creates a Nessie branch, runs a PySpark transformation that modifies multiple Iceberg tables, validates the results, then merges the branch into main. The result is an atomic multi-table update with full rollback capability.
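The branch, transform, validate, merge workflow described above can be sketched roughly as follows. This is a minimal sketch, assuming the Iceberg runtime and Nessie Spark extension jars are already on the Spark classpath (versions are deliberately omitted), that the catalog is registered under the name `nessie`, and that the Nessie server URI and warehouse path are supplied by the caller. The helper names `nessie_catalog_conf` and `run_on_branch` are illustrative, not part of pynessie or PySpark.

```python
from typing import Callable, Dict


def nessie_catalog_conf(
    uri: str, warehouse: str, ref: str = "main", catalog: str = "nessie"
) -> Dict[str, str]:
    """Spark settings that register an Iceberg catalog backed by Nessie."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,"
            "org.projectnessie.spark.extensions.NessieSparkSessionExtensions"
        ),
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.nessie.NessieCatalog",
        f"{prefix}.uri": uri,
        f"{prefix}.ref": ref,
        f"{prefix}.warehouse": warehouse,
    }


def run_on_branch(
    sql: Callable[[str], object],
    branch: str,
    transform: Callable[[], None],
    validate: Callable[[], bool],
    catalog: str = "nessie",
    main: str = "main",
) -> bool:
    """Run `transform` on a Nessie branch; merge if `validate` passes, else drop it.

    `sql` is any callable that executes a SQL string, e.g. `spark.sql`.
    """
    sql(f"CREATE BRANCH IF NOT EXISTS {branch} IN {catalog} FROM {main}")
    sql(f"USE REFERENCE {branch} IN {catalog}")
    try:
        transform()      # writes to several Iceberg tables on the branch
        ok = validate()  # e.g. row counts or data-quality checks
    finally:
        # Always point the session back at main, even if a step raised.
        sql(f"USE REFERENCE {main} IN {catalog}")
    if ok:
        sql(f"MERGE BRANCH {branch} INTO {main} IN {catalog}")
    else:
        sql(f"DROP BRANCH {branch} IN {catalog}")
    return ok
```

With a live session, the settings from `nessie_catalog_conf(...)` would be applied through `SparkSession.builder.config(key, value)` and the workflow run as `run_on_branch(spark.sql, "etl_run", transform, validate)`. Taking `sql` as a plain callable keeps the sequence of statements easy to inspect without a running cluster.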
Individual Tool Pages