Data Lake Management
Data Lakehouse Platform
★ 3.9
Git-Like Data Lake Versioning
★ 4.5
Ilum install: N/A (web application)
lakeFS install: pip install lakefs
Python data engineers use Ilum to submit and manage PySpark jobs without managing Spark cluster infrastructure directly. Ilum's REST API enables Python orchestration tools like Airflow to trigger Spark jobs programmatically as pipeline steps. It is used in organisations that need a self-hosted alternative to managed services like AWS EMR or Databricks, providing a control plane for Spark workloads running on Kubernetes or bare metal.
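As a rough illustration of the "trigger a Spark job over REST from an orchestrator" pattern, the sketch below builds a JSON POST request using only the Python standard library. The host, the `/api/v1/jobs` route, and the payload fields are placeholders invented for this example and are not taken from Ilum's documentation; the real route and schema should come from the Ilum API reference.

```python
import json
import urllib.request

# Placeholder values for illustration only; they are NOT Ilum's documented API.
ILUM_URL = "http://ilum.example.internal:9888"
SUBMIT_PATH = "/api/v1/jobs"


def build_submit_request(job_name: str, script_uri: str, args: list[str]) -> urllib.request.Request:
    """Build (but do not send) a JSON POST describing a PySpark job."""
    # Hypothetical payload shape: a job name, a script location, and CLI arguments.
    payload = {"name": job_name, "script": script_uri, "args": args}
    return urllib.request.Request(
        url=ILUM_URL + SUBMIT_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_job(job_name: str, script_uri: str, args: list[str], timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response body."""
    request = build_submit_request(job_name, script_uri, args)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A function like `submit_job` could be used as the callable of an Airflow Python task, so that the Spark job becomes one step in a larger pipeline. Keeping request construction separate from sending makes the payload easy to inspect without a running server.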
Python data engineers use lakeFS to apply software engineering practices to data lake management. A pipeline writes to a lakeFS branch, data quality tests run against the branch, and the Python SDK merges the branch to main only on test success. This prevents bad pipeline outputs from reaching production consumers — the same guarantee that Git branches provide for code changes.
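The branch, test, merge flow described above can be sketched as a small gate function. This is a minimal sketch, assuming the high-level `lakefs` Python package, where a repository object exposes `branch(name)` and a branch exposes `create(source_reference=...)`, `commit(message=...)`, and `merge_into(...)`; check these against the SDK version you install. The names `publish_if_valid`, `write_outputs`, and `checks_pass` are hypothetical helpers introduced for this example.

```python
from typing import Any, Callable


def publish_if_valid(
    repo: Any,
    branch_name: str,
    write_outputs: Callable[[Any], None],
    checks_pass: Callable[[Any], bool],
    target: str = "main",
) -> bool:
    """Write pipeline output on an isolated branch; merge only if checks pass.

    `repo` is assumed to behave like a lakeFS repository object.
    `write_outputs` and `checks_pass` are caller-supplied (hypothetical) hooks
    that receive the working branch.
    """
    # Create an isolated branch from the production branch.
    branch = repo.branch(branch_name).create(source_reference=target)

    # The pipeline writes only to the isolated branch, then commits.
    write_outputs(branch)
    branch.commit(message=f"pipeline output on {branch_name}")

    # Gate: run data quality checks against the branch contents.
    if not checks_pass(branch):
        # The branch is left in place for inspection; the target is untouched.
        return False

    # Checks passed, so the output is promoted to the production branch.
    branch.merge_into(repo.branch(target))
    return True
```

With the SDK installed and credentials configured, `repo` would typically come from `lakefs.repository("example-repo")`, where the repository name is a placeholder. Leaving a failed branch unmerged rather than deleting it is a design choice that keeps the rejected output available for debugging.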
Individual Tool Pages