Orchestration Tools
Apache Airflow: Workflow Orchestration Platform (★ 4.8)
Kedro: Python Data Pipeline Framework (★ 4.4)
pip install apache-airflow
pip install kedro
Python data engineers define pipelines as Directed Acyclic Graphs (DAGs) using Airflow's Python SDK. DAGs are written as .py files that instantiate Operators: PythonOperator for custom logic, BashOperator for shell commands, and provider-specific operators for Postgres, S3, BigQuery, and Snowflake. Airflow is the industry-standard orchestrator for scheduling ETL jobs, managing dependencies between tasks, and handling retries in production data pipelines.
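To make the ideas of task dependencies and retries concrete, here is a minimal standard-library sketch. It is not Airflow's API: the task names, the `TASKS` mapping, and `run_dag` are hypothetical, and the deliberate first-attempt failure in `extract` exists only to show a retry. In Airflow itself, the same structure would be expressed with operators, the `>>` dependency operator, and the `retries` argument on a task.

```python
from graphlib import TopologicalSorter

# Hypothetical three-step ETL; task names and bodies are illustrative only.
attempts = {"extract": 0}

def extract():
    # Fail on the first attempt to demonstrate a retry.
    attempts["extract"] += 1
    if attempts["extract"] < 2:
        raise RuntimeError("transient source error")
    return [1, 2, 3]

def transform(rows):
    return [r * 10 for r in rows]

def load(rows):
    return f"loaded {len(rows)} rows"

# Each task name maps to (callable, list of upstream task names).
TASKS = {
    "extract": (extract, []),
    "transform": (transform, ["extract"]),
    "load": (load, ["transform"]),
}

def run_dag(tasks, retries=1):
    """Run tasks in dependency order, retrying each up to `retries` times."""
    # TopologicalSorter expects {node: set of predecessors}.
    graph = {name: set(upstream) for name, (_, upstream) in tasks.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        func, upstream = tasks[name]
        args = [results[u] for u in upstream]
        for attempt in range(retries + 1):
            try:
                results[name] = func(*args)
                break
            except Exception:
                if attempt == retries:
                    raise  # retries exhausted, fail the whole run
    return results

results = run_dag(TASKS, retries=1)
```

The sketch runs tasks sequentially in one process. A real orchestrator adds scheduling, persistence of task state, and distributed execution on top of this ordering logic.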
Data engineering teams use Kedro to structure ML and analytics pipelines as modular, testable Python functions. The DataCatalog lets engineers define data sources (S3 Parquet files, SQL tables, local CSVs) in a YAML config, so switching environments changes only the catalog config, not the pipeline code. Nodes wrap pure Python functions, which makes them easy to unit test.
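The separation between pipeline logic and data sources can be sketched in plain Python. This is an illustration of the pattern, not Kedro's API: `CATALOGS`, the loader functions, and `run_pipeline` are hypothetical stand-ins, and in Kedro the catalog entries would live in a YAML file rather than a Python dict.

```python
import csv
import io

# Node-style functions: pure, no I/O, so they can be tested with plain data.
def clean_orders(rows):
    """Drop rows without an amount and convert amounts to float."""
    return [
        {"id": r["id"], "amount": float(r["amount"])}
        for r in rows
        if r.get("amount") not in (None, "")
    ]

def total_revenue(orders):
    return sum(o["amount"] for o in orders)

# Hypothetical loaders standing in for catalog-defined data sources.
def load_local_csv():
    text = "id,amount\n1,10.5\n2,\n3,4.5\n"
    return list(csv.DictReader(io.StringIO(text)))

def load_prod_stub():
    # Stands in for an S3 Parquet file or SQL table in this sketch.
    return [{"id": "7", "amount": "100"}]

# One catalog per environment: dataset name -> loader.
CATALOGS = {
    "local": {"orders": load_local_csv},
    "prod": {"orders": load_prod_stub},
}

def run_pipeline(env):
    """Same pipeline code for every environment; only the catalog differs."""
    raw = CATALOGS[env]["orders"]()
    return total_revenue(clean_orders(raw))
```

Because `clean_orders` and `total_revenue` take and return ordinary Python values, a unit test can call them directly with a small list of dicts and needs no storage backend.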
Individual Tool Pages