// 29 categories
The full taxonomy of 248+ tools across 29 categories, from ingestion to serving.
// common questions
Categories organize tools by purpose, so you can find what you need for your data engineering project. Rather than browsing hundreds of tools at random, you can go straight to the type of tool you need.
Whether you're building ETL pipelines, setting up data warehouses, or implementing workflow orchestration, each category contains specialized tools designed for that specific use case.
Start with your goal: the category you need depends on what you're trying to accomplish.
To build a data pipeline, start with ETL Frameworks, then add Orchestration to schedule and coordinate your runs.
For analytics workloads, explore Databases & Data Warehouses.
If you're just starting out, begin with Getting Started for setup guides and foundational tools.
💡 Most projects use tools from multiple categories. Start with your immediate need, then expand.
ETL frameworks and orchestrators are both essential, but they do different jobs:
ETL frameworks process and transform the data itself: reading, cleaning, joining, writing. Examples: Pandas, PySpark, dbt.
Orchestrators schedule, coordinate, and monitor your ETL jobs, deciding when and in what order tasks run. Examples: Airflow, Prefect, Dagster.
ETL frameworks do the data work. Orchestrators manage when and how that work runs. You typically need both in production.
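To make the split concrete, here is a minimal sketch using only the Python standard library. It is not the API of Airflow, Prefect, Dagster, or any other tool named above: the functions `extract`, `transform`, `load`, and `run_pipeline`, along with the sample rows, are hypothetical names invented for illustration. The first three do the data work (the ETL side); `run_pipeline` only decides the order in which tasks run (the orchestration side).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# ETL side: functions that read, clean, and write data.
def extract():
    # Stand-in for reading from a file, API, or database (hypothetical rows).
    return [{"name": " Ada ", "amount": "10"}, {"name": "Bob", "amount": "5"}]

def transform(rows):
    # Strip stray whitespace and convert amounts from strings to integers.
    return [{"name": r["name"].strip(), "amount": int(r["amount"])} for r in rows]

def load(rows, target):
    # Stand-in for writing to a warehouse table; returns the row count.
    target.extend(rows)
    return len(rows)

# Orchestration side: knows task names and dependencies, not the data.
def run_pipeline(tasks, dependencies):
    """Run each task after the tasks it depends on; return the order used."""
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name]()
    return order

warehouse = []  # plays the role of the destination table
state = {}      # passes intermediate results between tasks

tasks = {
    "extract": lambda: state.update(raw=extract()),
    "transform": lambda: state.update(clean=transform(state["raw"])),
    "load": lambda: state.update(loaded=load(state["clean"], warehouse)),
}

# Each key maps to the set of tasks that must finish before it starts.
dependencies = {"transform": {"extract"}, "load": {"transform"}}

order = run_pipeline(tasks, dependencies)
print(order)
print(warehouse)
```

A production orchestrator adds what this sketch leaves out, such as scheduling, retries, and monitoring, which is why the two categories are usually paired rather than merged.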
The most-used categories represent the core building blocks of production data stacks: