ETL Frameworks Projects
Extract, Transform, Load frameworks for data pipelines.
How to Choose the Right ETL Framework for Python
When considering ETL frameworks, here's how to decide:

- Pandas: opt for Pandas when working with medium-sized datasets that fit in memory and you need to perform complex data manipulations efficiently.
- Apache Spark (via PySpark): select Spark when dealing with large datasets that don't fit in memory and require distributed processing across a cluster.
- dlt (Data Load Tool): make dlt your go-to when the primary focus is the loading phase of ETL, optimizing how data lands in various data stores.
- dbt (Data Build Tool): choose dbt when the focus is the transformation step inside your data warehouse; it is particularly powerful for managing transformations, testing, and documentation.
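To make the Pandas-versus-Spark memory tradeoff concrete, here is a minimal sketch of the same aggregation in both tools. The file name orders.csv and its columns (customer_id, amount) are assumptions for illustration, not part of any project below:

```python
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Pandas: the entire file is read into memory on a single machine.
orders = pd.read_csv("orders.csv")  # hypothetical file
pandas_totals = orders.groupby("customer_id")["amount"].sum().reset_index()

# PySpark: the same aggregation, but planned lazily and executable
# across a cluster when the data no longer fits in one machine's memory.
spark = SparkSession.builder.appName("choose-your-etl").getOrCreate()
sdf = spark.read.csv("orders.csv", header=True, inferSchema=True)
spark_totals = sdf.groupBy("customer_id").agg(F.sum("amount").alias("amount"))
spark_totals.show()
```

The code is nearly identical at this scale; the difference is where it runs, which is why dataset size is the deciding factor.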
E-commerce Data Processing with PySpark
Difficulty: intermediate
Build a complete ETL pipeline using PySpark to process e-commerce data, including sales analysis, customer segmentation, and product performance metrics. Learn how to leverage Spark's distributed processing capabilities for large-scale data transformations.
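A rough sketch of what the transform step might look like; the input path, column names, and quartile-based segmentation are assumptions, not the project's actual schema:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("ecommerce-etl").getOrCreate()

# Extract: load raw order records (hypothetical path and columns).
orders = spark.read.csv("data/orders.csv", header=True, inferSchema=True)

# Transform 1: product performance - revenue and units sold per product.
product_perf = (
    orders.groupBy("product_id")
    .agg(F.sum("amount").alias("revenue"), F.count("*").alias("units_sold"))
)

# Transform 2: customer segmentation - quartiles by total spend.
spend = orders.groupBy("customer_id").agg(F.sum("amount").alias("total_spend"))
segments = spend.withColumn(
    "segment", F.ntile(4).over(Window.orderBy(F.desc("total_spend")))
)

# Load: persist the results, e.g. as Parquet.
product_perf.write.mode("overwrite").parquet("output/product_performance")
segments.write.mode("overwrite").parquet("output/customer_segments")
```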
Weather Data Pipeline with DLT
Difficulty: beginner
Learn how to use the Data Load Tool (dlt) to extract weather data from a REST API and load it into DuckDB. This beginner-friendly project demonstrates a simple yet effective data loading pattern, perfect for API integration workflows.
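A minimal sketch of this pattern with dlt; the API endpoint below is a placeholder, not the weather service the project actually uses:

```python
import dlt
import requests

@dlt.resource(name="weather", write_disposition="append")
def weather(city="Berlin"):
    # Placeholder endpoint; substitute a real weather REST API.
    resp = requests.get("https://api.example.com/weather", params={"city": city})
    resp.raise_for_status()
    yield resp.json()

# dlt creates the DuckDB tables and infers the schema from the JSON payload.
pipeline = dlt.pipeline(
    pipeline_name="weather_pipeline",
    destination="duckdb",
    dataset_name="weather_data",
)
load_info = pipeline.run(weather())
print(load_info)
```

Because dlt handles schema inference and table creation, the extract-and-load logic stays this small even as the API response grows.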
E-commerce Data Transformation with dbt
Difficulty: intermediate
Build a complete dbt project with staging models, core business logic, dashboard models, and tests to transform e-commerce data in a PostgreSQL warehouse. Master the modern data transformation workflow used by data teams worldwide.
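dbt models themselves are written in SQL, but the build-and-test loop can be driven from Python. A minimal sketch using dbt's programmatic invocation API (available in dbt-core 1.5+); the "staging" selector is an assumption about the project layout:

```python
from dbt.cli.main import dbtRunner, dbtRunnerResult

# Equivalent to running `dbt run --select staging` from the CLI.
dbt = dbtRunner()
result: dbtRunnerResult = dbt.invoke(["run", "--select", "staging"])

# If the models built successfully, validate them (equivalent to `dbt test`).
if result.success:
    dbt.invoke(["test"])
```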