Apache Gobblin is a universal data ingestion framework for Hadoop, originally developed at LinkedIn. It handles the complete data ingestion lifecycle, including extraction, transformation, quality checks, and publishing, for both batch and streaming data sources.
Python data engineers interact with Gobblin by defining configuration files that specify the source, extractor, converter, and writer plugins for a job, which Gobblin then runs on Hadoop or as a standalone Java process. Python orchestration scripts manage Gobblin execution via its REST API, monitor job completion, and process the ingested output files with PySpark for downstream transformation and loading.
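As a rough illustration of the configuration-driven workflow described above, the sketch below renders a job definition from a Python dict into `key=value` properties text. The property names follow the style of Gobblin's published example job files, but treat them as assumptions to check against the documentation for your Gobblin version. The job name, group, and `com.example.ingest.*` class names are hypothetical placeholders, not real plugins. The extractor is not configured by its own key here, on the assumption that the source class supplies it.

```python
def render_job_config(props):
    """Render a flat dict as properties-style text, one key=value per line.

    Minimal sketch: it does not escape special characters the way a full
    Java properties writer would.
    """
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


# Hypothetical job definition. Class names are placeholders for illustration.
job_props = {
    "job.name": "ExampleIngest",
    "job.group": "examples",
    "source.class": "com.example.ingest.ExampleSource",
    "converter.classes": "com.example.ingest.ExampleConverter",
    "writer.destination.type": "HDFS",
    "writer.output.format": "AVRO",
}

config_text = render_job_config(job_props)
print(config_text)
```

Keeping the job definition as a dict lets an orchestration script template values such as the job name or output format before writing the file to wherever the Gobblin deployment reads its job configurations.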
Apache Gobblin is free to use; it is open source under the Apache License 2.0.
Apache Gobblin is listed under the Data Ingestion category on Python Data Engineering.
Related
| Tool | Description | Pricing | Rating |
|---|---|---|---|
| Apache Sqoop | Hadoop-RDBMS Data Transfer | Free | ★ 3.8 |
| Apache Tez | DAG-Based Processing Framework | Free | ★ 4.0 |
| Presto (featured) | Distributed SQL Query Engine | Free | ★ 4.5 |