Data Ingestion
Managed Real-Time Streaming
★ 4.4
Universal Data Ingestion Framework
★ 3.9
pip install boto3N/A — Java-basedpip install boto3N/A — Java-basedPython data engineers use `boto3`'s Kinesis client to put records onto a Data Stream from Lambda functions or EC2-based producers. Consumer applications use the Kinesis Client Library (KCL) with Python bindings, or the `amazon-kinesis-client` Python wrapper, to process shards in parallel with automatic checkpointing — a common pattern for real-time log processing and event enrichment.
Python data engineers interact with Gobblin by defining configuration files that specify source, extractor, converter, and writer plugins — executed as a Hadoop or standalone Java job. Python orchestration scripts manage Gobblin execution via REST API, monitor job completion, and process ingested output files with PySpark for downstream transformation and loading.
Individual Tool Pages