Data Ingestion
Distributed Pub-Sub Messaging
★ 4.5
Universal Data Ingestion Framework
★ 3.9
pip install pulsar-client
N/A (Java-based)
Python data engineers use the `pulsar-client` Python SDK to produce and consume messages on Pulsar topics. Pulsar Functions can be written in Python to perform lightweight transformations (filtering, enriching, or routing messages) without deploying a separate stream processor such as Faust or a Spark Streaming cluster. Pulsar's topic compaction and retention policies simplify stateful event stream management.
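A filter-and-enrich Pulsar Function of the kind described above might look like the following minimal sketch. The `status` and `source` field names and the `FilterEnrich` class name are assumptions made for illustration. The sketch relies on the Functions SDK convention that a `Function` subclass implements `process(input, context)` and that returning `None` publishes nothing. The import fallback exists only so the pure logic can be run without Pulsar installed.

```python
import json

try:
    # Shipped with the pulsar-client package and the Functions runtime.
    from pulsar import Function
except ImportError:
    # Fallback so transform() can be exercised without Pulsar installed.
    Function = object


def transform(raw: str):
    """Filter and enrich one JSON-encoded event; return None to drop it."""
    event = json.loads(raw)
    # Filter: keep only events whose (assumed) "status" field is "ok".
    if event.get("status") != "ok":
        return None
    # Enrich: tag the event with an (assumed) "source" field.
    event["source"] = "pulsar-function"
    return json.dumps(event, sort_keys=True)


class FilterEnrich(Function):
    """Hypothetical Pulsar Function wrapping the pure transform above."""

    def process(self, input, context):
        return transform(input)
```

Keeping the logic in a plain function means it can be unit-tested locally before the class is deployed to a Pulsar cluster.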
Python data engineers interact with Gobblin by defining configuration files that specify the source, extractor, converter, and writer plugins; Gobblin then runs the job on Hadoop or as a standalone Java process. Python orchestration scripts trigger Gobblin jobs through its REST API, monitor them until completion, and process the ingested output files with PySpark for downstream transformation and loading.
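As a rough sketch of the configuration step, a Python script can render a Gobblin job file in Java `.properties` syntax. The property keys below follow the pattern of Gobblin's getting-started job files and should be checked against the Gobblin version in use. The `com.example.gobblin.*` class names, the job name, and the output path are hypothetical placeholders. No extractor key is set, on the assumption that the extractor is supplied by the source class.

```python
from pathlib import Path
import tempfile


def render_job_config(props: dict) -> str:
    """Render key/value pairs in Java .properties syntax, one per line."""
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


# The com.example.gobblin.* class names are hypothetical placeholders.
job = {
    "job.name": "ExampleIngest",
    "job.group": "examples",
    "source.class": "com.example.gobblin.ExampleSource",
    "converter.classes": "com.example.gobblin.ExampleConverter",
    "writer.destination.type": "HDFS",
    "writer.output.format": "AVRO",
    "data.publisher.type": "org.apache.gobblin.publisher.BaseDataPublisher",
}

config_text = render_job_config(job)

# Write the job file to a temporary directory; a real deployment would
# place it in the directory the Gobblin installation scans for jobs.
config_path = Path(tempfile.mkdtemp()) / "example-ingest.pull"
config_path.write_text(config_text)
```

Generating the file from a dictionary lets the same orchestration script template several ingestion jobs that differ only in their source or writer settings.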
Individual Tool Pages