Data Ingestion
Apache Pulsar: Distributed Pub-Sub Messaging
★ 4.5
Apache Sqoop: Hadoop-RDBMS Data Transfer
★ 3.8
Pulsar install: pip install pulsar-client
Sqoop install: N/A (Java-based, retired project)
Python data engineers use the `pulsar-client` Python SDK to produce messages to and consume messages from Pulsar topics. Pulsar Functions can be written in Python to perform lightweight per-message transformations (filtering, enriching, or routing messages) without deploying a separate Faust or Spark Streaming cluster. Topic compaction, which keeps only the latest message per key, and retention policies simplify the management of stateful event streams.
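As a minimal sketch of the filtering-and-enriching case, the class below follows what Pulsar documents as the language-native Python function interface: a plain class whose `process` method receives each message and returns the value to publish. The class name, the JSON message shape, the `amount` field, and the 100/1000 thresholds are all hypothetical. It also assumes, based on our reading of the Python function runtime, that returning `None` publishes nothing to the output topic.

```python
import json


class EnrichAndFilter(object):
    """Hypothetical Pulsar Function in the language-native style: a plain
    class whose process() method is called once per input message."""

    MIN_AMOUNT = 100.0     # hypothetical filter threshold
    LARGE_AMOUNT = 1000.0  # hypothetical enrichment cut-off

    def process(self, input):
        # Depending on the SerDe in use, the payload may arrive as str or bytes.
        if isinstance(input, bytes):
            input = input.decode("utf-8")

        try:
            event = json.loads(input)
        except json.JSONDecodeError:
            return None  # drop malformed payloads rather than crash the function
        if not isinstance(event, dict):
            return None

        try:
            amount = float(event.get("amount", 0))
        except (TypeError, ValueError):
            return None

        # Filter: returning None is assumed to publish nothing downstream.
        if amount < self.MIN_AMOUNT:
            return None

        # Enrich: attach a derived field, then serialize for the output topic.
        event["tier"] = "large" if amount >= self.LARGE_AMOUNT else "standard"
        return json.dumps(event)
```

Because the class has no Pulsar import, it can be exercised locally, e.g. `EnrichAndFilter().process('{"amount": 250}')`, before deploying it with something like `pulsar-admin functions create --py enrich.py --classname enrich.EnrichAndFilter --inputs <input-topic> --output <output-topic>`, where the file, module, and topic names are placeholders.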
Python data engineers invoke Sqoop through Python `subprocess` calls or from Oozie workflows to bulk-transfer data between relational databases and HDFS. A Python orchestration script builds the `sqoop import` command from the table name, an optional WHERE clause, and the parallelism setting (`--table`, `--where`, `--num-mappers`), runs it, checks the exit code, and starts the PySpark transformation only after the data has landed in HDFS.
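The orchestration pattern above can be sketched roughly as follows, assuming a Sqoop 1.x `sqoop` launcher on `PATH`. The function names and the use of a password file are illustrative choices, not a prescribed setup; `--connect`, `--username`, `--password-file`, `--table`, `--target-dir`, `--where`, `--num-mappers`, and `--split-by` are standard `sqoop import` options. The downstream PySpark step is passed in as a callback and runs only when Sqoop exits with status 0.

```python
import shlex
import subprocess
from typing import Callable, List, Optional


def build_sqoop_import(
    jdbc_url: str,
    username: str,
    password_file: str,
    table: str,
    target_dir: str,
    where: Optional[str] = None,
    num_mappers: int = 4,
    split_by: Optional[str] = None,
) -> List[str]:
    """Assemble a `sqoop import` command as an argument list (no shell involved)."""
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        # Read the password from a file instead of exposing it on the command line.
        "--password-file", password_file,
        "--table", table,
        "--target-dir", target_dir,
        # Parallelism: number of map tasks Sqoop uses for the import.
        "--num-mappers", str(num_mappers),
    ]
    if where:
        cmd += ["--where", where]  # restrict which rows are imported
    if split_by:
        cmd += ["--split-by", split_by]  # column used to divide work across mappers
    return cmd


def run_import(cmd: List[str]) -> bool:
    """Run the command and report success based on its exit code."""
    print("Running:", shlex.join(cmd))
    try:
        result = subprocess.run(cmd)
    except FileNotFoundError:
        # The launcher (e.g. `sqoop`) is not on PATH.
        return False
    return result.returncode == 0


def ingest_then_transform(cmd: List[str], transform: Callable[[], None]) -> bool:
    """Start the downstream step (e.g. a PySpark job) only after a clean import."""
    if not run_import(cmd):
        return False
    transform()
    return True
```

Passing an argument list to `subprocess.run` without a shell means a WHERE clause such as `order_date >= '2024-01-01'` reaches Sqoop as a single argument, with no shell quoting to get wrong. A hypothetical call would be `ingest_then_transform(build_sqoop_import(...), run_pyspark_job)`, where `run_pyspark_job` stands in for whatever launches the PySpark transformation.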