Data Ingestion
Hadoop-RDBMS Data Transfer
★ 3.8
Open Source Message Broker
★ 4.6
N/A: Java-based, retired project
pip install pika
Python data engineers invoke Sqoop through Python subprocess calls or Oozie workflows to bulk-transfer data between relational databases and HDFS. A Python orchestration script generates the Sqoop import command with the table name, `--where` clause, and parallelism parameters, runs it, checks the return code, and proceeds to the PySpark transformation once the data lands in HDFS.
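The orchestration step described above might look roughly like the following sketch. It assumes the `sqoop` binary is on the `PATH` of the machine running the script. The function names, JDBC URL, and HDFS paths are hypothetical placeholders, not part of any library.

```python
import subprocess


def build_sqoop_import_command(jdbc_url, table, target_dir, where=None,
                               num_mappers=4, username=None, password_file=None):
    """Assemble a `sqoop import` argument list (hypothetical helper)."""
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),  # degree of parallelism
    ]
    if where:
        cmd += ["--where", where]  # row filter pushed down to the database
    if username:
        cmd += ["--username", username]
    if password_file:
        cmd += ["--password-file", password_file]
    return cmd


def run_sqoop_import(cmd):
    """Run the import and raise if Sqoop exits with a non-zero return code."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"Sqoop import failed with code {result.returncode}: {result.stderr[-500:]}"
        )
    return result.returncode


if __name__ == "__main__":
    # Placeholder connection details; replace with real values before running.
    command = build_sqoop_import_command(
        jdbc_url="jdbc:mysql://db.example.com/sales",
        table="orders",
        target_dir="/data/raw/orders",
        where="order_date >= '2024-01-01'",
        num_mappers=8,
    )
    run_sqoop_import(command)
    # The PySpark transformation would start here, once the import succeeded.
```

Passing the command as a list rather than a shell string avoids quoting problems in the `--where` clause.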
Python data engineers use `pika` or `aio-pika` to connect pipelines to RabbitMQ. A common pattern is a Python producer that publishes enriched records to a topic exchange after transformation, and multiple consumer processes whose queues are bound to routing-key patterns for parallel downstream processing. RabbitMQ's dead-letter exchanges route rejected or expired messages to a dead-letter queue, which pipelines commonly combine with message TTLs to implement configurable retry logic.
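A minimal sketch of this producer and consumer-queue pattern with `pika` follows. The exchange names, queue name, routing-key scheme, and record fields are all assumptions made for illustration, and the code expects a broker reachable on `localhost`.

```python
import json


def routing_key_for(record):
    """Build a routing key such as 'orders.eu.enriched' (hypothetical scheme)."""
    return f"{record['entity']}.{record['region']}.enriched"


def declare_consumer_queue(channel, queue, exchange, pattern, dead_letter_exchange):
    """Declare a durable queue bound to a routing-key pattern.

    Messages rejected without requeue are routed to `dead_letter_exchange`.
    """
    channel.queue_declare(
        queue=queue,
        durable=True,
        arguments={"x-dead-letter-exchange": dead_letter_exchange},
    )
    channel.queue_bind(queue=queue, exchange=exchange, routing_key=pattern)


def publish_records(records, host="localhost", exchange="pipeline.enriched"):
    """Publish each record as persistent JSON to a topic exchange."""
    import pika  # imported here so the helpers above work without pika installed

    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    try:
        channel = connection.channel()
        channel.exchange_declare(exchange=exchange, exchange_type="topic", durable=True)
        for record in records:
            channel.basic_publish(
                exchange=exchange,
                routing_key=routing_key_for(record),
                body=json.dumps(record).encode("utf-8"),
                properties=pika.BasicProperties(
                    delivery_mode=2,  # ask the broker to persist the message
                    content_type="application/json",
                ),
            )
    finally:
        connection.close()


if __name__ == "__main__":
    publish_records([{"entity": "orders", "region": "eu", "total": 42.5}])
```

A consumer that binds its queue with a pattern such as `orders.*.enriched` would receive order records from every region, while `#.enriched` would match all enriched records.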
Individual Tool Pages