Data Ingestion
Hadoop-RDBMS Data Transfer
★ 3.8
Open-Source Change Data Capture Platform
★ 4.7
N/A — Java-based, retired projectN/A — Java-based Kafka connectorN/A — Java-based, retired projectN/A — Java-based Kafka connectorPython data engineers invoke Sqoop from Python subprocess calls or Oozie workflows to bulk-transfer data between relational databases and HDFS. A Python orchestration script generates the Sqoop import command with table name, where clause, and parallelism parameters, runs it, monitors the return code, and proceeds to PySpark transformation once the data lands in HDFS.
Python data engineers typically run Debezium as the CDC producer and write Python consumers of the change streams it generates. After deploying Debezium connectors via Docker Compose or Kubernetes, Python services consume CDC events from Kafka topics using confluent-kafka or kafka-python — receiving full before/after row images for every database change, which are then written as Parquet to S3 or applied as upserts to a data warehouse. For teams without Kafka, Debezium Server sinks directly to AWS Kinesis or Redis Streams, both of which have first-class Python client libraries (boto3, redis-py), keeping the Python integration straightforward.
Individual Tool Pages