Data Ingestion
Hadoop-RDBMS Data Transfer
★ 3.8
Managed Real-Time Streaming
★ 4.4
Python install (Sqoop): N/A (Java-based, retired project)
Python install (Kinesis): `pip install boto3`
Python data engineers invoke Sqoop from Python subprocess calls or Oozie workflows to bulk-transfer data between relational databases and HDFS. A Python orchestration script generates the Sqoop import command with table name, where clause, and parallelism parameters, runs it, checks the return code, and proceeds to PySpark transformation once the data lands in HDFS.
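
The orchestration step described above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the function names `build_sqoop_import` and `run_sqoop_import`, the default of four mappers, and the injectable `runner` parameter are assumptions made for the example. Only the standard `sqoop import` flags `--connect`, `--table`, `--target-dir`, `--num-mappers`, and `--where` are used.

```python
import subprocess
from typing import Callable, List, Optional


def build_sqoop_import(
    jdbc_url: str,
    table: str,
    target_dir: str,
    where: Optional[str] = None,
    num_mappers: int = 4,  # assumed default, tune per source database
) -> List[str]:
    """Assemble a `sqoop import` argument list (hypothetical helper)."""
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),
    ]
    if where:
        # Passed as a single list element, so no shell quoting is required.
        cmd += ["--where", where]
    return cmd


def run_sqoop_import(cmd: List[str], runner: Callable = subprocess.run) -> None:
    """Run the command and raise if Sqoop exits with a non-zero return code."""
    result = runner(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"Sqoop import failed ({result.returncode}): {result.stderr}"
        )
```

A caller would build the command, pass it to `run_sqoop_import`, and start the PySpark job only if no exception is raised. Keeping command construction separate from execution makes the generated arguments easy to log and to unit-test on a machine without Sqoop installed.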
Python data engineers use `boto3`'s Kinesis client to put records onto a data stream from Lambda functions or EC2-based producers. Consumer applications use the Kinesis Client Library (KCL) through its Python wrapper, `amazon_kclpy`, to process shards in parallel with checkpointing handled through the library. This is a common pattern for real-time log processing and event enrichment.
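
The producer side of this pattern might look something like the sketch below. It assumes a stream that already exists and AWS credentials available from the environment. The helper names `make_kinesis_client` and `put_event`, and the choice of JSON encoding, are illustrative assumptions. The only Kinesis API call used is `put_record` with its `StreamName`, `Data`, and `PartitionKey` parameters.

```python
import json
from typing import Any, Dict


def make_kinesis_client(region_name: str):
    """Create a boto3 Kinesis client (hypothetical helper)."""
    import boto3  # imported lazily so put_event can be tested without AWS access

    return boto3.client("kinesis", region_name=region_name)


def put_event(
    client: Any,
    stream_name: str,
    event: Dict[str, Any],
    partition_key: str,
) -> str:
    """Serialize one event as JSON and put it on the stream.

    Returns the sequence number that Kinesis assigns to the record.
    """
    response = client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=partition_key,  # determines which shard receives the record
    )
    return response["SequenceNumber"]
```

Inside a Lambda handler, the client would typically be created once at module level and reused across invocations. Records that share a partition key land on the same shard, so a key such as a user or device ID keeps related events in order.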
Individual Tool Pages