A tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. Sqoop uses MapReduce for parallel data transfer with support for incremental imports and direct connector APIs.
Python data engineers invoke Sqoop through subprocess calls or Oozie workflows to bulk-transfer data between relational databases and HDFS. A Python orchestration script builds the Sqoop import command from the table name, WHERE clause, and parallelism parameters, runs it, checks the return code, and proceeds to the PySpark transformation once the data has landed in HDFS.
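The orchestration pattern described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the function names `build_sqoop_import_command` and `run_sqoop_import` are hypothetical, and the JDBC URL, table, and paths in the usage note are placeholders. The flags used (`--connect`, `--table`, `--where`, `--split-by`, `--num-mappers`, `--target-dir`, `--username`, `--password-file`) are standard `sqoop import` options, and the sketch assumes the `sqoop` binary is on the PATH of the machine running the script.

```python
import subprocess
from typing import List, Optional


def build_sqoop_import_command(
    jdbc_url: str,
    table: str,
    target_dir: str,
    where: Optional[str] = None,
    split_by: Optional[str] = None,
    num_mappers: int = 4,
    username: Optional[str] = None,
    password_file: Optional[str] = None,
) -> List[str]:
    """Build the argument list for a `sqoop import` call (hypothetical helper)."""
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),  # degree of parallelism
    ]
    if where:
        cmd += ["--where", where]  # row filter pushed down to the source database
    if split_by:
        cmd += ["--split-by", split_by]  # column used to partition work across mappers
    if username:
        cmd += ["--username", username]
    if password_file:
        # Reading the password from a file keeps it out of the process list.
        cmd += ["--password-file", password_file]
    return cmd


def run_sqoop_import(cmd: List[str]) -> int:
    """Run the command and return its exit code; 0 means the import succeeded."""
    # Passing a list (no shell=True) avoids shell-quoting problems in the WHERE clause.
    result = subprocess.run(cmd, check=False)
    return result.returncode
```

A caller would build the command with something like `build_sqoop_import_command("jdbc:mysql://db.example.com/sales", "orders", "/data/raw/orders", where="order_date >= '2024-01-01'", split_by="order_id", num_mappers=8)`, pass it to `run_sqoop_import`, and start the PySpark step only when the returned code is 0.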
Apache Sqoop is free to use; it is open-source software distributed under the Apache License 2.0.
Apache Sqoop is listed under the Data Ingestion category on Python Data Engineering.
Related
| Tool | Description | Pricing | Rating |
|---|---|---|---|
| Apache Gobblin | Universal Data Ingestion Framework | Free | ★ 3.9 |
| Apache Tez | DAG-Based Processing Framework | Free | ★ 4.0 |
| Presto (featured) | Distributed SQL Query Engine | Free | ★ 4.5 |