Data Ingestion
Distributed Pub-Sub Messaging (★ 4.5)
Database to Data Lake ETL (★ 3.5)
pip install pulsar-client
pip install db2lake
Python data engineers use the `pulsar-client` Python SDK to produce and consume messages on Pulsar topics. Pulsar Functions can be written in Python to perform lightweight transformations (filtering, enriching, or routing messages) without deploying a separate stream processor such as Faust or Spark Streaming. Pulsar's topic compaction and retention policies simplify stateful event stream management.
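As a rough illustration of the filter, enrich, and route pattern described above, the following sketch keeps the transformation in a plain function and wires it to the `pulsar-client` consumer and producer calls. The broker URL, the topic names (`orders-raw`, `orders-large`, `orders-small`), the subscription name, and the `amount`/`currency` fields are all hypothetical choices made for this example, not part of any real deployment.

```python
import json


def route_event(raw: bytes):
    """Filter, enrich, and route one message.

    Returns (topic, payload) or None when the message is dropped.
    Field names and topic names are assumptions for this sketch.
    """
    event = json.loads(raw)
    # Filter: drop events without a positive amount.
    if event.get("amount", 0) <= 0:
        return None
    # Enrich: fill in a default currency when none is given.
    event.setdefault("currency", "USD")
    # Route: pick the output topic from the amount.
    topic = "orders-large" if event["amount"] >= 1000 else "orders-small"
    return topic, json.dumps(event, sort_keys=True).encode("utf-8")


def run_router(service_url="pulsar://localhost:6650", max_messages=10):
    """Consume from a raw topic and republish routed messages.

    Requires a reachable Pulsar broker; the URL and topics are hypothetical.
    """
    import pulsar  # imported here so route_event works without a broker

    client = pulsar.Client(service_url)
    consumer = client.subscribe("orders-raw", subscription_name="router")
    producers = {}  # one producer per output topic, created lazily
    try:
        for _ in range(max_messages):
            msg = consumer.receive()  # blocks until a message arrives
            routed = route_event(msg.data())
            if routed is not None:
                topic, payload = routed
                if topic not in producers:
                    producers[topic] = client.create_producer(topic)
                producers[topic].send(payload)
            # Acknowledge dropped messages too, so they are not redelivered.
            consumer.acknowledge(msg)
    finally:
        client.close()
```

Calling `run_router()` against a local broker processes up to ten messages and then exits. Keeping `route_event` free of Pulsar imports means the same logic can be unit tested on its own, or moved into a Pulsar Function later.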
Python data engineers use db2lake to bootstrap data lake migration projects: it extracts historical data from relational databases and writes it as partitioned Parquet files to S3 or HDFS. Once the initial migration is done, incremental extractions keep the lake in sync, and PySpark or DuckDB pipelines take over for ongoing processing.
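The sketch below illustrates the general pattern of an initial full extraction followed by watermark-based incremental extraction, with rows grouped into date partitions. It deliberately does not use db2lake's own API; it relies on the standard-library `sqlite3` module as a stand-in source database, and the `orders` table, its columns, and the `dt=` partition naming are assumptions made for this example. The optional `write_partitions` helper assumes `pyarrow` is installed and writes to a local directory rather than S3 or HDFS.

```python
import os
import sqlite3
from collections import defaultdict


def extract_since(conn, watermark):
    """Return rows whose updated_at is later than the watermark, oldest first."""
    cur = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    )
    cols = [c[0] for c in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]


def partition_by_day(rows):
    """Group rows under Hive-style partition keys such as dt=2024-01-01."""
    parts = defaultdict(list)
    for row in rows:
        parts["dt=" + row["updated_at"][:10]].append(row)
    return dict(parts)


def write_partitions(parts, root):
    """Write one Parquet file per partition (assumes pyarrow is installed)."""
    import pyarrow as pa
    import pyarrow.parquet as pq

    for key, rows in parts.items():
        os.makedirs(os.path.join(root, key), exist_ok=True)
        table = pa.Table.from_pylist(rows)
        pq.write_table(table, os.path.join(root, key, "part-0.parquet"))


# Hypothetical source table, held in memory for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 10.0, "2024-01-01T09:00:00"),
        (2, 25.5, "2024-01-01T17:30:00"),
        (3, 40.0, "2024-01-02T08:15:00"),
    ],
)

initial = extract_since(conn, "")        # full historical load
watermark = initial[-1]["updated_at"]    # newest timestamp seen so far
conn.execute("INSERT INTO orders VALUES (4, 5.0, '2024-01-03T12:00:00')")
incremental = extract_since(conn, watermark)  # picks up only the new row
```

Passing `partition_by_day(initial)` to `write_partitions` would produce one directory per day, a layout that PySpark and DuckDB can both read as a partitioned dataset. The watermark comparison works here because ISO 8601 timestamps sort correctly as strings.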
Individual Tool Pages