Data Ingestion
Distributed Pub-Sub Messaging
★ 4.5
AWS Data Utility Belt for Python
★ 4.3
pip install pulsar-client
pip install awswrangler
Python data engineers use the `pulsar-client` SDK to produce and consume messages on Apache Pulsar topics. Pulsar Functions can be written in Python to perform lightweight transformations (filtering, enriching, or routing messages) without deploying a separate stream processor such as Faust workers or a Spark Streaming cluster. Pulsar's topic compaction and retention policies simplify stateful event stream management: compaction keeps the latest message per key, and retention controls how long acknowledged messages are kept.
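As a rough illustration of the produce/consume pattern described above, here is a minimal sketch using the `pulsar-client` SDK. It is a plain client-side relay, not a deployed Pulsar Function. The topic names, the subscription name `enrich-sub`, the `user_id` and `amount` fields, and the `tier` rule are all hypothetical, chosen only for the example.

```python
import json
from typing import Optional


def enrich_event(raw: bytes) -> Optional[bytes]:
    """Filter and enrich one message payload; return None to drop it."""
    event = json.loads(raw)
    # Filter: drop events that carry no user_id (hypothetical field).
    if not event.get("user_id"):
        return None
    # Enrich: attach a coarse tier label (hypothetical rule and threshold).
    event["tier"] = "high" if event.get("amount", 0) >= 100 else "standard"
    return json.dumps(event, sort_keys=True).encode("utf-8")


def relay(service_url: str, source_topic: str, sink_topic: str, max_messages: int) -> int:
    """Consume up to max_messages, transform each one, republish the survivors."""
    # Imported lazily so enrich_event can be exercised without the SDK or a broker.
    import pulsar

    client = pulsar.Client(service_url)
    published = 0
    try:
        consumer = client.subscribe(source_topic, subscription_name="enrich-sub")
        producer = client.create_producer(sink_topic)
        for _ in range(max_messages):
            msg = consumer.receive()
            try:
                out = enrich_event(msg.data())
                if out is not None:
                    producer.send(out)
                    published += 1
                consumer.acknowledge(msg)
            except Exception:
                # Ask the broker to redeliver this message later.
                consumer.negative_acknowledge(msg)
    finally:
        client.close()
    return published
```

Assuming a broker is reachable, a call might look like `relay("pulsar://localhost:6650", "raw-events", "enriched-events", 100)`. Keeping the transformation in a pure function makes it easy to unit test, and the same logic could later be moved into a Pulsar Function.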
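AWS SDK for pandas (formerly AWS Data Wrangler, still installed and imported as `awswrangler`) is a widely used tool for AWS-native Python data pipelines. Engineers replace `boto3` + `pandas` boilerplate with single calls: `wr.s3.read_parquet('s3://bucket/prefix/')` reads all Parquet files under the prefix into a DataFrame, and `wr.s3.to_parquet(df, 's3://bucket/output/', dataset=True)` writes the DataFrame back as a dataset. In dataset mode, passing `partition_cols` partitions the output, and passing `database` and `table` registers it in the Glue catalog.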
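The read-then-write flow above might be sketched as follows. The helper names `parquet_write_options` and `rewrite_prefix` are hypothetical, as are the bucket paths, the partition column, and the Glue database and table names; only `wr.s3.read_parquet` and `wr.s3.to_parquet` come from `awswrangler` itself.

```python
from typing import Any, Dict, List, Optional


def parquet_write_options(
    path: str,
    partition_cols: Optional[List[str]] = None,
    database: Optional[str] = None,
    table: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for wr.s3.to_parquet in dataset mode."""
    opts: Dict[str, Any] = {"path": path, "dataset": True}
    if partition_cols:
        opts["partition_cols"] = partition_cols
    # This sketch requires both Glue names together and rejects a lone one.
    if (database is None) != (table is None):
        raise ValueError("database and table must be given together")
    if database is not None:
        opts["database"] = database
        opts["table"] = table
    return opts


def rewrite_prefix(source: str, **write_kwargs: Any) -> List[str]:
    """Read every Parquet file under source and write it back as a dataset."""
    # Imported lazily so the option helper can be tested without AWS access.
    import awswrangler as wr

    df = wr.s3.read_parquet(source)
    result = wr.s3.to_parquet(df=df, **parquet_write_options(**write_kwargs))
    # to_parquet returns a dict that includes the list of written object paths.
    return result["paths"]
```

With AWS credentials configured, a call could look like `rewrite_prefix("s3://bucket/prefix/", path="s3://bucket/output/", partition_cols=["dt"], database="analytics", table="events")`, assuming the DataFrame has a `dt` column and the Glue database already exists.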
Individual Tool Pages