Data Ingestion
Managed Real-Time Streaming
★ 4.4
Polyglot Document Intelligence
★ 3.8
pip install boto3
pip install kreuzberg
Python data engineers use `boto3`'s Kinesis client to put records onto a Data Stream from Lambda functions or EC2-based producers. Consumer applications use the Kinesis Client Library (KCL) through its Python wrapper (the `amazon-kinesis-client-python` project, published on PyPI as `amazon_kclpy`) to process shards in parallel with KCL-managed checkpointing. This is a common pattern for real-time log processing and event enrichment.
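The producer side of this pattern can be sketched as follows. This is a minimal illustration, not a production producer: the stream name `example-stream`, the `user_id` partition-key field, and the helper names `to_record`, `put_events`, and `main` are assumptions made for the example. The only AWS call used is the `put_records` method of the `boto3` Kinesis client, which accepts up to 500 records per request.

```python
import json
from typing import Any, Iterable

MAX_BATCH = 500  # PutRecords accepts at most 500 records per request


def to_record(event: dict[str, Any], key_field: str = "user_id") -> dict[str, Any]:
    """Encode one event as a Kinesis record.

    key_field is a hypothetical field name chosen for this sketch.
    """
    return {
        "Data": json.dumps(event).encode("utf-8"),
        "PartitionKey": str(event.get(key_field, "unknown")),
    }


def put_events(client: Any, stream_name: str, events: Iterable[dict[str, Any]]) -> int:
    """Send events in batches; return how many records Kinesis reported as failed."""
    records = [to_record(event) for event in events]
    failed = 0
    for start in range(0, len(records), MAX_BATCH):
        batch = records[start:start + MAX_BATCH]
        response = client.put_records(StreamName=stream_name, Records=batch)
        failed += response.get("FailedRecordCount", 0)
    return failed


def main() -> None:
    # boto3 is imported here so the helpers above can be exercised without AWS.
    import boto3

    kinesis = boto3.client("kinesis")
    rejected = put_events(kinesis, "example-stream", [{"user_id": 42, "action": "login"}])
    print(f"rejected records: {rejected}")
```

Calling `main()` requires AWS credentials and an existing stream. A real producer would typically also retry the records that `put_records` reports as failed, which this sketch only counts.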
Python data engineers use Kreuzberg to build document ingestion pipelines that extract text from uploaded PDFs, scanned images, and Office files. The async API fits FastAPI-based document processing services: an endpoint accepts a file upload, Kreuzberg extracts the text asynchronously, and the pipeline stores the result in a search index or warehouse for downstream analysis.
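The upload-extract-store flow described above might look roughly like this. It is a sketch under stated assumptions: the `/documents` route, the in-memory `index` list standing in for a search index or warehouse, and the helper names `ingest_upload`, `kreuzberg_extract`, and `create_app` are all hypothetical. It also assumes Kreuzberg's async `extract_bytes(content, mime_type)` returns a result with a `.content` string, so check the signature against the installed version. FastAPI file uploads additionally need the `python-multipart` package.

```python
import asyncio
from typing import Any, Awaitable, Callable

# An extractor takes raw bytes plus a MIME type and returns the extracted text.
Extractor = Callable[[bytes, str], Awaitable[str]]


async def ingest_upload(
    data: bytes, mime_type: str, extract: Extractor, index: list[dict[str, Any]]
) -> dict[str, Any]:
    """Extract text from one upload and append the result to the index."""
    text = await extract(data, mime_type)
    doc = {"mime_type": mime_type, "chars": len(text), "text": text}
    index.append(doc)  # stand-in for writing to a search index or warehouse
    return doc


async def kreuzberg_extract(data: bytes, mime_type: str) -> str:
    """Adapter around Kreuzberg; assumes extract_bytes is async and has .content."""
    from kreuzberg import extract_bytes

    result = await extract_bytes(data, mime_type)
    return result.content


def create_app(index: list[dict[str, Any]]):
    """Build a FastAPI app with one hypothetical upload route."""
    from fastapi import FastAPI, UploadFile

    app = FastAPI()

    @app.post("/documents")
    async def upload(file: UploadFile) -> dict:
        data = await file.read()
        mime_type = file.content_type or "application/octet-stream"
        doc = await ingest_upload(data, mime_type, kreuzberg_extract, index)
        return {"mime_type": doc["mime_type"], "chars": doc["chars"]}

    return app
```

Passing the extractor in as a parameter keeps the pipeline step testable without Kreuzberg or FastAPI installed, since a stub coroutine can replace `kreuzberg_extract`.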
Individual Tool Pages