Cloud Native Tools & Datasets for Python Data Engineering

Discover 60 tools tagged with Cloud Native for Python data engineering.

Tools (60)

DLT (Data Load Tool)

Python Data Loading Library

Python library that facilitates the loading phase in ETL processes. Designed to simplify loading data into various data stores or processing systems.

Free

4.5

Featured

dbt (Data Build Tool)

Open-source transformation tool enabling data analysts and engineers to transform, test, and document data in the warehouse. Focuses on the transform part of ETL with SQL templating and Python scripting.

Freemium

4.9

Mage.AI

Data Pipeline Tool

Modern data pipeline tool focused on automating data preparation and feature engineering for machine learning. Streamlines the data transformation process in ETL workflows.

Freemium

4.6

Featured

Prefect

Modern Workflow Orchestration

Workflow management system designed for modern infrastructure, with a focus on simplicity, ease of use, and flexibility in defining and executing workflows.

Freemium

4.7

Featured

Dagster

Data Orchestrator for ML & Analytics

Open-source data orchestrator for machine learning, analytics, and ETL. Focuses on development, production, and observation of data assets with integrated pipeline views.

Freemium

4.7

Argo Workflows

Kubernetes-Native Workflow Engine

Open-source container-native workflow engine for orchestrating parallel jobs on Kubernetes. Designed for large-scale computational tasks with powerful workflow features.

Free

4.6
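
As a rough illustration of what orchestrating parallel jobs involves, the sketch below groups the steps of a dependency graph into waves that could run concurrently. It uses only Python's standard `graphlib` module; the step names are hypothetical, and this is not Argo's API (Argo workflows are declared as Kubernetes resources, not Python objects).

```python
from graphlib import TopologicalSorter


def parallel_waves(dependencies):
    """Group steps into waves; every step in a wave can run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        # Steps whose dependencies have all completed.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical pipeline: each key depends on the steps in its set.
deps = {
    "extract_orders": set(),
    "extract_users": set(),
    "join": {"extract_orders", "extract_users"},
    "report": {"join"},
}
waves = parallel_waves(deps)
```

Here the two extract steps land in the same wave, so an engine could schedule them as separate containers at the same time.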

Tortoise ORM

Async ORM for Python

Easy-to-use asyncio ORM inspired by Django. Designed for async/await syntax, making it perfect for asynchronous applications and modern Python development.

Free

4.6

Gino

Async SQLAlchemy ORM

Async ORM built on SQLAlchemy Core for asyncio programming. Provides a simple, intuitive API for high-performance asynchronous database interactions.

Free

4.4

Featured

Apache Kafka

Distributed Event Streaming Platform

Distributed event streaming platform capable of handling trillions of events a day. Used for building real-time streaming data pipelines and applications with high-throughput, fault-tolerance, and scalability.

Free

4.8

Featured

Apache Flink

Stream Processing Framework

Framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Known for high performance in streaming data processing with exactly-once semantics.

Free

4.7

Redpanda

Modern Streaming Platform

Streaming data platform API-compatible with Apache Kafka but designed for better performance and easier operational management. Modern streaming platform for mission-critical workloads.

Free / Paid

4.6

Featured

FastAPI

Modern High-Performance Framework

Modern, high-performance web framework for building APIs with Python 3.7+ based on standard type hints. Offers automatic API documentation, ease of use, and fast execution.

Free

4.9

PyDeequ

Data Quality for Big Data

Python API for Deequ, an AWS library built on Apache Spark for defining and verifying data quality constraints. Useful for large-scale data processing and quality verification.

Free

4.5
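
To make the idea of data quality constraints concrete, here is a minimal plain-Python sketch of two common checks, completeness and uniqueness. It is not the PyDeequ API (which runs on Spark DataFrames); the function names and sample rows are hypothetical.

```python
def completeness(rows, column):
    """Fraction of rows where the column is present and not None."""
    if not rows:
        return 1.0
    return sum(r.get(column) is not None for r in rows) / len(rows)


def is_unique(rows, column):
    """True when no non-null value of the column repeats."""
    values = [r[column] for r in rows if r.get(column) is not None]
    return len(values) == len(set(values))


# Hypothetical sample data with one missing email and one repeated id.
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "c@example.com"},
    {"id": 3, "email": "d@example.com"},
]
report = {
    "email_completeness": completeness(rows, "email"),
    "id_is_unique": is_unique(rows, "id"),
}
```

A verification step would then compare each metric against a threshold, for example requiring full completeness on key columns.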

Featured

TensorFlow

End-to-End ML Platform

End-to-end open-source platform for machine learning enabling complex computations with data flow graphs. Widely used for deep learning applications with robust production support.

Free

4.8

LightGBM

Light Gradient Boosting Machine

Gradient boosting framework using tree-based learning algorithms. Designed for speed and efficiency, supporting large datasets and distributed computing for various ML tasks.

Free

4.7

Featured

Apache Beam

Unified Batch and Stream Processing

Unified programming model for defining batch and streaming data processing pipelines that can run on any supported runner. Provides portability across execution environments including Apache Flink, Apache Spark, and Google Cloud Dataflow. Ideal for building flexible, scalable data pipelines.

Free

4.5

Featured

Boto3

AWS SDK for Python

The official Amazon Web Services (AWS) SDK for Python. Enables Python developers to write software that uses services like Amazon S3, EC2, Lambda, and more. Provides an easy-to-use, object-oriented API as well as low-level access to AWS services, making it simple to integrate Python applications with AWS infrastructure.

Free

4.8

Featured

Google Cloud Client Libraries

GCP SDK for Python

Google Cloud Platform's official client libraries for Python, enabling integration with GCP services like Compute Engine, Cloud Storage, BigQuery, and Pub/Sub. Designed for a Pythonic, intuitive experience when interacting with Google Cloud services, with idiomatic code patterns and comprehensive documentation.

Free

4.7

Azure SDK for Python

Microsoft Azure SDK

Microsoft's Azure SDK for Python, offering a complete set of packages to interact with Azure resources and services. Supports a wide range of Azure services including Virtual Machines, Storage, Databases, AI services, and more. Provides tools for resource management and service interaction within the Azure ecosystem.

Free

4.6

IBM Cloud Python SDK

IBM Cloud Services SDK

Official SDK for interacting with various IBM Cloud services programmatically. Provides comprehensive support for IBM Cloud services including CIS, DNS, IAM, VPC, Watson AI, and more. Enables management and automation of IBM Cloud resources with Python, compatible with Python 3.6 and above.

Free

4.3

Oracle Cloud Infrastructure SDK

OCI SDK for Python

Official SDK for writing code to manage Oracle Cloud Infrastructure resources. Supports a wide range of Oracle Cloud services, with functionality for compute, storage, networking, databases, and more. Available across multiple operating systems and Python versions, providing a robust interface for OCI resource management.

Free

4.4

Featured

Amazon S3

Scalable Object Storage

Amazon Simple Storage Service offers industry-leading scalability, data availability, security, and performance for object storage. Commonly used for data backup, archival, big data analytics, disaster recovery, and content distribution. Designed for 99.999999999% (11 nines) durability and integrates with AWS analytics and ML services.

Pay-as-you-go

4.8
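
To put the 99.999999999% durability figure in perspective, the short calculation below treats it as an annual per-object probability and works out the expected loss rate. The bucket size is a hypothetical example, and the result is a statistical expectation, not a guarantee.

```python
from fractions import Fraction

# 11 nines, interpreted as the annual survival probability of one object.
durability = Fraction("0.99999999999")
annual_loss_probability = 1 - durability   # exactly 1 / 10**11

stored_objects = 10_000_000                # hypothetical bucket size

# Expected number of objects lost per year, and the average number of
# years between single-object losses at this scale.
expected_losses_per_year = stored_objects * annual_loss_probability
years_per_expected_loss = 1 / expected_losses_per_year
```

With ten million objects this comes to one expected loss every 10,000 years. Exact fractions are used because `1 - 0.99999999999` is not exactly representable in binary floating point.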

Amazon EC2

Scalable Virtual Servers

Amazon Elastic Compute Cloud provides secure, resizable compute capacity in the cloud. Offers a wide selection of instance types optimized for different use cases, including compute-intensive, memory-intensive, and storage-intensive workloads. Well suited to running data processing jobs, ML training, and distributed applications.

Pay-as-you-go

4.7

Featured

Amazon Redshift

Cloud Data Warehouse

Fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to analyze all your data using standard SQL and existing BI tools. Offers fast query performance using columnar storage, data compression, and massively parallel query execution. Integrates with AWS data lake and analytics services.

Pay-as-you-go

4.6

Azure Blob Storage

Massively Scalable Object Storage

Microsoft's object storage solution for the cloud, optimized for storing massive amounts of unstructured data. Offers hot, cool, and archive access tiers for cost optimization. Ideal for serving images, documents, streaming video and audio, data lakes, backup and disaster recovery, and big data analytics.

Pay-as-you-go

4.6

Featured

Azure Data Lake Storage

Enterprise Data Lake

Scalable and secure data lake that enables high-performance analytics workloads. Built on Azure Blob Storage with hierarchical namespace capabilities. Integrates seamlessly with Azure analytics services like Synapse, Databricks, and HDInsight. Optimized for big data analytics with enterprise-grade security and compliance.

Pay-as-you-go

4.5

Azure Synapse Analytics

Unified Analytics Platform

Analytics service that brings together enterprise data warehousing and big data analytics. Provides a unified experience to ingest, explore, prepare, manage, and serve data for immediate BI and machine learning needs. Supports both serverless and dedicated resource models, with deep integration with Power BI and Azure ML.

Pay-as-you-go

4.5

Google Cloud Storage

Unified Object Storage

Unified object storage for developers and enterprises, from live application data to cloud archival. Offers multiple storage classes including Standard, Nearline, Coldline, and Archive for cost optimization. Provides strong consistency, high durability, and integration with Google Cloud data analytics and ML services.

Pay-as-you-go

4.7

Google Compute Engine

High-Performance Virtual Machines

Offers virtual machines running in Google's innovative data centers and worldwide fiber network. Provides predefined and custom machine types, sustained use discounts, and per-second billing. Ideal for compute-intensive workloads, batch processing, and running distributed data processing frameworks like Spark and Hadoop.

Pay-as-you-go

4.6

Featured

Google BigQuery

Serverless Data Warehouse

Fast, economical, and fully managed serverless data warehouse for large-scale data analytics. Enables super-fast SQL queries using the processing power of Google's infrastructure. Built-in machine learning capabilities, automatic scaling, and pay-per-query pricing. Ideal for analyzing petabytes of data with standard SQL.

Pay-as-you-go

4.8

Featured

dbdiagram.io

Database Design as Code

Free, simple tool to draw Entity-Relationship diagrams by just writing code. Designed to help developers design and visualize database structures in a straightforward and intuitive way. Perfect for quickly sketching database schemas and sharing them with your team through simple DSL syntax.

Free

4.6

Featured

Lucidchart

Collaborative Diagramming Platform

Online diagram application that makes it easy to sketch and share professional flowcharts and database diagrams. Offers comprehensive support for database design and ER diagrams, with a collaborative environment for teams. Includes real-time collaboration, an extensive template library, and integrations with popular tools.

Freemium

4.7

Featured

MongoDB

Document NoSQL Database

Scalable, flexible document database with rich querying and indexing capabilities. Stores data as JSON-like (BSON) documents, making it well suited to rapid development and horizontal scaling. Supports aggregation pipelines and transactions, and has a mature Python driver in PyMongo.

Freemium

4.6

Cloudera

Enterprise Data Cloud

Enterprise data cloud offering storage, processing, and exploration capabilities for any data. Focuses on enterprise-level data management and analytics with comprehensive support for Hadoop ecosystem, machine learning, and real-time analytics. Provides hybrid and multi-cloud deployment options.

Enterprise Pricing

4.3

Teradata

Enterprise Data Warehouse

Established enterprise data warehousing solution offering comprehensive capabilities for data warehousing, data lakes, and analytics. Known for scalability and hybrid cloud environment support. Provides advanced analytics, workload management, and integration with popular BI tools.

Enterprise Pricing

4.2

Featured

Databricks

Unified Analytics Platform

Cloud data platform supporting data engineering, collaborative data science, machine learning, and analytics. Built on Apache Spark with Delta Lake for reliable data lakes. Ideal for organizations focusing on advanced analytics, ML workflows, and collaborative data science with notebooks.

Pay-as-you-go

4.7

Oracle Autonomous Database

Self-Managing Cloud Database

High-performance, self-managing data management service with automated patching, upgrading, and tuning. Particularly beneficial for enterprises in the Oracle ecosystem or those seeking highly automated data management. Features include automatic indexing, scaling, and security patching.

Pay-as-you-go

4.4

Featured

Snowflake

Cloud Data Platform

Cloud-native data platform supporting data warehousing, data lakes, data engineering, data science, and data sharing. Architecture separates compute and storage for independent scaling. Features include zero-copy cloning, time travel, automatic scaling, and multi-cloud support. Pay only for resources used.

Pay-as-you-go

4.8

ORM (encode/orm)

Lightweight Async ORM

Lightweight and async-ready ORM designed to work with FastAPI and Starlette. Particularly suited for applications requiring asynchronous database operations with minimal overhead and modern Python async/await patterns.

Free

4.3

Amazon RDS

Managed Relational Database Service

AWS managed relational database service that simplifies setup, operation, and scaling of databases in the cloud. Supports MySQL, PostgreSQL, MariaDB, Oracle, and SQL Server with automated backups, patching, and replication.

4.6

Amazon DynamoDB

Managed NoSQL Key-Value Database

A fully managed, serverless NoSQL database service from AWS designed for high-performance applications at any scale. DynamoDB provides single-digit millisecond performance with built-in security, backup, and in-memory caching.

4.6

Census

Reverse ETL Platform

A reverse ETL tool that syncs data from your cloud data warehouse to SaaS applications like Salesforce, HubSpot, and Marketo. Census enables operational analytics by activating warehouse data across business tools without custom integrations.

4.3

Featured

Airbyte

Open-Source Data Integration Platform

An open-source data integration platform for modern data teams. Airbyte offers 300+ pre-built connectors to sync data from APIs, databases, and files into data warehouses and lakes, with support for custom connector development.

Freemium

4.6

Estuary Flow

Real-Time Data Pipeline Platform

A no/low-code data pipeline platform that handles both batch and real-time data ingestion. Estuary Flow uses change data capture to stream data from databases and provides a visual interface for building and monitoring data pipelines.

Freemium

4.2

Artie

Real-Time CDC Data Ingestion

A real-time data ingestion tool leveraging change data capture to stream database changes into data warehouses. Artie minimizes data latency by continuously syncing changes rather than running periodic batch extracts.

Freemium

4.1
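
The difference between change data capture and periodic batch extracts can be sketched in a few lines: instead of re-reading whole tables, the target applies an ordered stream of row-level changes. The event shape below (`op`, `key`, `row`) is a hypothetical format for illustration, not Artie's wire format.

```python
def apply_changes(replica, events):
    """Apply ordered change events to an in-memory replica keyed by primary key."""
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            # Merge changed columns into the existing row, if any.
            replica[key] = {**replica.get(key, {}), **event["row"]}
        elif op == "delete":
            replica.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return replica


# Hypothetical change stream captured from a source database.
events = [
    {"op": "insert", "key": 1, "row": {"name": "Ada", "plan": "free"}},
    {"op": "insert", "key": 2, "row": {"name": "Lin", "plan": "free"}},
    {"op": "update", "key": 1, "row": {"plan": "pro"}},
    {"op": "delete", "key": 2},
]
replica = apply_changes({}, events)
```

Because each event touches only one row, the replica stays close to the source without waiting for the next scheduled extract.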

Google Sheets ETL

Sheets to Data Warehouse Loader

An open-source tool that continuously imports your Google Sheets into your data warehouse. Google Sheets ETL automates the extraction of spreadsheet data into structured tables, bridging the gap between business users and data infrastructure.

Free

3.7

DataKitchen

Data Observability Platform

An open-source platform for end-to-end data journey observability. DataKitchen monitors data pipelines from source to consumption, detecting issues like schema changes, data freshness problems, and quality anomalies.

Freemium

4.1

Grai

Data Catalog for CI/CD

A data catalog tool that integrates into your CI system to prevent data quality issues before they reach production. Grai maps data lineage across your stack and automatically tests the impact of schema changes on downstream consumers.

Free
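
Impact analysis over data lineage is essentially a graph traversal: given a changed column, find everything downstream of it. The sketch below shows that idea with a breadth-first search; the lineage map and node names are hypothetical, and this is not Grai's API.

```python
from collections import deque


def downstream(lineage, changed):
    """Return every node reachable from `changed`, i.e. what a change could affect."""
    seen, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical lineage: each source column maps to its direct consumers.
lineage = {
    "db.users.email": ["warehouse.dim_users.email"],
    "warehouse.dim_users.email": ["dash.signups", "ml.churn_features"],
    "db.orders.total": ["warehouse.fct_orders.total"],
}
affected = downstream(lineage, "db.users.email")
```

A CI check could fail a pull request whenever the affected set contains a consumer whose tests no longer pass.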

S3QL

Cloud-Backed File System

A file system that stores all its data online using storage services like Google Storage, Amazon S3, or OpenStack. S3QL provides a standard POSIX file system interface with features like deduplication, compression, and encryption.

Free

3.8
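
Deduplication and compression in a storage layer can be illustrated with a toy content-addressed block store: blocks are keyed by their hash, so identical data is stored once, and each stored block is compressed. This is a simplified sketch using only the standard library, not S3QL's implementation; the class name is hypothetical and encryption is left out.

```python
import hashlib
import zlib


class BlockStore:
    """Toy content-addressed store: identical blocks are kept once, compressed."""

    def __init__(self):
        self.blocks = {}  # sha256 hex digest -> compressed bytes

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blocks:
            # Only new content is compressed and stored.
            self.blocks[digest] = zlib.compress(data)
        return digest

    def get(self, digest: str) -> bytes:
        return zlib.decompress(self.blocks[digest])


store = BlockStore()
payload = b"log line\n" * 1000
first = store.put(payload)
second = store.put(payload)  # duplicate content, no new block stored
```

Both writes return the same digest and the store holds a single compressed block, which is the saving a deduplicating file system aims for on repetitive data.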

Featured

lakeFS

Git-Like Data Lake Versioning

An open-source platform that delivers resilience and manageability to object-storage-based data lakes. lakeFS provides git-like branching, merging, and versioning for data, enabling safe experimentation and CI/CD workflows for data pipelines.

Freemium

4.5
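
The git-like model can be pictured as branches that map paths to immutable objects, so creating a branch copies only metadata. The toy class below is a sketch of that idea under simplifying assumptions (no commits, no conflict detection, no deletes); it is not the lakeFS API, and all names are hypothetical.

```python
class Repo:
    """Toy versioned object store: a branch is a mapping of path -> object id."""

    def __init__(self):
        self.branches = {"main": {}}

    def branch(self, name, source="main"):
        # Copy only the path-to-object mapping; the objects themselves are shared.
        self.branches[name] = dict(self.branches[source])

    def write(self, branch, path, object_id):
        self.branches[branch][path] = object_id

    def merge(self, source, into="main"):
        # Naive merge: the source branch wins on every path it contains.
        self.branches[into].update(self.branches[source])


repo = Repo()
repo.write("main", "sales/2024.parquet", "obj-1")
repo.branch("experiment")
repo.write("experiment", "sales/2024.parquet", "obj-2")
main_before_merge = dict(repo.branches["main"])
repo.merge("experiment")
```

Until the merge, readers of `main` keep seeing the original object, which is what makes experimentation on a branch safe.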

Ilum

Data Lakehouse Platform

A modular data lakehouse platform that simplifies the management and monitoring of Apache Spark clusters. Ilum provides a unified interface for running Spark jobs, managing data pipelines, and monitoring cluster health in lakehouse architectures.

Freemium

3.9

FlightPath Data

Data Lake Bronze Layer Gateway

A gateway to a data lake's bronze layer that handles raw data ingestion and landing. FlightPath provides a managed entry point for data flowing into your data lake, ensuring consistent formatting and quality at the ingestion stage.

Freemium

3.7

Featured

Prometheus

Open-Source Monitoring System

An open-source systems monitoring and alerting toolkit with a powerful multi-dimensional data model and flexible query language (PromQL). Prometheus is the standard for monitoring cloud-native and Kubernetes-based data infrastructure.

Free

4.7
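
In a multi-dimensional data model, each time series is identified by a metric name plus a set of key-value labels, and queries select series by matching labels. The plain-Python sketch below mimics equality matching on labels; the metric and label names are hypothetical, and this is not a Prometheus client library.

```python
def select(series, name, **matchers):
    """Return samples whose metric name and labels match every equality matcher."""
    return [
        s for s in series
        if s["name"] == name
        and all(s["labels"].get(k) == v for k, v in matchers.items())
    ]


# Hypothetical samples for one metric, split by pipeline and environment.
series = [
    {"name": "rows_loaded_total", "labels": {"pipeline": "orders", "env": "prod"}, "value": 1200},
    {"name": "rows_loaded_total", "labels": {"pipeline": "users", "env": "prod"}, "value": 300},
    {"name": "rows_loaded_total", "labels": {"pipeline": "orders", "env": "dev"}, "value": 15},
]
prod = select(series, "rows_loaded_total", env="prod")
prod_total = sum(s["value"] for s in prod)
```

This is roughly what the PromQL expression `sum(rows_loaded_total{env="prod"})` expresses: filter by label, then aggregate across the remaining dimensions.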

Featured

Grafana

Observability & Dashboarding Platform

An open-source analytics and interactive visualization platform. Grafana connects to dozens of data sources including Prometheus, InfluxDB, and Elasticsearch to create rich monitoring dashboards for data infrastructure and pipeline health.

Freemium

4.8
