Python Tools & Datasets for Python Data Engineering

Discover 75 tools tagged with Python for data engineering.

Tools (75)

Featured

Pandas

Data Manipulation & Analysis Library

Powerful Python library for data manipulation and analysis, offering DataFrame structures for efficient data cleaning, transformation, and analysis. Often used in the transform phase of ETL processes.

Free

4.9
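
As a rough illustration of the transform phase mentioned above, here is a minimal sketch, assuming pandas is installed. The column names, the sample rows, and the `transform` helper are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical raw records, as they might arrive from an extract step.
raw = pd.DataFrame(
    {
        "customer": [" alice ", "BOB", "alice", None],
        "amount": ["10.5", "3", "4.5", "7"],
    }
)


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Clean the raw rows and total the amount per customer."""
    # Drop rows that have no customer name at all.
    cleaned = df.dropna(subset=["customer"]).copy()
    # Normalize names so " alice " and "alice" fall into the same group.
    cleaned["customer"] = cleaned["customer"].str.strip().str.lower()
    # Amounts arrive as strings; convert them to numbers before summing.
    cleaned["amount"] = pd.to_numeric(cleaned["amount"])
    return cleaned.groupby("customer", as_index=False)["amount"].sum()


totals = transform(raw)
print(totals)
```

The cleaned frame could then be handed to whatever load step the pipeline uses.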

Petl

Python ETL Package

Python package specifically designed for ETL tasks, offering tools for data extraction, transformation, and loading. Suitable for simpler, script-based ETL processes.

Free

4.3

Featured

PySpark

Python API for Apache Spark

Python API for Apache Spark, enabling scalable and efficient data processing. Particularly useful for ETL processes involving large datasets that need parallel processing across a cluster.

Free

4.8

DLT (Data Load Tool)

Python Data Loading Library

Python library that facilitates the loading phase in ETL processes. Designed to simplify loading data into various data stores or processing systems.

Free

4.5

Bonobo

Lightweight ETL Framework

Lightweight Extract-Transform-Load (ETL) framework for Python 3.6+. Allows writing ETL scripts in pure Python, particularly suited for simple and straightforward ETL tasks.

Free

4.2

Mage.AI

Data Pipeline Tool

Modern data pipeline tool focused on automating data preparation and feature engineering for machine learning. Streamlines the data transformation process in ETL workflows.

Freemium

4.6

Featured

Apache Airflow

Workflow Orchestration Platform

Platform to programmatically author, schedule, and monitor workflows. Allows for complex pipeline construction and efficient task management with robust dependency handling.

Free

4.8

Luigi

Batch Job Pipeline Builder

Developed by Spotify, Luigi helps build complex pipelines of batch jobs, handling dependency resolution, workflow management, and task visualization.

Free

4.4

Featured

Prefect

Modern Workflow Orchestration

Workflow management system designed for modern infrastructure, with a focus on simplicity, ease of use, and flexibility in defining and executing workflows.

Freemium

4.7

Featured

Dagster

Data Orchestrator for ML & Analytics

Open-source data orchestrator for machine learning, analytics, and ETL. Focuses on development, production, and observation of data assets with integrated pipeline views.

Freemium

4.7

Dask

Parallel Computing Library

Parallel computing library that scales Pandas workflows to larger-than-memory datasets. Enables parallel processing while maintaining a familiar Pandas-like interface for big data.

Free

4.6

Featured

NumPy

Numerical Computing Library

Fundamental library for numerical computing in Python. Supports large multi-dimensional arrays and matrices with a vast collection of mathematical functions for array operations.

Free

4.9
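
The array operations described above can be sketched as follows, assuming NumPy is installed. The `readings` data is made up for illustration.

```python
import numpy as np

# Hypothetical sensor readings: 3 sensors (rows) x 4 time steps (columns).
readings = np.array(
    [
        [1.0, 2.0, 3.0, 4.0],
        [10.0, 20.0, 30.0, 40.0],
        [5.0, 5.0, 5.0, 5.0],
    ]
)

# Per-sensor mean, kept as a column so it broadcasts against each row.
row_means = readings.mean(axis=1, keepdims=True)

# Center every reading around its sensor's mean without an explicit loop.
centered = readings - row_means

print(row_means.ravel())
print(centered)
```

Keeping the mean as a column (`keepdims=True`) is what lets the subtraction apply row by row through broadcasting.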

Beautiful Soup

Web Scraping & HTML Parsing

Library for web scraping and parsing HTML/XML documents. Extensively used in data wrangling to clean, parse, and extract data from web sources.

Free

4.5
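
A small sketch of the parse-and-extract workflow described above, assuming the `beautifulsoup4` package is installed. The HTML snippet and the field names in `rows` are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet standing in for a downloaded page.
html = """
<ul id="tools">
  <li><a href="/pandas">Pandas</a> <span class="price">Free</span></li>
  <li><a href="/prefect">Prefect</a> <span class="price">Freemium</span></li>
</ul>
"""

# "html.parser" is the parser that ships with the Python standard library.
soup = BeautifulSoup(html, "html.parser")

rows = []
for item in soup.find("ul", id="tools").find_all("li"):
    link = item.find("a")
    price = item.find("span", class_="price")
    rows.append(
        {
            "name": link.get_text(strip=True),
            "url": link["href"],
            "price": price.get_text(strip=True),
        }
    )

print(rows)
```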

Scrapy

Web Crawling Framework

Powerful web crawling and scraping framework for extracting, cleaning, and processing large volumes of web data. Essential for data wrangling from web sources.

Free

4.6

TextBlob

Text Processing Library

Simple library for processing textual data with APIs for common NLP tasks. Essential for data wrangling when dealing with text data and natural language processing.

Free

4.3

Featured

Pydantic

Data Validation using Type Hints

Data validation and settings management library built on Python type annotations. Ensures data conforms to schemas declared with standard type hints; widely used with FastAPI and other modern Python applications.

Free

4.9
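
To make the type-hint-driven validation concrete, here is a minimal sketch, assuming Pydantic is installed. The `ToolRecord` model and its fields are hypothetical, chosen only to mirror a row of this catalog.

```python
from pydantic import BaseModel, ValidationError


class ToolRecord(BaseModel):
    """Hypothetical schema for one catalog row."""

    name: str
    rating: float
    free: bool = True


# Valid input: by default the numeric string is coerced to a float.
record = ToolRecord(name="Pandas", rating="4.9")
print(record.rating)

# Invalid input: a non-numeric rating raises ValidationError.
try:
    ToolRecord(name="Petl", rating="not a number")
    error_count = 0
except ValidationError as exc:
    error_count = len(exc.errors())
    print(exc)
```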

Featured

Marshmallow

Object Serialization & Validation

ORM/ODM/framework-agnostic library for object serialization and deserialization. Converts complex data types to and from native Python datatypes with robust validation.

Free

4.7

Cerberus

Lightweight Data Validation

Lightweight and extensible data validation library supporting complex data structures with customizable validation rules. Highly flexible for various validation needs.

Free

4.5

Voluptuous

Python Data Structure Validation

Validates Python data structures with straightforward syntax and clear error messages. Ensures structure and content adhere to specified schemas.

Free

4.3

jsonschema

JSON Schema Validator

Library for validating JSON data against JSON Schema standards. Essential when working with JSON data formats to ensure schema compliance.

Free

4.6

Featured

Pandera

DataFrame Validation

Flexible API for validating dataframe structures. Validates dataframes at runtime and integrates with Pydantic and FastAPI. Well suited to production data pipelines.

Free

4.7

Validr

Fast Validation Library

Fast, simple, and powerful validation library with declarative validation rules. Optimized for performance when validating data from various sources.

Free

4.2

Featured

SQLAlchemy

Python SQL Toolkit & ORM

Widely used SQL toolkit and ORM providing a full suite of enterprise-level persistence patterns. Designed for efficient, high-performing database access with flexible SQL abstraction.

Free

4.9

Featured

Django ORM

Django's Built-in ORM

Part of the Django web framework; lets you define data models entirely in Python. Provides a powerful abstraction layer that translates Python code to SQL.

Free

4.8

Peewee

Small Expressive ORM

Small, expressive ORM with a simple and intuitive interface. Lightweight and easy to use, well suited to small and medium-sized applications that prioritize simplicity.

Free

4.6

Pony ORM

Pythonic Query Language

Unique ORM using generator expressions for queries. Intuitive and user-friendly, allowing complex queries in pure Python that mirror human language.

Free

4.5

SQLObject

Object Interface to Database

Popular ORM providing an object-oriented interface with tables as classes and rows as instances. Supports a variety of database backends with a focus on simplicity.

Free

4.2

Tortoise ORM

Async ORM for Python

Easy-to-use asyncio ORM inspired by Django. Designed for async/await syntax, making it perfect for asynchronous applications and modern Python development.

Free

4.6

Gino

Async SQLAlchemy ORM

Async ORM built on SQLAlchemy Core for asyncio programming. Provides a simple and intuitive API for asynchronous database interactions with high performance.

Free

4.4

Featured

Alembic

Database Migrations for SQLAlchemy

Lightweight database migration tool for use with SQLAlchemy. Alembic allows you to create, manage, and invoke change management scripts for your database, facilitating schema migrations as your application evolves.

Free

4.7

Featured

Django Migrations

Built-in Django Migration Framework

Django's powerful built-in migration framework that comes bundled with Django. Allows you to change your database schema without losing data using a simple and intuitive API.

Free

4.8

Flask-Migrate

Database Migrations for Flask

Extension that handles SQLAlchemy database migrations for Flask applications using Alembic. Provides command-line tools to manage and automate database migrations in Flask projects.

Free

4.5

yoyo-migrations

Database Schema Migration Tool

Database schema migration tool that lets you manage your database schema by applying and rolling back migration scripts written in pure SQL or Python. Simple and flexible approach to database migrations.

Free

4.3

SQLAlchemy-Migrate

Schema Versioning for SQLAlchemy

Provides a way to deal with database schema changes in SQLAlchemy projects. Extends SQLAlchemy to have database schema versioning and migration capabilities for managing database evolution.

Free

4.2

South

Legacy Django Migrations

The original migration tool for Django before built-in migrations were added in Django 1.7. Still relevant for maintaining or upgrading legacy Django applications running older versions.

Free

Faust

Python Stream Processing

Stream processing library that ports ideas from Kafka Streams to Python. Used for building high-performance, reliable real-time stream processing applications with a Pythonic API.

Free

4.5

Featured

Flask

Lightweight Web Framework

Lightweight WSGI web application framework that is easy to get started with and versatile enough for complex applications. Popular for building web APIs thanks to its simplicity and extensibility.

Free

4.8

Featured

Django REST Framework

Powerful API Toolkit for Django

Powerful and flexible toolkit for building Web APIs in Django. Highly recommended for adding API capabilities to Django applications with comprehensive features and excellent documentation.

Free

4.9

Featured

FastAPI

Modern High-Performance Framework

Modern, fast (high-performance) web framework for building APIs with Python 3.7+ based on standard type hints. Features automatic API documentation, ease of use, and fast execution.

Free

4.9

Tornado

Asynchronous Networking Library

Python web framework and asynchronous networking library. Particularly useful for long-polling, WebSockets, and applications requiring long-lived connections to each user.

Free

4.5

Falcon

High-Performance Python Framework

Reliable, high-performance Python framework for building large-scale app backends and microservices. Encourages REST architectural style while remaining highly effective and minimalist.

Free

4.6

Featured

Great Expectations

Data Validation & Documentation

Comprehensive tool helping data teams validate, document, and profile their data. Define expectations for your data ensuring it meets quality standards before processing.

Free / Paid

4.7

Ydata Profiling

Automated Data Profiling

Generates profile reports from pandas DataFrames. Excellent tool for quickly understanding data with interactive HTML reports including statistics, distributions, and correlations.

Free

4.6

PyDeequ

Data Quality for Big Data

Python API for Deequ, an AWS library built on Apache Spark for defining and verifying data quality constraints. Useful for large-scale data processing and quality verification.

Free

4.5

Dedupe

ML-Powered Deduplication

Python library using machine learning to perform deduplication and entity resolution on structured data. Particularly useful for identifying and merging duplicate records.

Free

4.4

Soda Core

Data Quality Testing

Open-source data quality tool with CLI for defining, running, and monitoring data quality checks. Write tests to verify data meets conditions like missing values, ranges, or uniqueness.

Free / Paid

4.6

DataCleaner

Automated Data Cleaning

Automatic tool for cleaning and preprocessing data. Handles missing values, encodes categorical data, and scales features making data preparation efficient.

Free

4.2

Data Linter

Schema Validation Tool

Python package for automated data validation within Data Engineering pipelines. Engineered to ingest and validate tabular data against predefined schemas.

Free

4.1

Featured

Matplotlib

Comprehensive Visualization Library

Comprehensive library for creating static, animated, and interactive visualizations in Python. Matplotlib is versatile and widely used for plotting graphs and charts with extensive customization options.

Free

4.8

Featured

Seaborn

Statistical Data Visualization

Built on top of Matplotlib, Seaborn provides a high-level interface for drawing attractive and informative statistical graphics, simplifying the creation of complex visualizations with beautiful default themes.

Free

4.7

Featured

Plotly

Interactive Visualization Library

Plotly offers a range of interactive plotting options and is known for its advanced graphics and interactivity, supporting complex visualizations with ease. Perfect for creating web-based dashboards.

Free / Paid

4.8

Bokeh

Interactive Web Visualizations

Bokeh focuses on building interactive, web-ready plots, which can be a powerful tool for creating dynamic visualizations that can easily be embedded in web applications.

Free

4.6

Altair

Declarative Visualization

Altair is a declarative statistical visualization library for Python, offering a simple and concise way to create a wide range of statistical plots based on a logical data mapping.

Free

4.5

Featured

Scikit-learn

Machine Learning in Python

Versatile library providing a range of supervised and unsupervised learning algorithms. Known for its ease of use and efficiency for data mining and data analysis with classical ML algorithms.

Free

4.9

Featured

TensorFlow

End-to-End ML Platform

End-to-end open-source platform for machine learning enabling complex computations with data flow graphs. Widely used for deep learning applications with robust production support.

Free

4.8

Featured

PyTorch

Deep Learning Framework

Open-source machine learning library known for its flexibility and ease of use, and a preferred tool for research in deep learning and artificial intelligence. Builds its computation graphs dynamically.

Free

4.8

Keras

High-Level Neural Networks API

High-level neural networks API designed for fast experimentation with deep neural networks. Originally built on top of TensorFlow, and since Keras 3 also able to run on JAX and PyTorch backends, it offers a user-friendly interface for building models.

Free

4.7

Featured

XGBoost

Extreme Gradient Boosting

Highly efficient implementation of gradient boosting designed for speed and performance. Widely used in machine learning competitions and practical applications for structured data.

Free

4.8

LightGBM

Light Gradient Boosting Machine

Gradient boosting framework using tree-based learning algorithms. Designed for speed and efficiency, supporting large datasets and distributed computing for various ML tasks.

Free

4.7

CatBoost

Gradient Boosting on Decision Trees

Algorithm for gradient boosting on decision trees developed by Yandex. Particularly effective for datasets with categorical features, known for robustness and handling overfitting well.

Free

4.6

Featured

Boto3

AWS SDK for Python

The official Amazon Web Services (AWS) SDK for Python. Enables Python developers to write software that uses services such as Amazon S3, EC2, and Lambda. Provides an easy-to-use, object-oriented API as well as low-level access to AWS services, making it simple to integrate Python applications with AWS infrastructure.

Free

4.8

Featured

Google Cloud Client Libraries

GCP SDK for Python

Google Cloud Platform's official client library for Python, enabling seamless integration with GCP services like Compute Engine, Cloud Storage, BigQuery, and Pub/Sub. Designed for a Pythonic, intuitive experience when interacting with Google Cloud services, with idiomatic code patterns and comprehensive documentation.

Free

4.7

Azure SDK for Python

Microsoft Azure SDK

Microsoft's comprehensive Azure SDK for Python, offering a complete set of packages to interact with Azure resources and services. Supports a wide range of Azure services, including Virtual Machines, Storage, Databases, and AI services. Provides tools for effective resource management and service interaction within the Azure ecosystem.

Free

4.6

IBM Cloud Python SDK

IBM Cloud Services SDK

Official SDK for interacting with various IBM Cloud services programmatically. Provides comprehensive support for IBM Cloud services including CIS, DNS, IAM, VPC, Watson AI, and more. Enables management and automation of IBM Cloud resources with Python, compatible with Python 3.6 and above.

Free

4.3

Oracle Cloud Infrastructure SDK

OCI SDK for Python

Official SDK for writing code to manage Oracle Cloud Infrastructure resources. Supports a wide range of Oracle Cloud services, with functionality for compute, storage, networking, databases, and more. Available across multiple operating systems and Python versions, providing a robust interface for OCI resource management.

Free

4.4

ERAlchemy

ER Diagrams from SQLAlchemy

Python library designed to create Entity Relationship diagrams by extracting data from databases or SQLAlchemy models. Particularly useful for database designers and developers who need to visualize and interpret complex relationships within database systems. Generates diagrams automatically from your Python code.

Free

4.2

Featured

Amundsen

Data Discovery & Metadata Engine

Data discovery and metadata engine for improving productivity of data analysts, scientists, and engineers when interacting with data. Provides powerful search, data previews, and column-level lineage. Integrates seamlessly with Python environments and modern data stacks for comprehensive metadata management.

Free

4.5

CKAN

Open Data Management System

Powerful data management system that makes data accessible by providing tools to streamline publishing, sharing, finding, and using data. Aimed at data publishers wanting to make their data open and available. Features data cataloging, API generation, and visualization capabilities.

Free

4.1

Marquez

Metadata Service for Data Lineage

Open-source metadata service for the collection, aggregation, and visualization of data ecosystem metadata. Provides a common interface to track data lineage across your entire data platform. Offers a Python client for integration and supports the OpenLineage standard for lineage collection.

Free

4.3

Featured

DataHub

Modern Metadata Platform

Open-source metadata platform for the modern data stack. Provides powerful and flexible metadata search, discovery, and lineage capabilities. Features real-time metadata updates, data quality monitoring, and governance workflows, with an extensive Python SDK for automation and integration.

Free

4.6

Featured

Pandas

Data Analysis & Manipulation

Foundational library for data manipulation and analysis in Python. Provides fast, flexible, and expressive data structures (DataFrames) designed for working with structured, tabular, and time series data. Essential tool for data wrangling with comprehensive features for indexing, grouping, merging, and filtering.

Free

4.9

ORM (encode/orm)

Lightweight Async ORM

Lightweight and async-ready ORM designed to work with FastAPI and Starlette. Particularly suited for applications requiring asynchronous database operations with minimal overhead and modern Python async/await patterns.

Free

4.3

Featured

Python

Programming Language

Python is a high-level, interpreted programming language that has become the dominant language for data engineering. Known for its clear syntax, extensive standard library, and rich ecosystem of data-focused packages. Essential foundation for all Python data engineering work.

Free

4.9

pip

Python Package Installer

The standard package installer for Python. Used to install and manage Python packages from the Python Package Index (PyPI) and other repositories. Essential tool for managing dependencies in any Python project, comes bundled with Python installations.

Free

4.7

virtualenv / venv

Virtual Environment Manager

Tools for creating isolated Python environments, allowing you to manage project-specific dependencies without conflicts. venv comes built into Python 3, while virtualenv offers additional features. Critical for professional Python development and maintaining clean, reproducible environments.

Free

4.6
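
Creating an isolated environment can also be done from Python itself. The following is a minimal sketch using only the standard-library `venv` module; the `create_env` helper and the `demo-env` directory name are hypothetical.

```python
import tempfile
import venv
from pathlib import Path


def create_env(target: Path) -> Path:
    """Create an isolated environment at `target` and return its config file."""
    # with_pip=False keeps the sketch fast and offline; pass with_pip=True
    # to bootstrap pip into the new environment.
    builder = venv.EnvBuilder(with_pip=False, clear=True)
    builder.create(target)
    return target / "pyvenv.cfg"


# A temporary directory is used so the example cleans up after itself.
with tempfile.TemporaryDirectory() as tmp:
    cfg = create_env(Path(tmp) / "demo-env")
    cfg_exists = cfg.exists()
    print(cfg.read_text())
```

From a shell, the equivalent is `python -m venv demo-env`.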
