PyDataEng

Free Python data engineering directory with 400+ resources.

© 2026 Python Data Engineering. All rights reserved.

Columnar Tools & Datasets for Python Data Engineering

Discover 11 tools tagged with Columnar for Python data engineering.

Columnar storage formats and databases organise data by column rather than by row. Because values of the same type sit next to each other, they compress well, and analytical queries read only the columns they need. Python data engineers use columnar formats like Parquet and ORC for data lake storage, and columnar databases like Redshift and BigQuery for warehouse analytics, accessed through libraries such as pandas, PyArrow, and SQLAlchemy.
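To make the row-versus-column idea concrete, here is a minimal pure-Python sketch. It does not use Parquet, ORC, or any real storage engine. The sample records and the helper names `to_columns` and `run_length_encode` are hypothetical, chosen only for illustration. It shows the two effects described above: an aggregate touches a single column, and repeated neighbouring values in a column collapse under a simple run-length encoding.

```python
from itertools import groupby

# Hypothetical sample records (illustrative data, not from any real dataset).
rows = [
    {"country": "DE", "status": "ok", "amount": 10},
    {"country": "DE", "status": "ok", "amount": 12},
    {"country": "DE", "status": "ok", "amount": 7},
    {"country": "FR", "status": "ok", "amount": 3},
    {"country": "FR", "status": "error", "amount": 5},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    return {name: [record[name] for record in records] for name in records[0]}


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)

# An aggregate over one column only has to read that column's values.
total_amount = sum(columns["amount"])

# Neighbouring repeats within a column shrink to a handful of pairs.
encoded_country = run_length_encode(columns["country"])

print(total_amount)
print(encoded_country)
```

Real columnar engines combine several encodings and add per-column metadata, so treat this only as an intuition for why the layout helps.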

Tools (11)

Apache HBase - databases-warehouses tool for Python data engineering

Apache HBase

Distributed Column-Family Store

A distributed, scalable big data store modeled after Google's Bigtable, running on top of HDFS. HBase provides random, real-time read/write access to large datasets and is commonly used for storing sparse data in the Hadoop ecosystem.

Free

◆4.2

ScyllaDB - databases-warehouses tool for Python data engineering

ScyllaDB

High-Performance Cassandra Alternative

A NoSQL database compatible with Apache Cassandra but built in C++ for significantly higher throughput and lower latency. ScyllaDB is designed for data-intensive applications requiring consistent single-digit millisecond performance at scale.

Freemium

◆4.5

ClickHouse - databases-warehouses tool for Python data engineering

Featured

ClickHouse

Fast Columnar OLAP Database

An open-source columnar database management system designed for online analytical processing (OLAP). ClickHouse delivers exceptional query performance on large datasets, making it ideal for real-time analytics, log analysis, and time-series data.

Freemium

◆4.7

Vertica - databases-warehouses tool for Python data engineering

Vertica

Enterprise Columnar Analytics Database

A distributed, MPP columnar database designed for large-scale analytics workloads. Vertica offers extensive analytics SQL, machine learning capabilities, and high compression ratios for efficient storage of massive datasets.

$$$

◆4.3

FiloDB - databases-warehouses tool for Python data engineering

FiloDB

Distributed Columnar Streaming Database

A distributed, columnar, versioned, and streaming database designed for real-time and batch analytics. FiloDB combines the benefits of columnar storage with streaming ingestion, making it suitable for time-series and event data workloads.

Free

◆3.7

QuestDB - databases-warehouses tool for Python data engineering

QuestDB

Fast SQL Time Series Database

A relational column-oriented database designed for real-time analytics on time series and event data. QuestDB uses SQL with time-series extensions and delivers exceptional ingestion performance, ideal for financial data, IoT, and application metrics.

Freemium

◆4.5

DuckDB - databases-warehouses tool for Python data engineering

Featured

DuckDB

In-Process Analytical Database

A fast, in-process analytical database with zero external dependencies. DuckDB is designed for analytical query workloads and integrates seamlessly with Python and Pandas, making it ideal for local data analysis and embedded analytics.

Free

◆4.8

Apache Druid - databases-warehouses tool for Python data engineering

Apache Druid

Real-Time Analytics Database

A column-oriented, distributed data store designed for sub-second OLAP queries on event data. Druid is used for powering interactive analytical applications, real-time dashboards, and exploratory analytics on high-cardinality data.

Free

◆4.3

Polars

Fast DataFrame library for Python and Rust

Polars is a high-performance DataFrame library written in Rust with Python bindings. It offers a lazy execution engine alongside eager evaluation and uses multi-threaded processing, which typically gives faster query performance than pandas on a single machine, often with lower memory overhead.

Free

◆4.8

Apache Parquet - serialization-formats tool for Python data engineering

Featured

Apache Parquet

Columnar Storage Format

A columnar storage format available to any project in the Hadoop ecosystem. Parquet provides efficient compression and encoding schemes, making it the de facto standard for analytical workloads in data lakes and warehouses.

Free

◆4.8
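Dictionary encoding is one of the encoding schemes Parquet uses for columns with many repeated values. The sketch below illustrates the idea in plain Python and does not reproduce Parquet's actual on-disk layout. The function names `dictionary_encode` and `dictionary_decode` and the sample city values are hypothetical.

```python
def dictionary_encode(values):
    """Replace each value with a small integer code into a dictionary.

    Returns (dictionary, codes), where dictionary lists the distinct
    values in order of first appearance.
    """
    dictionary = []
    index_of = {}
    codes = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index_of[value])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]


# Hypothetical low-cardinality column.
cities = ["Berlin", "Paris", "Berlin", "Berlin", "Paris", "Oslo"]
dictionary, codes = dictionary_encode(cities)

print(dictionary)
print(codes)
```

Each distinct string is stored once, and the column itself becomes a list of small integers that can be packed or compressed further.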

Apache ORC - serialization-formats tool for Python data engineering

Apache ORC

Optimized Row Columnar Format

Described by its project as the smallest, fastest columnar storage format for Hadoop workloads. ORC provides efficient compression, predicate pushdown, and support for Hive ACID transactions, making it a common choice for Hive-based data warehousing.

Free

◆4.3
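Predicate pushdown in ORC relies on min/max statistics stored with the data, so a reader can skip chunks that cannot contain matching rows. The following pure-Python sketch shows that idea under simplifying assumptions. The `Stripe` class, the helper names, and the stripe size are hypothetical and far simpler than ORC's real stripe and index structures.

```python
from dataclasses import dataclass


@dataclass
class Stripe:
    """A chunk of one column plus its min/max statistics (simplified)."""
    values: list
    minimum: int
    maximum: int


def build_stripes(values, stripe_size):
    """Split a column into fixed-size chunks and record min/max per chunk."""
    stripes = []
    for start in range(0, len(values), stripe_size):
        chunk = values[start:start + stripe_size]
        stripes.append(Stripe(chunk, min(chunk), max(chunk)))
    return stripes


def scan_greater_than(stripes, threshold):
    """Return values > threshold, skipping stripes whose max rules them out."""
    matches = []
    stripes_read = 0
    for stripe in stripes:
        if stripe.maximum <= threshold:
            continue  # statistics prove no row here can match
        stripes_read += 1
        matches.extend(v for v in stripe.values if v > threshold)
    return matches, stripes_read


# Hypothetical sorted column, e.g. event sequence numbers 1..12.
sequence_numbers = list(range(1, 13))
stripes = build_stripes(sequence_numbers, stripe_size=4)
matches, stripes_read = scan_greater_than(stripes, threshold=9)

print(matches)
print(stripes_read)
```

Skipping works best when the data is sorted or clustered on the filtered column, because then each chunk covers a narrow value range.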