Discover 10 tools tagged with Columnar for Python data engineering.
Distributed Column-Family Store
A distributed, scalable big data store modeled after Google's Bigtable, running on top of HDFS. HBase provides random, real-time read/write access to large datasets and is commonly used for storing sparse data in the Hadoop ecosystem.
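HBase inherits Bigtable's data model: a sparse, sorted map in which each cell is addressed by row key, column (family:qualifier), and timestamp. The pure-Python sketch below illustrates that model only. `ColumnFamilyTable` and its methods are hypothetical names, not HBase's client API. Cells that are never written take no space, and each write adds a version keyed by its timestamp.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Hypothetical in-memory model of an HBase-style table (not HBase's API).

    Cells are addressed by (row key, "family:qualifier", timestamp). Only written
    cells are stored, so sparse rows cost nothing for their missing columns.
    """

    def __init__(self, families):
        self.families = set(families)
        # row key -> column ("family:qualifier") -> {timestamp: value}
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row][column][ts] = value  # each write adds a new version

    def get(self, row, column):
        """Return the newest version of a cell, or None if it was never written."""
        versions = self.rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start, stop):
        """Yield (row, {column: newest value}) for row keys in [start, stop), sorted."""
        for row in sorted(self.rows):
            if start <= row < stop:
                cells = self.rows[row]
                yield row, {col: vers[max(vers)] for col, vers in cells.items()}


t = ColumnFamilyTable(["info", "metrics"])
t.put("user#001", "info:name", "Ada", ts=1)
t.put("user#001", "info:name", "Ada L.", ts=2)
t.put("user#002", "metrics:logins", 7, ts=1)

print(t.get("user#001", "info:name"))  # Ada L.
print(t.get("user#002", "info:name"))  # None (never written, so never stored)
print(list(t.scan("user#000", "user#002")))  # [('user#001', {'info:name': 'Ada L.'})]
```

Keeping every version and resolving reads to the newest timestamp mirrors how HBase exposes multiple cell versions; real HBase also limits retained versions per column family.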
High-Performance Cassandra Alternative
A NoSQL database compatible with Apache Cassandra but written in C++ (Cassandra itself is written in Java) for higher throughput and lower latency. ScyllaDB is designed for data-intensive applications that require consistent single-digit-millisecond latencies at scale.
Fast Columnar OLAP Database
An open-source columnar database management system designed for online analytical processing (OLAP). ClickHouse delivers exceptional query performance on large datasets, making it ideal for real-time analytics, log analysis, and time-series data.
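The benefit of a column-oriented layout for OLAP queries can be shown with a toy example. After rows are pivoted into per-column lists, an aggregate such as `sum(bytes)` filtered on `status` reads only those two columns and ignores the rest. This is a hypothetical illustration of the layout idea (the sample data and `to_columns` helper are invented here), not how ClickHouse stores data; real column stores add compression and indexing on top.

```python
# Hypothetical sample of web-log rows (row-oriented: one dict per event).
rows = [
    {"ts": 1, "url": "/a", "status": 200, "bytes": 512},
    {"ts": 2, "url": "/b", "status": 404, "bytes": 1024},
    {"ts": 3, "url": "/a", "status": 200, "bytes": 2048},
]


def to_columns(rows):
    """Pivot a list of row dicts into a dict of per-column lists."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


cols = to_columns(rows)

# Equivalent of: SELECT sum(bytes) FROM logs WHERE status = 200
# Only the "status" and "bytes" columns are scanned; "ts" and "url" are untouched.
total = sum(b for b, s in zip(cols["bytes"], cols["status"]) if s == 200)
print(total)  # 2560
```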
Distributed Columnar Streaming Database
A distributed, columnar, versioned, and streaming database designed for real-time and batch analytics. FiloDB combines the benefits of columnar storage with streaming ingestion, making it suitable for time-series and event data workloads.
Fast SQL Time Series Database
A relational, column-oriented database designed for real-time analytics on time-series and event data. QuestDB extends SQL with time-series constructs and is built for high ingestion throughput, making it well suited to financial data, IoT, and application metrics.
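One of QuestDB's time-series extensions is `SAMPLE BY`, which aggregates rows into fixed time buckets, for instance `SELECT ts, avg(price) FROM trades SAMPLE BY 1h` (table and column names hypothetical). The sketch below reproduces that idea in plain Python for intuition only. `sample_by` is a hypothetical helper, and it aligns buckets to multiples of the bucket width since the Unix epoch, which may differ from QuestDB's own alignment rules.

```python
from collections import defaultdict
from datetime import datetime, timezone


def sample_by(points, bucket_seconds):
    """Hypothetical helper: average (timestamp, value) points per fixed-width bucket.

    Buckets start at multiples of bucket_seconds since the Unix epoch.
    Timestamps must be timezone-aware datetimes.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in points:
        bucket = int(ts.timestamp()) // bucket_seconds * bucket_seconds
        sums[bucket] += value
        counts[bucket] += 1
    return [
        (datetime.fromtimestamp(b, tz=timezone.utc), sums[b] / counts[b])
        for b in sorted(sums)
    ]


utc = timezone.utc
points = [
    (datetime(2024, 1, 1, 9, 15, tzinfo=utc), 10.0),
    (datetime(2024, 1, 1, 9, 45, tzinfo=utc), 20.0),
    (datetime(2024, 1, 1, 10, 5, tzinfo=utc), 30.0),
]
result = sample_by(points, bucket_seconds=3600)
# Two hourly buckets: 09:00 averages to 15.0, 10:00 to 30.0
```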
Real-Time Analytics Database
A column-oriented, distributed data store designed for sub-second OLAP queries on event data. Druid powers interactive analytical applications, real-time dashboards, and exploratory analytics on high-cardinality data.
Columnar Storage Format
A columnar storage format available to any project in the Hadoop ecosystem. Parquet provides efficient compression and encoding schemes, making it the de facto standard for analytical workloads in data lakes and warehouses.
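Two of the encodings Parquet applies to column data are dictionary encoding and run-length encoding. The toy sketch below shows why they compress columns with repeated values so well; the function names are hypothetical and the result is not Parquet's on-disk format, which adds bit-packing, pages, and compression codecs. In real Python pipelines a library such as pyarrow (`pyarrow.parquet.write_table` and `pyarrow.parquet.read_table`) handles all of this.

```python
from itertools import groupby


def dictionary_encode(values):
    """Replace each value with a small integer index into a list of distinct values."""
    dictionary, index_of, indices = [], {}, []
    for v in values:
        if v not in index_of:
            index_of[v] = len(dictionary)
            dictionary.append(v)
        indices.append(index_of[v])
    return dictionary, indices


def run_length_encode(indices):
    """Collapse runs of repeated indices into (index, run_length) pairs."""
    return [(k, len(list(g))) for k, g in groupby(indices)]


def decode(dictionary, runs):
    """Expand (index, run_length) pairs back into the original values."""
    return [dictionary[i] for i, n in runs for _ in range(n)]


column = ["US", "US", "US", "DE", "DE", "US", "FR", "FR", "FR", "FR"]
dictionary, indices = dictionary_encode(column)
runs = run_length_encode(indices)
# dictionary == ["US", "DE", "FR"]
# runs == [(0, 3), (1, 2), (0, 1), (2, 4)]: 10 strings become 4 small pairs
assert decode(dictionary, runs) == column
```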
Optimized Row Columnar Format
Self-described as the smallest, fastest columnar storage format for Hadoop workloads. ORC provides highly efficient compression and predicate pushdown, and it is the storage format behind Hive's ACID (transactional) tables, making it a natural fit for Hive-based data warehousing.
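Predicate pushdown in ORC relies on lightweight statistics, such as per-column minimum and maximum values, that the writer records for stripes and smaller row groups; a reader can skip any chunk whose value range cannot satisfy the filter. The sketch below models only that skipping logic, using hypothetical `Stripe`, `write_stripes`, and `scan_greater_than` names; it is an illustration, not ORC's reader.

```python
from dataclasses import dataclass, field


@dataclass
class Stripe:
    """Hypothetical chunk of rows with the min/max statistics recorded at write time."""
    values: list
    lo: int = field(init=False)
    hi: int = field(init=False)

    def __post_init__(self):
        self.lo = min(self.values)
        self.hi = max(self.values)


def write_stripes(values, stripe_size):
    """Split a column into fixed-size stripes, computing statistics for each."""
    return [Stripe(values[i:i + stripe_size]) for i in range(0, len(values), stripe_size)]


def scan_greater_than(stripes, threshold):
    """Return values > threshold and how many stripes had to be read."""
    matches, stripes_read = [], 0
    for stripe in stripes:
        if stripe.hi <= threshold:
            continue  # statistics prove no match: skip without touching the values
        stripes_read += 1
        matches.extend(v for v in stripe.values if v > threshold)
    return matches, stripes_read


stripes = write_stripes(list(range(100)), stripe_size=25)
matches, stripes_read = scan_greater_than(stripes, 90)
# matches holds 91..99, and only 1 of the 4 stripes is read
```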