Apache ORC

Optimized Row Columnar Format

About Apache ORC

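Apache ORC (Optimized Row Columnar) is a columnar storage format for Hadoop workloads that the project describes as the smallest and fastest of its kind. It provides efficient compression, predicate pushdown, and, when used as the backing format for Hive tables, ACID transaction support, which makes it well suited to Hive-based data warehousing.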

Key Features

1. Columnar storage format optimized for Hive and Hadoop workloads
2. Built-in lightweight indexes (min/max, bloom filter) for predicate pushdown
3. ACID transaction support in Hive with ORC as the backing format
4. Stripe-based file structure with built-in statistics for query planning
5. Native support in Hive, Spark, and Presto for data lake analytics
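
To make the idea behind min/max indexes and predicate pushdown concrete, here is a minimal toy model in plain Python. It is not ORC's reader or file layout: the `Stripe` class, `build_stripes`, and `scan_greater_than` are hypothetical names invented for this sketch, and it only models a single integer column with one set of statistics per stripe.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stripe:
    """Toy stand-in for a stripe: one column's values plus min/max statistics."""
    values: List[int]
    min_value: int
    max_value: int


def build_stripes(values: List[int], stripe_size: int) -> List[Stripe]:
    """Split values into fixed-size chunks and record min/max for each chunk."""
    stripes = []
    for start in range(0, len(values), stripe_size):
        chunk = values[start:start + stripe_size]
        stripes.append(Stripe(chunk, min(chunk), max(chunk)))
    return stripes


def scan_greater_than(stripes: List[Stripe], threshold: int) -> Tuple[List[int], int]:
    """Return values > threshold and the number of stripes actually read."""
    matches: List[int] = []
    stripes_read = 0
    for stripe in stripes:
        # The statistics prove no value in this stripe can match, so skip it
        # without looking at the values at all.
        if stripe.max_value <= threshold:
            continue
        stripes_read += 1
        matches.extend(v for v in stripe.values if v > threshold)
    return matches, stripes_read


stripes = build_stripes(list(range(100)), stripe_size=25)
matches, stripes_read = scan_greater_than(stripes, threshold=80)
print(len(matches), "matching values,", stripes_read, "of", len(stripes), "stripes read")
```

With sorted input, only the last of the four toy stripes is read. On unsorted data the min/max ranges overlap more and fewer stripes can be skipped, which is one reason data is often sorted on a filter column before being written.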

How Python Data Engineers Use Apache ORC

Python data engineers use `pyorc` to read and write ORC files in Hive-based data lake environments where ORC is the standard format. In PySpark pipelines, ORC is specified as the write format for tables that will be queried via HiveQL, and Spark reads and writes ORC through the DataFrame API. ACID upserts themselves are provided by Hive's transactional tables, which use ORC as the backing format.

Frequently Asked Questions

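What is Apache ORC used for?

Apache ORC is used as a columnar storage format for Hadoop workloads, most commonly for Hive-based data warehousing and data lake analytics.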

Is Apache ORC free to use?

Yes. Apache ORC is open-source software released under the Apache License 2.0 and is free to use.

What category does Apache ORC belong to?

Apache ORC is listed under the Serialization Formats category on Python Data Engineering.

Similar Serialization Formats Tools

3 tools

Tool	Description	Pricing	Rating
Apache Avro	Schema-Based Data Serialization	Free	★ 4.5
Apache Parquet	Columnar Storage Format	Free	★ 4.8
Kryo	Fast JVM Serialization Framework	Free	★ 4.1