Apache ORC is a columnar storage format for Hadoop workloads, described by the project as the smallest and fastest of its kind. It provides efficient compression, predicate pushdown, and ACID transaction support, which makes it well suited to Hive-based data warehousing.
Python data engineers use `pyorc` to read and write ORC files in Hive-based data lake environments where ORC is the standard format. In PySpark pipelines, ORC is specified as the write format for tables that will be queried via HiveQL with ACID upsert support; Spark handles ORC reads and writes through the DataFrame API.
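As a minimal sketch of the `pyorc` usage described above, the following writes a few rows to an in-memory ORC buffer and reads them back. The helper name `roundtrip`, the schema, and the sample rows are illustrative assumptions rather than part of any real pipeline, and `pyorc` is a third-party package that must be installed separately.

```python
import io

try:
    import pyorc  # third-party package: pip install pyorc
except ImportError:  # the sketch cannot run without it
    pyorc = None


def roundtrip(rows, schema="struct<id:int,name:string>"):
    """Write rows to an in-memory ORC buffer, then read them back.

    `roundtrip` and the default schema are hypothetical, chosen only
    for illustration.
    """
    buf = io.BytesIO()
    # The Writer takes a binary file object and an ORC schema string.
    with pyorc.Writer(buf, schema) as writer:
        for row in rows:
            writer.write(row)

    # Rewind the buffer so the Reader starts at the beginning.
    buf.seek(0)
    reader = pyorc.Reader(buf)
    # Iterating over a Reader yields one tuple per row by default.
    return list(reader)


if __name__ == "__main__" and pyorc is not None:
    for row in roundtrip([(1, "alpha"), (2, "beta")]):
        print(row)
```

In practice the buffer would usually be a file opened in binary mode (`open(path, "wb")` for writing, `open(path, "rb")` for reading) rather than an `io.BytesIO` object.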
Apache ORC is free to use. It is open-source software released under the Apache License 2.0.
Apache ORC is listed under the Serialization Formats category on Python Data Engineering.
Related
| Tool | Description | Pricing | Rating |
|---|---|---|---|
| Apache Avro (featured) | Schema-Based Data Serialization | Free | ★ 4.5 |
| Apache Parquet (featured) | Columnar Storage Format | Free | ★ 4.8 |
| Kryo | Fast JVM Serialization Framework | Free | ★ 4.1 |