Serialization Formats
Optimized Row Columnar Format
★ 4.3
Cross-Language Services Framework
★ 4.0
pip install pyorc
pip install thrift
Python data engineers use `pyorc` to read and write ORC files when working with Hive-based data lake environments where ORC is the standard format. In PySpark pipelines, ORC is commonly chosen as the write format for tables that will be queried via HiveQL, since Hive's ACID transactional tables (the ones that support updates and merges) require ORC. Spark reads and writes ORC natively through the DataFrame API.
Python data engineers encounter Apache Thrift when working with systems such as Apache Parquet, HBase, and Cassandra. Parquet encodes its file metadata with Thrift, HBase exposes a Thrift gateway for non-Java clients, and older Cassandra versions offered a Thrift-based RPC API. The `thrift` Python library enables engineers to call Thrift-based services from Python pipelines. Thrift is also used in microservice architectures where Python services need to communicate with services written in Java, Go, or C++ via a strongly-typed interface.
Individual Tool Pages