Serialization Formats
Columnar Storage Format
★ 4.8
Cross-Language Services Framework
★ 4.0
pip install pyarrow
pip install thrift
Parquet is a de facto standard output format for Python data pipelines that write to a data lake. Engineers use `DataFrame.to_parquet()` in pandas or `pyarrow.parquet.write_table()` to write DataFrames as compressed columnar files. Reading is equally simple: `pd.read_parquet('s3://bucket/prefix/')` reads an entire partitioned dataset (with `s3fs` installed for S3 paths), and engines such as DuckDB and Athena can query Parquet files in place, without first loading them into another store.
Python data engineers encounter Apache Thrift when working with systems such as Apache Parquet, HBase, and Cassandra, which use Thrift for data serialisation or RPC: Parquet encodes its file metadata with Thrift, HBase exposes a Thrift gateway, and Cassandra releases before 4.0 offered a Thrift RPC API. The `thrift` Python library lets engineers call Thrift-based services from Python pipelines. Thrift is also used in microservice architectures where Python services need to communicate with services written in Java, Go, or C++ through a strongly typed interface.
Individual Tool Pages