// serialization-formats
Columnar Storage Format
A columnar storage format available to any project in the Hadoop ecosystem. Parquet provides efficient compression and encoding schemes, making it the de facto standard for analytical workloads in data lakes and warehouses.
Parquet is the standard output format for Python data pipelines writing to a data lake. Engineers use `pandas.to_parquet()` or `pyarrow.parquet.write_table()` to write DataFrames as efficiently compressed columnar files. Reading is equally simple — `pd.read_parquet('s3://bucket/prefix/')` reads an entire partitioned dataset, with DuckDB and Athena capable of querying Parquet files directly without loading.
A columnar storage format available to any project in the Hadoop ecosystem. Parquet provides efficient compression and encoding schemes, making it the de facto standard for analytical workloads in data lakes and warehouses.
Yes, Apache Parquet is free to use.
Apache Parquet is listed under the Serialization Formats category on Python Data Engineering.
Details
Related
| Tool | Pricing | Rating | |
|---|---|---|---|
AO Apache ORC Optimized Row Columnar Format | Free | ★ 4.3 | → |
PB Protocol Buffersfeatured Google's Data Interchange Format | Free | ★ 4.7 | → |
DU DuckDBfeaturednew In-Process Analytical Database | Free | ★ 4.8 | → |