Serialization Tools & Datasets for Python Data Engineering

Discover 6 tools tagged with Serialization for Python data engineering.

Data serialization formats define how structured data is encoded for storage or transmission between systems. Python data engineers work with formats such as JSON, Avro, Parquet, Protocol Buffers, and MessagePack, choosing among them to balance pipeline throughput, schema enforcement, and cross-language compatibility in distributed systems.
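
As a rough illustration of the trade-off between text and binary encodings, the sketch below serializes the same record as JSON and as a fixed-layout binary record, using only the Python standard library. The record fields and the `struct` layout are assumptions chosen for illustration; they do not correspond to any of the formats listed on this page.

```python
import json
import struct

# Hypothetical record, used only for illustration.
record = {"user_id": 12345, "score": 98.6, "active": True}

# Text encoding: self-describing, so field names travel with every record.
json_bytes = json.dumps(record).encode("utf-8")

# Binary encoding: the layout is agreed on out of band (it plays the role of
# a schema): little-endian unsigned 32-bit int, 64-bit float, 1-byte bool.
LAYOUT = struct.Struct("<Id?")
binary_bytes = LAYOUT.pack(record["user_id"], record["score"], record["active"])


def decode_binary(payload: bytes) -> dict:
    """Decode a payload packed with LAYOUT back into a dict."""
    user_id, score, active = LAYOUT.unpack(payload)
    return {"user_id": user_id, "score": score, "active": active}


print(len(json_bytes), len(binary_bytes))
```

The binary form is smaller because the field names and types live in the shared layout rather than in each payload, which is also why both sides must agree on that layout before exchanging data.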

Tools (6)

Featured

Apache Avro

Schema-Based Data Serialization

A data serialization system that provides rich data structures, a compact binary format, and schema evolution support. Avro is widely used in Apache Kafka ecosystems for encoding messages with schema registry integration.

Free

◆4.5

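
The schema evolution mentioned for Avro can be sketched in plain Python. This is a simplified model, not the Avro library or its binary encoding: schemas are represented as dicts of field name to default value, and `resolve`, `NO_DEFAULT`, and the field names are hypothetical. It mirrors the general idea that a reader fills in defaults for fields the writer did not know about and ignores fields it no longer expects.

```python
# Sentinel marking a reader field that has no default value.
NO_DEFAULT = object()

# Hypothetical reader schema (version 2): "country" was added with a default,
# and the writer's old "legacy_flag" field is no longer expected.
READER_SCHEMA_V2 = {"id": NO_DEFAULT, "email": NO_DEFAULT, "country": "unknown"}


def resolve(record: dict, reader_schema: dict) -> dict:
    """Project a record written with an older schema onto the reader's schema."""
    resolved = {}
    for field, default in reader_schema.items():
        if field in record:
            resolved[field] = record[field]      # field present in both schemas
        elif default is not NO_DEFAULT:
            resolved[field] = default            # new field: fall back to default
        else:
            raise ValueError(f"missing field with no default: {field}")
    return resolved                              # writer-only fields are dropped


# Record produced by a writer still using the older schema.
old_record = {"id": 7, "email": "a@example.com", "legacy_flag": True}
print(resolve(old_record, READER_SCHEMA_V2))
```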

Featured

Apache Parquet

Columnar Storage Format

A columnar storage format available to any project in the Hadoop ecosystem. Parquet provides efficient compression and encoding schemes, making it the de facto standard for analytical workloads in data lakes and warehouses.

Free

◆4.8

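
To show why a columnar layout helps analytical workloads, the sketch below pivots row-oriented records into columns, aggregates a single column, and run-length encodes a repetitive one. It uses plain Python lists rather than the Parquet file format; the sample rows and the helper names `to_columns` and `run_length_encode` are assumptions for illustration.

```python
from itertools import groupby

# Hypothetical row-oriented data.
rows = [
    {"country": "DE", "amount": 10.0},
    {"country": "DE", "amount": 12.5},
    {"country": "FR", "amount": 7.0},
    {"country": "FR", "amount": 3.5},
]


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


columns = to_columns(rows)

# An aggregate over one column touches only that column's values.
total = sum(columns["amount"])

# Values of the same type sit together, so repetitive columns encode compactly.
encoded = run_length_encode(columns["country"])

print(total, encoded)
```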

Apache ORC

Optimized Row Columnar Format

A columnar storage format for Hadoop workloads that its project describes as the smallest and fastest available. ORC provides efficient compression, predicate pushdown, and support for Hive ACID transactions, which makes it well suited to Hive-based data warehousing.

Free

◆4.3

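
Predicate pushdown, as mentioned for ORC, relies on min/max statistics stored alongside chunks of data so that a reader can skip chunks that cannot match a filter. The sketch below models that idea with plain dicts; it is not the ORC reader API, and the `stripes` structure and `scan_greater_than` function are hypothetical.

```python
# Hypothetical chunks of a single integer column, each with min/max statistics.
stripes = [
    {"min": 1, "max": 100, "values": [1, 50, 100]},
    {"min": 101, "max": 200, "values": [101, 150, 200]},
    {"min": 201, "max": 300, "values": [201, 250, 300]},
]


def scan_greater_than(stripes, threshold):
    """Return values > threshold and the number of chunks actually read."""
    matches = []
    stripes_read = 0
    for stripe in stripes:
        if stripe["max"] <= threshold:
            continue  # statistics prove no value can match, so skip the chunk
        stripes_read += 1
        matches.extend(v for v in stripe["values"] if v > threshold)
    return matches, stripes_read


matches, stripes_read = scan_greater_than(stripes, 180)
print(matches, stripes_read)
```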

Apache Thrift

Cross-Language Services Framework

A software framework for scalable cross-language services development. Thrift combines a serialization format with an RPC framework, enabling efficient communication between services written in different programming languages.

Free

◆4


Featured

Protocol Buffers

Google's Data Interchange Format

Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data. Protocol Buffers provide a compact binary format with strong typing and schema evolution, widely used in gRPC and high-performance data systems.

Free

◆4.7

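
Part of the compactness of the Protocol Buffers wire format comes from variable-length integers (varints), which store seven bits of payload per byte and use the high bit as a continuation flag. The sketch below implements that encoding for non-negative integers in plain Python. The function names are illustrative and are not part of the `protobuf` package.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low_bits = value & 0x7F        # take the lowest 7 bits
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # more bytes follow: set the high bit
        else:
            out.append(low_bits)         # last byte: high bit stays clear
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode a single base-128 varint from the start of data."""
    result = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:
            return result
    raise ValueError("truncated varint")


print(encode_varint(300).hex())
```

Small values take one byte and larger values grow only as needed, which is one reason the format tends to be more compact than fixed-width integers for typical data.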

Kryo

Fast JVM Serialization Framework

A fast and efficient object graph serialization framework for Java. Kryo is commonly used as a serializer in Apache Spark, where it is an opt-in alternative to the default Java serialization, and in other JVM-based data processing frameworks that need high-performance data exchange.

Free

◆4.1

