Serialization Formats
Kryo: Fast JVM Serialization Framework (★ 4.1)
Protobuf: Google's Data Interchange Format (★ 4.7)
Install: Kryo: N/A (Java library); Protobuf: `pip install protobuf`

Python data engineers encounter Kryo when tuning PySpark job performance. Enabling Kryo serialization in the Spark config (`spark.serializer=org.apache.spark.serializer.KryoSerializer`) can reduce shuffle data size and speed up operations that move data across the network between Spark executors. Python objects in PySpark UDFs are still serialized with pickle; Kryo applies only to JVM-side data.
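As a minimal sketch of how that setting might be passed at submit time, the snippet below builds a `spark-submit` argument list with one `--conf` flag per property. The helper `build_submit_command` and the script name `my_job.py` are hypothetical names used only for illustration; the property key and serializer class are the ones quoted above.

```python
import shlex

# Spark properties to pass at submit time. The key and class name are the
# ones quoted in the text; add other properties here as needed.
spark_conf = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
}


def build_submit_command(app_path, conf):
    """Return a spark-submit argument list with one --conf flag per property."""
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app_path)
    return args


# "my_job.py" is a hypothetical PySpark script name.
command = build_submit_command("my_job.py", spark_conf)

# Print a shell-safe version of the command for copy and paste.
print(shlex.join(command))
```

The same key and value can also be set inside a job through `SparkSession.builder.config(key, value)` before the session is created.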
Python data engineers use the `protobuf` package (imported as `google.protobuf`) to serialize and deserialize structured messages in Kafka topics and gRPC services. Proto schemas define the contract between Python producers and consumers: `protoc` generates Python classes from `.proto` files, and engineers call `.SerializeToString()` and `.ParseFromString()` on instances of those classes to encode and decode messages in a compact binary format.
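The encode and decode round trip can be sketched without running `protoc` by using `Struct`, a well-known message type that ships with the `protobuf` package. This is an assumption made to keep the example self-contained: in a real pipeline the message class would be one generated from your own `.proto` file, and the field names below are purely illustrative.

```python
from google.protobuf import struct_pb2

# Struct is bundled with the protobuf package and stands in here for a
# class that protoc would normally generate from a .proto schema.
event = struct_pb2.Struct()
event.update({"user_id": "u-123", "action": "login"})  # illustrative fields

# Producer side: encode the message to bytes (e.g. a Kafka record value).
payload = event.SerializeToString()

# Consumer side: decode the bytes into a fresh message instance.
decoded = struct_pb2.Struct()
decoded.ParseFromString(payload)

print(decoded["user_id"], decoded["action"])
```

With a generated class the calls are the same; only the import and the message type change.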