Serialization Formats
Schema-Based Data Serialization
★ 4.5
Fast JVM Serialization Framework
★ 4.1
Install (Avro): pip install avro-python3
Install (Kryo): N/A (Java library)

Python data engineers use `fastavro` to serialize and deserialize Avro records in Kafka-based pipelines. Schema Registry integration means Python producers validate records against the registered schema before publishing, and consumers deserialize binary Avro messages back to Python dicts automatically. Avro's compact binary encoding reduces Kafka topic storage costs compared to JSON.
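
The validate, encode, and decode steps described above can be sketched with `fastavro` alone. This is a minimal illustration, not a production pipeline: the `PageView` schema, its field names, and the helpers `encode_record` and `decode_record` are hypothetical, and Kafka and Schema Registry are left out entirely (a registry-aware serializer would normally supply the schema and add its own framing to each message).

```python
import io
import json

# Hypothetical schema, invented for this example.
PAGE_VIEW_SCHEMA = {
    "type": "record",
    "name": "PageView",
    "namespace": "example.events",
    "fields": [
        {"name": "user_id", "type": "long"},
        {"name": "url", "type": "string"},
        {"name": "duration_ms", "type": "int"},
    ],
}


def encode_record(record: dict) -> bytes:
    """Validate a dict against the schema and return schemaless Avro bytes."""
    # Imported lazily so this module still loads where fastavro is absent.
    from fastavro import parse_schema, schemaless_writer
    from fastavro.validation import validate

    schema = parse_schema(PAGE_VIEW_SCHEMA)
    validate(record, schema)  # raises ValidationError if the record does not match
    buffer = io.BytesIO()
    schemaless_writer(buffer, schema, record)  # record body only, no file header
    return buffer.getvalue()


def decode_record(payload: bytes) -> dict:
    """Decode schemaless Avro bytes back into a Python dict."""
    from fastavro import parse_schema, schemaless_reader

    schema = parse_schema(PAGE_VIEW_SCHEMA)
    return schemaless_reader(io.BytesIO(payload), schema)


def json_size(record: dict) -> int:
    """Size in bytes of the same record encoded as UTF-8 JSON, for comparison."""
    return len(json.dumps(record).encode("utf-8"))
```

Calling `encode_record({"user_id": 42, "url": "/pricing", "duration_ms": 1500})` and passing the result to `decode_record` should return the original dict. Comparing `len(payload)` with `json_size(record)` shows the size difference for that record: the Avro bytes carry no field names, which is why the reader needs the schema.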
Python data engineers typically encounter Kryo when tuning PySpark job performance. Enabling Kryo serialization in the Spark config (`spark.serializer=org.apache.spark.serializer.KryoSerializer`) can reduce shuffle data size and speed up operations that move JVM objects across the network between Spark executors. Python objects, including the inputs and outputs of Python UDFs, are still serialized with pickle; Kryo applies only to JVM-side data.
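
One way to apply the setting above from PySpark is sketched below. The serializer class name comes from the text; the `spark.kryoserializer.buffer.max` value is only an illustrative choice, and `KRYO_CONF` and `build_session` are hypothetical names introduced for this example.

```python
# Kryo-related Spark properties. The buffer size is an illustrative value,
# not a recommendation.
KRYO_CONF = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryoserializer.buffer.max": "128m",
}


def build_session(app_name: str = "kryo-demo"):
    """Create (or reuse) a SparkSession with Kryo enabled for JVM-side data."""
    # Imported lazily: this needs the pyspark package and a working JVM.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in KRYO_CONF.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

After `spark = build_session()`, the active value can be checked with `spark.sparkContext.getConf().get("spark.serializer")`. The serializer is fixed when the SparkContext starts, so these properties need to be set before the first session is created; calling `build_session` while another session is already running will not switch serializers.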
Individual Tool Pages