DataComPy is a Python library by Capital One that compares two DataFrames across Pandas, Polars, Spark, and more. It produces detailed match reports with configurable tolerance levels, which makes it well suited to validating data pipeline outputs.
Data engineers use DataComPy to validate ETL migrations and pipeline refactors: they compare the output of a new pipeline version against the old output to confirm that both produce identical results. A common use case is a database migration test. Load the same source data through both the old and new ingestion code, compare the resulting DataFrames with DataComPy, and block the migration until all differences are resolved.
DataComPy is free to use.
DataComPy is listed under the Data Comparison category on Python Data Engineering.
Related
| Tool | Description | Pricing | Rating |
|---|---|---|---|
| Bonobo | Lightweight ETL Framework | Free | ★ 4.2 |
| NumPy (featured) | Numerical Computing Library | Free | ★ 4.9 |
| Beautiful Soup | Web Scraping & HTML Parsing | Free | ★ 4.5 |