Libraries for analyzing, profiling, and understanding dataset characteristics.
Data profiling tools automatically analyze datasets to discover their structure, content patterns, and quality characteristics. These tools generate statistical summaries, infer data types, count missing values, flag outliers, and uncover relationships between columns. Profiling is a common first step in data engineering work. It helps teams understand unfamiliar datasets, check assumptions about data quality, and detect sensitive information such as personally identifiable information (PII) before it enters production pipelines.
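To make the description above concrete, here is a minimal sketch of what a profiler computes for a single column, written with only the Python standard library. It is an illustration, not how any of the listed tools work internally. The function names `infer_type` and `profile_column` are hypothetical, and it assumes that values arrive as strings, that `None` or an empty string means "missing", and that outliers are defined by the common 1.5 × IQR rule.

```python
import statistics
from typing import Any, List, Optional


def infer_type(values: List[str]) -> str:
    """Return 'empty', 'int', 'float', or 'str' for the non-missing values."""
    if not values:
        return "empty"

    def parses(cast) -> bool:
        # True only if every value can be converted with `cast`.
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if parses(int):
        return "int"
    if parses(float):
        return "float"
    return "str"


def profile_column(raw: List[Optional[str]]) -> dict:
    """Summarize one column: size, missing values, distinct values, type, outliers."""
    # Assumption: None and "" both count as missing.
    present = [v for v in raw if v not in (None, "")]
    result: dict[str, Any] = {
        "count": len(raw),
        "missing": len(raw) - len(present),
        "distinct": len(set(present)),
        "type": infer_type(present),
    }
    # Numeric columns also get basic statistics and IQR-based outliers.
    if result["type"] in ("int", "float") and len(present) >= 2:
        nums = [float(v) for v in present]
        q1, _, q3 = statistics.quantiles(nums, n=4, method="inclusive")
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        result.update(
            min=min(nums),
            max=max(nums),
            mean=statistics.fmean(nums),
            outliers=[x for x in nums if x < low or x > high],
        )
    return result


print(profile_column(["10", "12", "11", None, "13", "", "500"]))
```

For that sample column, the sketch reports 7 values, 2 missing, an inferred type of `int`, and flags `500.0` as an outlier. Dedicated profilers go much further: they handle many columns at once, use sampling for large data, and support richer type detection.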
Sensitive Data Detection & Profiling
A Python library by Capital One designed to make data analysis, monitoring, and sensitive data detection easy. Data Profiler automatically identifies data types, statistical patterns, and PII across structured and unstructured datasets.
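To illustrate the general idea of flagging sensitive columns, the sketch below uses a naive, regex-based scan. This is deliberately a different and much weaker technique than the one Data Profiler uses, and it is not Data Profiler's API. The patterns, the `scan_column` function, and the 50% match threshold are all hypothetical choices made for this example.

```python
import re

# Hypothetical, intentionally simple patterns. Production detectors need
# validation, context, and many more entity types than these three.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def scan_column(values, threshold=0.5):
    """Return the PII labels whose pattern matches at least `threshold`
    of the column's non-empty values."""
    present = [v for v in values if v]
    labels = []
    for label, pattern in PII_PATTERNS.items():
        hits = sum(1 for v in present if pattern.search(v))
        if present and hits / len(present) >= threshold:
            labels.append(label)
    return labels


print(scan_column(["alice@example.com", "bob@test.org", "n/a"]))
```

Scoring a whole column, instead of single cells, tolerates a few malformed or placeholder entries such as `"n/a"`. A pattern list like this still misses names, addresses, and other entities that require a trained model.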
Advanced Data Pattern Discovery
An open-source data profiler focused on discovery and validation of complex patterns in data. Desbordante finds functional dependencies, association rules, and other data constraints that go beyond basic statistical profiling.
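A functional dependency A → B holds when no two rows share the same value of A but differ in B. For example, `zip → city` holds if each zip code maps to exactly one city. The sketch below checks this definition and runs a brute-force search over single-column dependencies. It only illustrates what is being discovered and is not Desbordante's algorithm or API. The function names and the list-of-dicts input format are assumptions.

```python
from itertools import permutations


def fd_holds(rows, lhs, rhs):
    """True if each combination of `lhs` column values maps to one `rhs` value."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = row[rhs]
        # setdefault stores the first rhs value seen for this key
        # and returns it on later lookups.
        if seen.setdefault(key, value) != value:
            return False
    return True


def unary_fds(rows):
    """Brute-force discovery of dependencies A -> B between single columns."""
    columns = list(rows[0]) if rows else []
    return [(a, b) for a, b in permutations(columns, 2) if fd_holds(rows, [a], b)]


rows = [
    {"zip": "10001", "city": "New York", "name": "Ann"},
    {"zip": "10001", "city": "New York", "name": "Bob"},
    {"zip": "94105", "city": "San Francisco", "name": "Ann"},
]
print(unary_fds(rows))  # [('zip', 'city'), ('city', 'zip')]
```

This search is quadratic in the number of columns and ignores multi-column left-hand sides, whose number grows exponentially. Dedicated tools rely on specialized discovery algorithms with pruning for that reason. On a small sample, some dependencies (here `city → zip`) hold only by coincidence, so discovered constraints still need domain review.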