CDC WONDER provides access to US public health datasets, including mortality records, natality data, cancer statistics, vaccination rates, and disease surveillance. It is used in data engineering for public health analytics pipelines, epidemiological research systems, and population health indicator dashboards in Python.
CDC WONDER is queried through a web-based interface. Engineers export the results as tab-delimited text files and load them with `pandas.read_csv(sep='\t')`. For automation, datasets published on data.cdc.gov are available through the Socrata API, which is accessible from Python via `sodapy`.
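As a rough illustration of the tab-delimited loading step, the sketch below parses a small in-memory sample instead of a real download. It assumes the export has a `Notes` column that is empty on data rows and that the file ends with a `---` line followed by footer notes; the helper name `load_wonder_export`, the column names, and all values are made up for the example, so adjust them to match your actual export.

```python
import io

import pandas as pd

# Made-up stand-in for a CDC WONDER export. Column names and values are
# placeholders; a real export's columns depend on the query that produced it.
SAMPLE_EXPORT = (
    '"Notes"\t"Year"\t"Cause"\t"Deaths"\n'
    '\t"2020"\t"Cause A"\t"120"\n'
    '\t"2020"\t"Cause B"\t"Suppressed"\n'
    '"Total"\t"2020"\t\t"120"\n'
    '"---"\n'
    '"Dataset: placeholder footer text"\n'
)


def load_wonder_export(source, count_col="Deaths"):
    """Load a tab-delimited export and keep only the plain data rows.

    Assumption: data rows leave the Notes column empty, while total rows
    and footer lines put text in it.
    """
    df = pd.read_csv(source, sep="\t", dtype=str)
    data = df[df["Notes"].isna()].drop(columns="Notes")
    data = data.reset_index(drop=True)
    # Non-numeric markers (assumed here to be text such as "Suppressed")
    # become NaN instead of raising an error.
    data[count_col] = pd.to_numeric(data[count_col], errors="coerce")
    return data


wonder_df = load_wonder_export(io.StringIO(SAMPLE_EXPORT))
print(wonder_df)
```

Passing a file path instead of the `io.StringIO` object works the same way, since `pandas.read_csv` accepts either.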
CDC mortality and disease data can be used to train AI models for public health surveillance. RAG systems indexed on CDC WONDER mortality reports let LLMs answer questions such as 'What are the leading causes of death among 25-44 year olds in the US?' with official epidemiological data rather than outdated training knowledge.
```python
# pip install pandas sodapy
import pandas as pd
from sodapy import Socrata

# CDC data via the data.cdc.gov Socrata API
client = Socrata("data.cdc.gov", "YOUR_APP_TOKEN")

# Verify the dataset ID and field names on the dataset's data.cdc.gov API page
results = client.get(
    "bi63-dtpu",  # Underlying Cause of Death
    select="cause_of_death,deaths,year",
    where="year='2022'",
    limit=1000,
)

df = pd.DataFrame.from_records(results)
df["deaths"] = pd.to_numeric(df["deaths"])  # convert counts to numbers before ranking
print(df.nlargest(10, "deaths"))
```
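The retrieval side of the RAG idea described earlier can be sketched in a few lines. This is only an illustration: the rows are placeholder values, not CDC figures, the function names are hypothetical, and a simple token-overlap score stands in for the embedding search a real system would more likely use. The resulting prompt would then be passed to whichever LLM client you use.

```python
import re

import pandas as pd

# Placeholder rows, NOT real CDC statistics; they only show the data shape.
records = pd.DataFrame(
    {
        "year": [2022, 2022, 2022],
        "age_group": ["25-44", "25-44", "65-74"],
        "cause": ["Cause A", "Cause B", "Cause A"],
        "deaths": [300, 200, 900],
    }
)


def row_to_passage(row):
    """Turn one table row into a sentence that can be indexed and retrieved."""
    return (
        f"In {row['year']}, {row['cause']} accounted for {row['deaths']} "
        f"deaths among people aged {row['age_group']}."
    )


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, passages, k=2):
    """Rank passages by token overlap with the question (embedding stand-in)."""
    q_tokens = tokenize(question)
    ranked = sorted(
        passages, key=lambda p: len(q_tokens & tokenize(p)), reverse=True
    )
    return ranked[:k]


def build_prompt(question, context_passages):
    """Assemble a prompt that asks the model to answer from the context only."""
    context = "\n".join(f"- {p}" for p in context_passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


question = "What are the leading causes of death among 25-44 year olds?"
passages = [row_to_passage(r) for r in records.to_dict("records")]
top = retrieve(question, passages)
prompt = build_prompt(question, top)
print(prompt)
```

Converting rows to sentences before indexing keeps the year and age group attached to each figure, so a retrieved passage can be read without the rest of the table.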
More datasets used by Python data engineers.
Access demographic, economic, social, and geographic datasets from the US Census Bureau including the American Community Survey, decennial census, and economic census. Used in data engineering for population analysis pipelines, market research, geospatial enrichment, and building socioeconomic dashboards in Python.
The National Library of Medicine hosts PubMed, MedlinePlus, GenBank, and other biomedical databases covering clinical literature, genetic sequences, drug information, and medical terminology. Used in healthcare data engineering pipelines, clinical NLP workflows, and biomedical research ingestion in Python.
The WHO Global Health Observatory offers datasets on a wide range of health-related indicators, including disease prevalence, mortality rates, healthcare access and more.