The Global Burden of Disease (GBD) study, led by the Institute for Health Metrics and Evaluation (IHME), publishes datasets on disease burden, mortality, morbidity, risk factors, and health-related indicators at global, regional, and country levels.
GBD data is available via the IHME GBD Results Tool (CSV downloads) and the GBD API. Engineers use `pandas.read_csv()` on large result files, then `groupby` and `pivot_table` to summarize burden by cause, location, age, and sex for comparative health analysis.
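Because result files can be large, one option is to read them in chunks and keep only the rows of interest. The following is a minimal sketch, not an official IHME workflow: the helper name `load_filtered` is hypothetical, the column names are assumed to match the Results Tool export (check your file's header), and the inline CSV holds placeholder values rather than real GBD estimates.

```python
import io
import pandas as pd

# Tiny stand-in for a GBD results export.
# Values are placeholders for illustration, NOT real GBD estimates.
SAMPLE_CSV = """measure_name,location_name,cause_name,metric_name,year,val
Deaths,Global,Cause A,Number,2021,300.0
Deaths,Global,Cause B,Number,2021,120.0
Deaths,Global,Cause A,Rate,2021,3.5
DALYs,Global,Cause A,Number,2021,900.0
Deaths,Country X,Cause A,Number,2021,40.0
"""


def load_filtered(source, measure, metric, chunksize=2):
    """Read a results CSV in chunks, keeping rows for one measure and metric.

    `load_filtered` is a hypothetical helper; `chunksize` is tiny here only
    so the sample data spans several chunks.
    """
    kept = []
    for chunk in pd.read_csv(source, chunksize=chunksize):
        mask = (chunk["measure_name"] == measure) & (chunk["metric_name"] == metric)
        if mask.any():  # skip chunks with no matching rows
            kept.append(chunk[mask])
    return pd.concat(kept, ignore_index=True)


deaths = load_filtered(io.StringIO(SAMPLE_CSV), "Deaths", "Number")
print(deaths)
```

For a real download, pass the CSV path instead of the `io.StringIO` object and raise `chunksize` to something like 100,000 rows, tuned to available memory.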
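GBD study data is foundational for health AI: its burden estimates serve as reference values for ML models that predict health outcomes. The estimates are themselves modeled and published with uncertainty intervals, so they are better treated as a reference baseline than as exact ground truth. RAG systems indexed on GBD results let AI health tools answer questions such as 'What is the leading cause of disability in low-income countries?' with IHME-published burden estimates.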
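To index GBD results for retrieval, each row first has to become a short text passage. The sketch below shows one possible way to do that, assuming the export's column layout; `row_to_passage` and the passage wording are hypothetical choices, the embedding and vector-store steps are left out, and the numbers are placeholders rather than real estimates.

```python
import pandas as pd

# Placeholder rows in the shape of a GBD results export.
# Values are illustrative only, NOT real GBD estimates.
rows = pd.DataFrame({
    "measure_name": ["Deaths", "Deaths"],
    "location_name": ["Global", "Global"],
    "cause_name": ["Cause A", "Cause B"],
    "sex_name": ["Both", "Both"],
    "age_name": ["All ages", "All ages"],
    "year": [2021, 2021],
    "val": [300.0, 120.0],
    "lower": [250.0, 100.0],
    "upper": [350.0, 140.0],
})


def row_to_passage(row):
    """Render one result row as a sentence, keeping the uncertainty interval."""
    return (
        f"{row['measure_name']} from {row['cause_name']} in {row['location_name']}, "
        f"{row['year']} ({row['sex_name']}, {row['age_name']}): "
        f"{row['val']:,.0f} (95% UI {row['lower']:,.0f} to {row['upper']:,.0f})."
    )


# Each passage keeps structured metadata so a retriever can filter by it.
passages = [
    {
        "text": row_to_passage(r),
        "metadata": {"cause": r["cause_name"], "year": r["year"]},
    }
    for r in rows.to_dict("records")
]

for p in passages:
    print(p["text"])
```

Carrying the lower and upper bounds into the text lets a downstream tool report the interval along with the point estimate.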
# pip install pandas
import pandas as pd

# Download GBD results CSV from https://vizhub.healthdata.org/gbd-results/
df = pd.read_csv(
    "IHME-GBD_2021_DATA.csv",
    usecols=["measure_name", "location_name", "cause_name", "sex_name",
             "age_name", "metric_name", "year", "val", "upper", "lower"],
)

# Top causes of death globally in 2021.
# Keep a single metric: an export can mix Number, Percent, and Rate rows,
# and ranking "val" across them would be meaningless.
deaths = (
    df[(df["measure_name"] == "Deaths")
       & (df["metric_name"] == "Number")
       & (df["location_name"] == "Global")
       & (df["year"] == 2021)
       & (df["sex_name"] == "Both")
       # Compare case-insensitively; the label's capitalization may vary.
       & (df["age_name"].str.lower() == "all ages")]
    .nlargest(10, "val")[["cause_name", "val"]]
)
print(deaths)
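The `groupby` and `pivot_table` summaries mentioned earlier can be sketched on a small in-memory frame. This is an illustration under assumptions: the column names follow the results export, the causes and locations are generic stand-ins, and the values are placeholders rather than real GBD estimates.

```python
import pandas as pd

# Placeholder counts in the shape of a GBD export (metric "Number").
# Values are illustrative only, NOT real GBD estimates.
df = pd.DataFrame({
    "location_name": ["Global", "Global", "Global", "Global", "Country X", "Country X"],
    "cause_name": ["Cause A", "Cause A", "Cause B", "Cause B", "Cause A", "Cause B"],
    "sex_name": ["Male", "Female", "Male", "Female", "Male", "Male"],
    "val": [100.0, 80.0, 30.0, 50.0, 10.0, 5.0],
})

# Total burden per location and cause. Summing is only valid for counts;
# rates and percentages should not be added together.
by_cause = df.groupby(["location_name", "cause_name"])["val"].sum()

# Cause-by-sex table for a single location.
global_rows = df[df["location_name"] == "Global"]
table = global_rows.pivot_table(
    index="cause_name",
    columns="sex_name",
    values="val",
    aggfunc="sum",
    fill_value=0,
)

print(by_cause)
print(table)
```

On a real export, drop the "Both" sex rows (or the sex-specific ones) before summing, since adding Male, Female, and Both together would double-count.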
More datasets used by Python data engineers.
The WHO Global Health Observatory offers datasets on a wide range of health-related indicators, including disease prevalence, mortality rates, healthcare access and more.
The World Bank World Development Indicators provides 1,600+ time-series indicators covering poverty, health, education, infrastructure, and environment for 217 countries from 1960 onwards. Used in data engineering for global development dashboards, longitudinal analysis pipelines, and economic research systems in Python.
New York City's open data portal provides 3,000+ datasets covering taxi trips, 311 complaints, crime statistics, building permits, health inspections, and transit data. Used in urban data engineering pipelines for city analytics, transportation modelling, and building geospatial dashboards in Python.