The Global Health Data Exchange (GHDx), operated by the Institute for Health Metrics and Evaluation (IHME), provides access to a comprehensive collection of global health data and indicators.
GHDx data is downloadable as CSV from the IHME results tool. Engineers use `pandas.read_csv()` to load GBD results files, which typically have measure, location, year, age, sex, and cause columns requiring `groupby` and `pivot_table` for analysis.
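As a minimal sketch of that reshaping step, the snippet below builds a tiny stand-in DataFrame and pivots it so that each cause is a row and each year a column. The values are invented placeholders, not IHME figures, and the column names (`measure`, `location`, `year`, `cause`, `metric`, `val`) are assumptions about the export layout; a real download may use different names depending on the options chosen in the results tool.

```python
import pandas as pd

# Toy rows shaped like a GBD results export.
# All values are invented placeholders, not real statistics.
rows = [
    {"measure": "Deaths", "location": "A", "year": 2020, "cause": "Cause X", "metric": "Number", "val": 10.0},
    {"measure": "Deaths", "location": "A", "year": 2021, "cause": "Cause X", "metric": "Number", "val": 12.0},
    {"measure": "Deaths", "location": "B", "year": 2020, "cause": "Cause X", "metric": "Number", "val": 5.0},
    {"measure": "Deaths", "location": "A", "year": 2020, "cause": "Cause Y", "metric": "Number", "val": 7.0},
    {"measure": "Deaths", "location": "A", "year": 2021, "cause": "Cause Y", "metric": "Rate", "val": 0.3},
    {"measure": "DALYs", "location": "A", "year": 2020, "cause": "Cause X", "metric": "Number", "val": 100.0},
]
df = pd.DataFrame(rows)

# Keep one measure and one metric so that the summed values are comparable.
deaths = df[(df["measure"] == "Deaths") & (df["metric"] == "Number")]

# One row per cause, one column per year, summed over locations.
by_cause_year = deaths.pivot_table(
    index="cause", columns="year", values="val", aggfunc="sum", fill_value=0
)
print(by_cause_year)
```

Filtering on a single measure and metric before aggregating matters because counts, rates, and percentages can share the same `val` column in one file.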
Global Burden of Disease (GBD) data from GHDx can be used to train AI models for health outcome prediction and disease priority modeling. It can also back a RAG knowledge base indexed on GBD study results, so that an LLM can answer questions such as 'What is the leading cause of premature death in sub-Saharan Africa?' with statistics published by IHME.
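# pip install pandas
import pandas as pd

# Download GBD results from the IHME GHDx results tool:
# https://vizhub.healthdata.org/gbd-results/
df = pd.read_csv("IHME-GBD_2021_DATA.csv")

# Keep death counts only; summing rates or percentages would be meaningless.
# Assumes the export has a "metric" column containing the value "Number".
deaths = df[(df["measure"] == "Deaths") & (df["metric"] == "Number")]

# Top 10 causes by total deaths across all rows in the file.
# Totals double count if the export mixes aggregate and detailed
# age groups, sexes, or locations.
df_top = (
    deaths.groupby("cause")["val"]
    .sum()
    .sort_values(ascending=False)
    .head(10)
)
print(df_top)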
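The RAG use case mentioned above could start from something like the following sketch, which turns each result row into a short text passage and retrieves passages by keyword overlap. This is a simplified stand-in: a production system would more likely use embeddings and a vector store. The function names `row_to_passage`, `tokenize`, and `retrieve` are hypothetical, the column names are assumed, and the values are invented placeholders rather than IHME statistics.

```python
import re

import pandas as pd

# Toy rows; values are invented placeholders, not IHME statistics.
df = pd.DataFrame([
    {"measure": "Deaths", "location": "Region A", "year": 2021, "cause": "Cause X", "val": 120.0},
    {"measure": "Deaths", "location": "Region B", "year": 2021, "cause": "Cause Y", "val": 80.0},
    {"measure": "DALYs", "location": "Region A", "year": 2021, "cause": "Cause Y", "val": 900.0},
])


def row_to_passage(row) -> str:
    """Render one result row as a sentence-like passage for indexing."""
    return (
        f"{row['measure']} from {row['cause']} in {row['location']} "
        f"in {row['year']}: {row['val']:.1f}"
    )


passages = df.apply(row_to_passage, axis=1).tolist()


def tokenize(text: str) -> set:
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, docs: list, k: int = 2) -> list:
    """Return the k passages sharing the most tokens with the question."""
    q = tokenize(question)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


# The retrieved passages would be placed in the LLM prompt as context.
context = retrieve("deaths in region b", passages, k=1)
print(context)
```

Keeping the measure, cause, location, and year inside each passage means the retrieved text carries the full context of the number it reports.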
More datasets used by Python data engineers.
Access datasets on child well-being, education enrolment, nutrition, immunisation, child mortality, and child protection indicators worldwide from UNICEF. Used in data engineering for humanitarian analytics pipelines, SDG progress tracking, and building global child health indicator dashboards in Python.
New York City's open data portal provides 3,000+ datasets covering taxi trips, 311 complaints, crime statistics, building permits, health inspections, and transit data. Used in urban data engineering pipelines for city analytics, transportation modelling, and building geospatial dashboards in Python.
Gapminder provides clean, long-run historical datasets on 500+ global development indicators including income per capita, life expectancy, fertility rates, and CO2 emissions for 195 countries. Used in data engineering for development analytics, animated visualisation pipelines, and building SDG tracking systems in Python.