Gapminder provides clean, long-run historical datasets on 500+ global development indicators, including income per capita, life expectancy, fertility rates, and CO2 emissions, for 195 countries. It is used in data engineering for development analytics, animated visualisation pipelines, and SDG tracking systems built in Python.
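Gapminder data is downloadable as Excel/CSV files, one per indicator. The `gapminder` Python package ships a pre-loaded copy of the classic dataset (country, continent, year, life expectancy, population, and GDP per capita at five-year intervals from 1952 to 2007). To analyse several indicators together, engineers use `pandas.merge()` on the country and year keys to combine them into a single DataFrame.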
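The merge step can be sketched as follows. This is a minimal illustration, assuming each indicator file is in wide layout (one row per country, one column per year); check that against the actual download. The helper `to_long`, the column names, the country names, and all values are hypothetical placeholders, not real Gapminder figures.

```python
import pandas as pd

# Placeholder tables standing in for two indicator downloads.
# Values are illustrative only, not real Gapminder figures.
life_wide = pd.DataFrame({
    "country": ["Country A", "Country B"],
    "2000": [60.0, 70.0],
    "2010": [65.0, 72.0],
})
income_wide = pd.DataFrame({
    "country": ["Country A", "Country B"],
    "2000": [1000.0, 8000.0],
    "2010": [1500.0, None],  # a missing observation
})


def to_long(wide, value_name):
    """Reshape a wide indicator table to one row per (country, year)."""
    long = wide.melt(id_vars="country", var_name="year", value_name=value_name)
    long["year"] = long["year"].astype(int)
    return long


life_long = to_long(life_wide, "life_expectancy")
income_long = to_long(income_wide, "income_per_person")

# An outer join keeps country-years that are present in only one indicator;
# validate= raises an error if a (country, year) key is duplicated.
merged = pd.merge(
    life_long,
    income_long,
    on=["country", "year"],
    how="outer",
    validate="one_to_one",
)
print(merged)
```

An outer join is used here so that gaps in one indicator stay visible as missing values instead of silently dropping rows; an inner join may be preferable when only complete country-years are wanted.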
Gapminder's clean, long-run development data is well suited to educational AI that counters common misconceptions about global progress. For example, a RAG system indexed on Gapminder country profiles lets an AI answer 'How has child mortality improved in Ethiopia since 1990?' with verified development statistics.
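The retrieval half of such a system can be sketched as below. This is only a rough outline under stated assumptions: plain keyword overlap stands in for an embedding index, the retrieved text would still need to be passed to a language model, and the functions `build_profiles`, `tokenize`, and `retrieve` are hypothetical names. The rows are placeholders, not real Gapminder figures.

```python
import re

import pandas as pd

# Placeholder rows for illustration only; not real Gapminder figures.
records = pd.DataFrame({
    "country": ["Country A", "Country A", "Country B", "Country B"],
    "year": [1990, 2020, 1990, 2020],
    "child_mortality": [200.0, 50.0, 20.0, 5.0],
})


def build_profiles(df):
    """Return {country: text profile} with one sentence per year."""
    profiles = {}
    for country, group in df.sort_values("year").groupby("country"):
        sentences = [
            f"In {int(row.year)}, child mortality in {country} was "
            f"{row.child_mortality:g} per 1,000 live births."
            for row in group.itertuples(index=False)
        ]
        profiles[country] = " ".join(sentences)
    return profiles


def tokenize(text):
    """Lowercase a string and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, profiles, top_k=1):
    """Rank profiles by how many tokens they share with the question."""
    query = tokenize(question)
    ranked = sorted(
        profiles,
        key=lambda name: len(query & tokenize(profiles[name])),
        reverse=True,
    )
    return [(name, profiles[name]) for name in ranked[:top_k]]


profiles = build_profiles(records)
hits = retrieve("How has child mortality changed in Country B since 1990?", profiles)
context = hits[0][1]  # text to place in the model prompt as grounding
print(context)
```

Generating the profile text directly from the DataFrame keeps every number in the retrieved context traceable to a source row, which is the point of grounding answers in the dataset.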
# pip install gapminder pandas matplotlib
import pandas as pd
from gapminder import gapminder
df = gapminder.copy()
print(df.columns.tolist())
# Reproduce Rosling's 2007 chart data
chart_2007 = df[df["year"] == 2007].copy()
print(chart_2007.nlargest(5, "gdpPercap")[["country", "gdpPercap", "lifeExp", "pop"]])
More datasets used by Python data engineers.
The World Bank World Development Indicators provide 1,600+ time-series indicators covering poverty, health, education, infrastructure, and environment for 217 countries from 1960 onwards. They are used in data engineering for global development dashboards, longitudinal analysis pipelines, and economic research systems in Python.
New York City's open data portal provides 3,000+ datasets covering taxi trips, 311 complaints, crime statistics, building permits, health inspections, and transit data. It is used in urban data engineering pipelines for city analytics, transportation modelling, and geospatial dashboards in Python.
The WHO Global Health Observatory offers datasets on a wide range of health indicators, including disease prevalence, mortality rates, and healthcare access.