FiveThirtyEight publishes the datasets behind its data journalism articles covering US politics, sports analytics, economics, and culture. They are available on GitHub as clean, analysis-ready CSV files, which makes them well suited to practising data loading, statistical analysis pipelines, and exploratory data workflows in Python.
Because the datasets live on GitHub as CSV files, `pandas.read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/...')` can load them straight from the raw-file URL, with no separate download step. The data is already cleaned and analysis-ready, which makes it a good fit for teaching pandas and matplotlib.
FiveThirtyEight datasets can serve as clean, journalist-vetted training data for models in polling analysis, sports prediction, and political science. The election polling averages are particularly useful for time-series forecasting models, while the sports data can be used to train models that predict game outcomes.
# pip install pandas
import pandas as pd
# FiveThirtyEight datasets are hosted on GitHub as CSV files
base = "https://raw.githubusercontent.com/fivethirtyeight/data/master"
df = pd.read_csv(f"{base}/births/US_births_2000-2014_SSA.csv")
print(df.groupby("year")["births"].sum())
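As a possible next exploratory step on the same births table, the sketch below averages births per `day_of_week` code. It assumes the column layout of `births/US_births_2000-2014_SSA.csv` (`year`, `month`, `date_of_month`, `day_of_week`, `births`). The inline rows are made-up illustrative values so the example runs offline, and `mean_births_by_weekday` is a hypothetical helper name, not part of pandas or the FiveThirtyEight repository.

```python
import io

import pandas as pd

# Illustrative rows only: made-up values arranged in the assumed column
# layout of births/US_births_2000-2014_SSA.csv. They are NOT real figures.
SAMPLE_CSV = """year,month,date_of_month,day_of_week,births
2000,1,1,6,9000
2000,1,2,7,8000
2000,1,3,1,11000
2000,1,8,6,9200
2000,1,9,7,7800
2000,1,10,1,11400
"""


def mean_births_by_weekday(df: pd.DataFrame) -> pd.Series:
    """Average the births column for each day_of_week code (hypothetical helper)."""
    return df.groupby("day_of_week")["births"].mean().sort_index()


# Parse the inline sample exactly as read_csv would parse the hosted file.
sample = pd.read_csv(io.StringIO(SAMPLE_CSV))
weekday_means = mean_births_by_weekday(sample)
print(weekday_means)
```

To run this against the real data, pass the `df` loaded from the GitHub URL above instead of `sample`.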
More datasets used by Python data engineers.
The IMF provides datasets on global economic indicators, including GDP growth, inflation rates, exchange rates, fiscal balances and international trade.
Access 16,000+ development indicators from the World Bank covering GDP, poverty, health, education, infrastructure, and environment for 200+ countries. Used in data engineering for building global development dashboards, time-series analysis pipelines, and cross-country economic comparison systems in Python.
The BEA provides economic data and statistics for the United States, including measures of GDP, national income, consumer spending and trade balances.