The Humanitarian Data Exchange (HDX) hosts datasets on crises, conflicts, refugee movements, food insecurity, disease outbreaks, and natural disasters, published by NGOs and UN agencies. Data engineers use it in Python for humanitarian intelligence pipelines, crisis response analytics, and early warning dashboards.
The `hdx-python-api` library provides clean access to the HDX CKAN API. Engineers use `Dataset.search_in_hdx()` to find datasets, then `Resource.download()` to fetch files. HDX datasets span CSV, Excel, GeoJSON, and Shapefile formats depending on the contributing organization.
HDX data can be used to train AI early warning systems for crisis detection and needs assessment. A RAG knowledge base indexed on HDX crisis reports lets an LLM ground its analysis of a specific humanitarian situation in source material. Models trained on HDX data can also be used to predict food insecurity from conflict and displacement indicators.
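As a rough illustration of the retrieval step in such a RAG setup, the sketch below indexes dataset metadata and returns the entries that best match a query. It assumes each record is a dict with the CKAN `title` and `notes` fields, and it uses plain token overlap as a stand-in for the embedding search a real system would use. The helper names `tokenize`, `build_index`, and `retrieve` are hypothetical, and the sample records are invented placeholders, not real HDX entries.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Pair each record's text with its token counts."""
    index = []
    for rec in records:
        text = f"{rec.get('title', '')}. {rec.get('notes', '')}".strip()
        index.append((text, Counter(tokenize(text))))
    return index


def retrieve(index, query, k=2):
    """Return up to k texts ranked by token overlap with the query."""
    query_counts = Counter(tokenize(query))
    scored = []
    for text, counts in index:
        score = sum(min(counts[tok], query_counts[tok]) for tok in query_counts)
        if score > 0:
            scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Invented placeholder records, for illustration only.
records = [
    {"title": "Food prices in region A", "notes": "Monthly market prices"},
    {"title": "Displacement tracking region B",
     "notes": "Internally displaced persons counts"},
]
index = build_index(records)
context = retrieve(index, "displaced persons")
print(context)
```

The retrieved texts would then be passed to the LLM as context alongside the user's question.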
# pip install hdx-python-api pandas
from hdx.api.configuration import Configuration
from hdx.data.dataset import Dataset
import pandas as pd

# Read-only access to the production HDX site.
Configuration.create(hdx_site="prod", user_agent="my-app/1.0",
                     hdx_read_only=True)

# Search HDX and keep the top three matches.
datasets = Dataset.search_in_hdx("Syria food security", rows=3)
for ds in datasets:
    print(ds["title"])
    # Download only the first resource of each dataset.
    for resource in ds.get_resources()[:1]:
        url, path = resource.download()
        # Resources are not always CSV, so check before parsing.
        if str(path).lower().endswith(".csv"):
            df = pd.read_csv(path)
            print(df.head(3))
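The example above parses downloads as CSV, but HDX resources also arrive as Excel, GeoJSON, or Shapefile. The sketch below shows one way to dispatch on the file extension using only the standard library. `read_rows` is a hypothetical helper, not part of `hdx-python-api`, and it assumes the downloaded path keeps a meaningful extension and that the file is UTF-8 encoded.

```python
import csv
import json
from pathlib import Path


def read_rows(path):
    """Load a downloaded resource as a list of dict rows, based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        # utf-8-sig also strips a byte-order mark if one is present.
        with open(path, newline="", encoding="utf-8-sig") as handle:
            return list(csv.DictReader(handle))
    if suffix in {".geojson", ".json"}:
        with open(path, encoding="utf-8") as handle:
            collection = json.load(handle)
        # Keep only the attribute table; geometries are ignored here.
        return [feature.get("properties") or {}
                for feature in collection.get("features", [])]
    # Excel and Shapefile need third-party readers, so they are left out.
    raise ValueError(f"No standard-library reader for {suffix!r} files")
```

Calling `read_rows(path)` with the `path` returned by `resource.download()` gives plain dicts that can be handed to `pd.DataFrame` if a DataFrame is needed.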
More datasets used by Python data engineers.
The IMF provides datasets on global economic indicators, including GDP growth, inflation rates, exchange rates, fiscal balances and international trade.
FiveThirtyEight publishes the datasets behind its data journalism articles covering US politics, sports analytics, economics, and culture. They are available on GitHub as clean, analysis-ready CSV files, which makes them well suited to practising data loading, statistical analysis pipelines, and exploratory data workflows in Python.
The National Library of Medicine hosts PubMed, MedlinePlus, GenBank, and other biomedical databases covering clinical literature, genetic sequences, drug information, and medical terminology. They are used in healthcare data engineering pipelines, clinical NLP workflows, and biomedical research ingestion in Python.