The World Trade Organization (WTO) publishes datasets on global merchandise trade, commercial services, tariff schedules, trade agreements, and dispute settlement records. These datasets are used in data engineering for international trade analytics pipelines, tariff analysis systems, and global commerce monitoring dashboards in Python.
The WTO provides bulk data downloads and a Stats API (stats.wto.org). Engineers use `requests` with indicator and reporter parameters to retrieve trade flows, tariff rates, and non-tariff measures. Data is available as CSV or JSON and can be loaded into pandas for trade policy analysis.
WTO trade data enables AI policy analysis tools that track global trade rule compliance and tariff impacts. RAG systems indexed on WTO dispute settlement records allow LLMs to answer 'What trade disputes has the US filed against China since 2018?' with official WTO adjudication data.
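The retrieval step behind such a question can be sketched as follows. This is a minimal illustration, not a full RAG pipeline: a plain metadata filter stands in for a vector search, the `DisputeRecord` schema and both helper functions are hypothetical, and the sample records are placeholders rather than real WTO cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisputeRecord:
    # Hypothetical schema; real WTO dispute records carry more fields.
    case_id: str
    complainant: str
    respondent: str
    year_filed: int
    summary: str


def retrieve_disputes(records, complainant, respondent, since_year):
    """Return records matching both parties, filed in or after since_year, oldest first."""
    hits = [
        r for r in records
        if r.complainant == complainant
        and r.respondent == respondent
        and r.year_filed >= since_year
    ]
    return sorted(hits, key=lambda r: (r.year_filed, r.case_id))


def build_context(hits):
    """Format retrieved records as plain-text context to place in an LLM prompt."""
    return "\n".join(
        f"[{r.case_id}] {r.complainant} v. {r.respondent} ({r.year_filed}): {r.summary}"
        for r in hits
    )


# Placeholder records for illustration only; these are not real WTO cases.
SAMPLE_RECORDS = [
    DisputeRecord("EX-001", "USA", "CHN", 2017, "Placeholder summary A."),
    DisputeRecord("EX-002", "USA", "CHN", 2018, "Placeholder summary B."),
    DisputeRecord("EX-003", "CHN", "USA", 2019, "Placeholder summary C."),
    DisputeRecord("EX-004", "USA", "CHN", 2021, "Placeholder summary D."),
]

hits = retrieve_disputes(SAMPLE_RECORDS, "USA", "CHN", since_year=2018)
context = build_context(hits)
print(context)
```

Filtering on structured fields before any text search keeps the answer tied to the parties and dates in the question. In a real system the filtered records would then be ranked by an embedding search and passed to the LLM as context.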
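# pip install requests pandas
import pandas as pd
import requests

# Endpoint, parameter names, and response keys are as given in this guide;
# check them against the current WTO API documentation before relying on them.
resp = requests.get(
    "https://stats.wto.org/api/assets/GetDynData",
    params={
        "datasource": "MerchandiseTrade",
        "reporterCode": "USA",
        "partnerCode": "CHN",
        "year": "2023",
        "format": "json",
    },
    timeout=30,  # avoid hanging indefinitely on a slow response
)
resp.raise_for_status()  # fail early on HTTP errors

# Load the records into a DataFrame and preview the main columns
df = pd.DataFrame(resp.json()["Dataset"])
print(df[["Indicator", "Value", "Unit"]].head(10))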
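For the CSV route, a simple trade policy calculation such as a bilateral trade balance might look like the sketch below. The column layout (`reporter`, `partner`, `year`, `flow`, `value_usd_million`) is an assumption rather than the actual WTO bulk download schema, and the figures are placeholders. It uses only the standard library so it runs without extra dependencies; with pandas, the same file would typically be loaded with `pd.read_csv`.

```python
import csv
import io
from collections import defaultdict

# Placeholder CSV in an assumed layout; column names and values are illustrative only.
SAMPLE_CSV = """reporter,partner,year,flow,value_usd_million
USA,CHN,2022,export,100.0
USA,CHN,2022,import,250.0
USA,CHN,2023,export,120.0
USA,CHN,2023,import,200.0
"""


def trade_balance_by_year(csv_text):
    """Sum exports minus imports per (reporter, partner, year)."""
    balance = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["reporter"], row["partner"], int(row["year"]))
        value = float(row["value_usd_million"])
        # Exports add to the balance; any other flow is treated as an import.
        balance[key] += value if row["flow"] == "export" else -value
    return dict(balance)


balances = trade_balance_by_year(SAMPLE_CSV)
for key, value in sorted(balances.items()):
    print(key, value)
```

A negative balance means the reporter imported more from the partner than it exported in that year.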
More datasets used by Python data engineers.
Quandl (now Nasdaq Data Link) provides access to financial, economic, and alternative datasets including stock prices, futures, commodities, and sentiment data. It is used in quantitative data engineering pipelines for financial modelling, backtesting, and building investment analytics systems with the Quandl Python library.
CDC WONDER provides access to US public health datasets including mortality records, natality data, cancer statistics, vaccination rates, and disease surveillance. It is used in data engineering for public health analytics pipelines, epidemiological research systems, and population health indicator dashboards in Python.
Kaggle hosts thousands of community-contributed datasets spanning economics, biology, computer vision, NLP, sports, and social science. It is used in data engineering for sourcing training data, benchmarking pipelines, practising large-scale data loading, and building end-to-end ML workflows in Python.