The UN Comtrade database provides detailed international trade statistics, including import and export values, quantities, and trade flows, for 200+ countries across 5,000+ commodity categories. It is used in data engineering for trade analytics pipelines, supply chain intelligence, and economic research systems in Python.
The `comtradeapicall` Python library wraps the UN Comtrade API v3. Engineers query bilateral trade flows by reporter, partner, commodity (HS code), and year, paginate through the results, and store them in PostgreSQL for supply chain analysis and trade network visualization.
UN Comtrade data enables AI-powered trade analysis tools that answer 'What is the largest commodity imported by Germany from China?' with official UN statistics. Supply chain AI models use Comtrade bilateral flows to map global dependencies and identify concentration risks.
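Once rows have been fetched, a question like the one above comes down to summing trade value per commodity and taking the maximum. The following is a minimal sketch of that step using only the standard library. The `top_commodity` helper is hypothetical, the rows are illustrative placeholders rather than real Comtrade figures, and the field names `cmdCode` and `primaryValue` are assumed to match the columns returned by the query on this page.

```python
from collections import defaultdict

# Illustrative placeholder rows, NOT real Comtrade statistics.
# Field names are assumed to mirror the cmdCode / primaryValue columns.
rows = [
    {"cmdCode": "8471", "primaryValue": 300.0},
    {"cmdCode": "8517", "primaryValue": 250.0},
    {"cmdCode": "8471", "primaryValue": 100.0},
    {"cmdCode": "8541", "primaryValue": None},  # missing values are skipped
]


def top_commodity(rows):
    """Return (cmdCode, total) for the commodity with the largest summed value."""
    totals = defaultdict(float)
    for row in rows:
        value = row.get("primaryValue")
        if value is None:
            continue
        totals[row["cmdCode"]] += float(value)
    if not totals:
        return None
    return max(totals.items(), key=lambda kv: kv[1])


print(top_commodity(rows))  # ('8471', 400.0)
```

Summing before ranking matters because one commodity code can appear in several rows, for example once per customs procedure or transport mode.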
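# pip install comtradeapicall pandas
import comtradeapicall as comtrade

# US imports of semiconductors from Taiwan (HS code 854231).
# Reporter 842 is the USA; partner 490 is "Other Asia, nes", the code Comtrade uses for Taiwan.
# previewFinalData returns a pandas DataFrame and does not need a subscription key.
df = comtrade.previewFinalData(
    typeCode="C", freqCode="A", clCode="HS",
    period="2022", reporterCode="842", cmdCode="854231",
    flowCode="M", partnerCode="490", partner2Code=None,
    customsCode=None, motCode=None, maxRecords=20, format_output="JSON",
    # Requests the *Desc text columns printed below; assumes the installed
    # version of comtradeapicall supports this keyword.
    includeDesc=True,
)
print(df[["reporterDesc", "partnerDesc", "cmdCode", "primaryValue"]])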
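The storage step mentioned earlier can be sketched as an idempotent upsert keyed on the fields that identify a flow, so that re-running a pipeline does not duplicate rows. This sketch uses the standard library's `sqlite3` as a stand-in for PostgreSQL to stay self-contained. The `store_flows` helper, the `trade_flows` table, and the choice of key columns are assumptions for illustration, and the sample record holds a placeholder value, not a real statistic.

```python
import sqlite3

# Assumed identifying fields for one trade-flow row (hypothetical schema).
KEY_FIELDS = ("period", "reporterCode", "partnerCode", "flowCode", "cmdCode")


def store_flows(conn, records):
    """Upsert trade-flow records and return the table's row count."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS trade_flows (
            period TEXT, reporterCode TEXT, partnerCode TEXT,
            flowCode TEXT, cmdCode TEXT, primaryValue REAL,
            PRIMARY KEY (period, reporterCode, partnerCode, flowCode, cmdCode)
        )
        """
    )
    # Key fields are cast to str so 842 and "842" map to the same row.
    rows = [
        tuple(str(rec[k]) for k in KEY_FIELDS) + (rec.get("primaryValue"),)
        for rec in records
    ]
    conn.executemany(
        "INSERT OR REPLACE INTO trade_flows VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM trade_flows").fetchone()[0]


conn = sqlite3.connect(":memory:")
sample = [
    {
        "period": "2022", "reporterCode": 842, "partnerCode": 490,
        "flowCode": "M", "cmdCode": "854231",
        "primaryValue": 1000.0,  # placeholder value, not a real figure
    }
]
store_flows(conn, sample)
print(store_flows(conn, sample))  # second load replaces the row: prints 1
```

With a DataFrame such as `df` from the query above, the records could be passed as `df.to_dict("records")`. In PostgreSQL the equivalent statement would be `INSERT ... ON CONFLICT (...) DO UPDATE`.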
More datasets used by Python data engineers.
Data.gov hosts 300,000+ datasets from US federal agencies covering health, education, environment, agriculture, finance, and transportation. It is used in data engineering for government analytics pipelines, public health research, geospatial analysis, and building civic data applications with Python.
Data.gov.uk provides datasets from UK central and local government covering crime, transport, planning, health, and environment. It is used in data engineering for public sector analytics, policy research pipelines, geospatial visualisation, and building civic technology applications in Python.
Google's Open Images Dataset contains 9 million images annotated with object bounding boxes, segmentation masks, visual relationships, and image-level labels across 600 categories. It is used in computer vision data engineering pipelines for model training, benchmark evaluation, and building image classification datasets in Python.