The Global Terrorism Database (GTD), maintained by the National Consortium for the Study of Terrorism and Responses to Terrorism (START) at the University of Maryland, provides incident-level records of terrorist attacks worldwide.
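The GTD is downloadable from the START Center. The distribution file used in the example on this page is an Excel workbook (`.xlsx`), which engineers load with `pandas.read_excel()`; a CSV copy can be loaded with `pandas.read_csv()`, with the encoding set explicitly. The file is large (more than 200,000 incidents), so loading only the needed columns helps. Spatial joins in `geopandas` attach administrative boundaries to incident coordinates, enabling choropleth maps of regional attack intensity.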
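The encoding handling mentioned above can be sketched as follows. This is a minimal illustration, assuming a CSV copy of the data that may not be UTF-8 encoded. The helper name `read_gtd_csv`, the list of candidate encodings, and the sample file are hypothetical and not part of any START tooling.

```python
import os
import tempfile

import pandas as pd


def read_gtd_csv(path, encodings=("utf-8", "latin-1"), **kwargs):
    """Try each candidate encoding in turn; return the first DataFrame that decodes.

    Hypothetical helper: the encoding list is an assumption, not a START recommendation.
    """
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc, low_memory=False, **kwargs)
        except UnicodeDecodeError as exc:
            last_error = exc  # remember the failure and try the next encoding
    raise last_error


# Demo on a tiny stand-in file (made-up rows, not real GTD data) written in Latin-1,
# so the UTF-8 attempt fails and the fallback encoding is exercised.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "gtd_sample.csv")
    with open(demo_path, "w", encoding="latin-1") as fh:
        fh.write("iyear,country_txt,city,nkill\n")
        fh.write("2001,Colombia,Bogotá,0\n")
        fh.write("2002,Spain,San Sebastián,1\n")
    sample = read_gtd_csv(demo_path)

print(sample["city"].tolist())
```

Extra keyword arguments such as `usecols` pass straight through to `pandas.read_csv()`, which keeps memory use down on the full file.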
The Global Terrorism Database is used to train AI risk-assessment models that estimate attack likelihood from regional indicators. Retrieval-augmented generation (RAG) systems built on GTD data let security analysts ask questions such as 'What attack tactics has ISIS used in Western Europe?' and receive answers grounded in recorded historical incidents.
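The retrieval side of such a query can be illustrated with a plain pandas filter. This is only a sketch of the lookup step a RAG pipeline might run before passing results to a language model, not a full RAG system. The function name `tactics_by_group_and_region` and the sample rows are hypothetical, and the column names `gname`, `region_txt`, and `attacktype1_txt` are assumed to match the GTD codebook.

```python
import pandas as pd

# Made-up stand-in rows for illustration only; not real GTD records.
incidents = pd.DataFrame({
    "iyear": [2015, 2016, 2016, 2017, 2017],
    "gname": ["Group A", "Group A", "Group B", "Group A", "Group A"],
    "region_txt": ["Western Europe", "Western Europe", "Western Europe",
                   "Middle East & North Africa", "Western Europe"],
    "attacktype1_txt": ["Armed Assault", "Bombing/Explosion", "Armed Assault",
                        "Bombing/Explosion", "Armed Assault"],
})


def tactics_by_group_and_region(df, group, region):
    """Count attack types for incidents matching a group-name substring and a region."""
    mask = (
        df["gname"].str.contains(group, case=False, regex=False, na=False)
        & (df["region_txt"] == region)
    )
    return df.loc[mask, "attacktype1_txt"].value_counts()


counts = tactics_by_group_and_region(incidents, "Group A", "Western Europe")
print(counts)
```

The resulting counts, or the matching rows themselves, could then be supplied to a model as context so that its answer cites recorded incidents rather than guesses.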
# pip install pandas openpyxl
import pandas as pd

# Download globalterrorismdb_0522dist.xlsx from the START Center first
df = pd.read_excel("globalterrorismdb_0522dist.xlsx", engine="openpyxl",
                   usecols=["iyear", "country_txt", "attacktype1_txt", "nkill"])
# Count incidents and sum fatalities per year
annual = df.groupby("iyear").agg(incidents=("iyear", "count"), fatalities=("nkill", "sum"))
print(annual.tail(10))
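The snippet above loads `country_txt` and `attacktype1_txt` but aggregates only by year. A possible next step, shown here on a few made-up rows so it runs without the download, is to break incidents down by attack type and by country. The sample values are invented, and the assumption that `nkill` can be blank for some incidents should be checked against the actual file.

```python
import pandas as pd

# Made-up stand-in rows with the same four columns; not real GTD records.
df = pd.DataFrame({
    "iyear": [2019, 2019, 2020, 2020, 2020],
    "country_txt": ["Country A", "Country B", "Country A", "Country A", "Country B"],
    "attacktype1_txt": ["Bombing/Explosion", "Armed Assault", "Bombing/Explosion",
                        "Bombing/Explosion", "Armed Assault"],
    "nkill": [2.0, None, 0.0, 5.0, 1.0],  # None stands in for an unknown fatality count
})

# Incidents per year and attack type (rows: year, columns: attack type).
by_type = pd.crosstab(df["iyear"], df["attacktype1_txt"])

# Per-country totals. sum() skips missing nkill values, so they are counted
# separately to show how many incidents contribute nothing to the fatality total.
by_country = df.groupby("country_txt").agg(
    incidents=("attacktype1_txt", "size"),
    fatalities=("nkill", "sum"),
    missing_nkill=("nkill", lambda s: int(s.isna().sum())),
)

print(by_type)
print(by_country)
```

Reporting the number of missing fatality counts alongside the sum makes it clearer that the total is a lower bound rather than a complete figure.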
More datasets used by Python data engineers.
Regular XML snapshots of all Wikipedia articles, talk pages, and revision histories available for bulk download. Used in data engineering for building large-scale NLP corpora, knowledge graph extraction, full-text search indices, and training language models with Python processing tools like WikiExtractor.
A curated repository of 600+ datasets covering classification, regression, clustering, and time-series tasks, widely used as machine learning benchmarks. Used in data engineering for building ML training pipelines, practising data preprocessing workflows, and loading tabular datasets into model training systems in Python.
Data.gov hosts 300,000+ datasets from US federal agencies covering health, education, environment, agriculture, finance, and transportation. Used in data engineering for government analytics pipelines, public health research, geospatial analysis, and building civic data applications with Python.