The Global Terrorism Index provides annual datasets from the Institute for Economics and Peace on terrorism incidents, fatalities, injuries, hostages, and country-level risk scores. Used in data engineering for geopolitical risk analytics pipelines, security intelligence dashboards, and threat trend analysis in Python.
GTI data is downloadable as Excel files from the IEP website. Engineers typically load the files with `pandas.read_excel()`, handle the multi-row headers, and reshape the data to long format for time-series analysis. Country ISO codes enable joins with World Bank and political science datasets.
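The reshape-and-join step can be sketched as follows. This is a minimal illustration, not the layout of any actual GTI file: the column names (`iso3`, `score_2023`, `score_2024`), the placeholder countries, the scores, and the external `indicator` table are all invented for the example.

```python
import pandas as pd

# Placeholder wide-format frame standing in for a loaded GTI sheet.
# Column names, ISO codes, and scores are invented for illustration.
wide = pd.DataFrame({
    "iso3": ["AAA", "BBB"],
    "country": ["Country A", "Country B"],
    "score_2023": [7.5, 2.0],
    "score_2024": [8.0, 1.5],
})

# Reshape to long format: one row per (country, year).
long_df = wide.melt(id_vars=["iso3", "country"],
                    var_name="year", value_name="gti_score")
long_df["year"] = long_df["year"].str.replace("score_", "", regex=False).astype(int)
long_df = long_df.sort_values(["iso3", "year"]).reset_index(drop=True)

# Hypothetical external indicator keyed on the same ISO code and year.
other = pd.DataFrame({
    "iso3": ["AAA", "AAA", "BBB"],
    "year": [2023, 2024, 2024],
    "indicator": [1.0, 1.1, 3.0],
})

# A left join keeps every GTI row; rows without a match get NaN.
# validate= raises if either side has duplicate (iso3, year) keys.
merged = long_df.merge(other, on=["iso3", "year"],
                       how="left", validate="one_to_one")
print(merged)
```

Joining on an ISO code rather than the country name avoids mismatches caused by spelling variants across sources.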
GTI country risk scores can serve as features for AI security risk models that predict regional instability. RAG systems indexed on GTI reports allow LLMs to answer questions such as 'Which countries have seen the biggest increase in terrorism impact in the last five years?' with answers grounded in IEP-sourced data.
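The five-year question above reduces to a score difference per country, which could be computed directly and then used as a model feature or passed to an LLM as context. The sketch below assumes a long-format table with hypothetical columns `country`, `year`, and `gti_score`; the helper name `biggest_increases` and all values are placeholders, not real GTI data.

```python
import pandas as pd

# Placeholder long-format scores; these are not real GTI values.
scores = pd.DataFrame({
    "country": ["Country A"] * 2 + ["Country B"] * 2 + ["Country C"] * 2,
    "year": [2019, 2024] * 3,
    "gti_score": [2.0, 5.0, 6.0, 5.5, 1.0, 2.0],
})

def biggest_increases(df, start_year, end_year, n=10):
    """Return the n countries with the largest score increase between two years."""
    # One row per country, one column per year.
    pivot = df.pivot(index="country", columns="year", values="gti_score")
    # Countries missing either year produce NaN and are dropped.
    change = (pivot[end_year] - pivot[start_year]).dropna()
    return change.nlargest(n).rename("score_change")

top = biggest_increases(scores, 2019, 2024, n=2)
print(top)
```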
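# pip install pandas openpyxl
import pandas as pd

# Download the GTI Excel file from https://www.visionofhumanity.org/maps/#/
# File name, sheet name, and header offset can differ between editions; adjust to match your download.
df = pd.read_excel("GTI-2024-interactive-dataset.xlsx",
                   sheet_name="GTI Scores", skiprows=2, engine="openpyxl")
# Assumes the sheet has exactly these five columns, in this order.
df.columns = ["rank", "country", "score_2024", "score_2023", "change"]
# Non-numeric entries in "change" become NaN instead of raising.
df["change"] = pd.to_numeric(df["change"], errors="coerce")
# Ten countries with the highest 2024 GTI score.
print(df.nlargest(10, "score_2024")[["country", "score_2024"]])

Official dataset source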
More datasets used by Python data engineers.
Quandl (now Nasdaq Data Link) provides access to financial, economic, and alternative datasets including stock prices, futures, commodities, and sentiment data. Used in quantitative data engineering pipelines for financial modelling, backtesting, and building investment analytics systems with the Quandl Python library.
CDC WONDER provides access to US public health datasets including mortality records, natality data, cancer statistics, vaccination rates, and disease surveillance. Used in data engineering for public health analytics pipelines, epidemiological research systems, and building population health indicator dashboards in Python.
Kaggle hosts thousands of community-contributed datasets spanning economics, biology, computer vision, NLP, sports, and social science. Used in data engineering for sourcing training data, benchmarking pipelines, practising large-scale data loading, and building end-to-end ML workflows in Python.