UN-Habitat's Global Urban Observatory (GUO) offers datasets on urbanization, urban population growth, city demographics, slum populations, urban infrastructure, and sustainability indicators worldwide.
GUO data is available as Excel/CSV downloads from the UN World Urbanization Prospects portal. Engineers use `pandas.read_excel()` to load city population projections, then visualize urban growth trajectories with matplotlib or plotly for urban policy analysis.
Urban population data from GUO can be used to train AI models for city growth prediction and urban planning optimization. RAG systems indexed on UN urbanization reports let LLMs answer questions such as 'Which cities are projected to become megacities by 2050?' using official UN urban forecasts.
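The retrieval step of such a RAG system can be sketched in plain Python. This is a minimal illustration, not a production design: it swaps the embedding model and vector store a real system would likely use for simple TF-IDF-style keyword scoring, and the helper names (`build_index`, `retrieve`, `build_prompt`) are hypothetical. The passages are assumed to be text chunks you have already extracted from UN urbanization reports.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(passages):
    """Precompute per-passage term counts and smoothed inverse document frequencies."""
    counts = [Counter(tokenize(p)) for p in passages]
    doc_freq = Counter(term for c in counts for term in c)
    n = len(passages)
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in doc_freq.items()}
    return counts, idf


def retrieve(question, passages, index, k=3):
    """Return up to k passages with a non-zero keyword score, best first."""
    counts, idf = index
    q_terms = set(tokenize(question))
    scores = []
    for i, c in enumerate(counts):
        score = sum(c[t] * idf.get(t, 0.0) for t in q_terms)
        scores.append((score, i))
    top = sorted(scores, key=lambda s: (-s[0], s[1]))[:k]
    return [passages[i] for score, i in top if score > 0]


def build_prompt(question, retrieved):
    """Assemble the text that would be sent to an LLM of your choice."""
    context = "\n\n".join(retrieved)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Build the index once with `build_index(passages)`, call `retrieve(question, passages, index)` for each question, and pass the result of `build_prompt` to the LLM. Keyword scoring keeps the sketch dependency-free; embedding-based retrieval would generally handle paraphrased questions better.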
# pip install pandas xlrd
import pandas as pd
# World Urbanization Prospects data from the UN
# https://population.un.org/wup/Download/
# engine="xlrd" handles the legacy .xls format (openpyxl reads only .xlsx)
df = pd.read_excel("WUP2018-F22-Cities_Over_300K_Annual.xls",
                   sheet_name="DATA", skiprows=16, engine="xlrd")
# Year headers may load as integers, so normalize them to strings
df.columns = df.columns.astype(str)
# 10 most populous cities projected for 2030
df_2030 = df[["Urban Agglomeration", "2030"]].dropna().copy()
df_2030["2030"] = pd.to_numeric(df_2030["2030"], errors="coerce")
print(df_2030.nlargest(10, "2030"))
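To visualize growth trajectories, the wide table can be reshaped so that each city becomes one line over time. The sketch below assumes the DataFrame loaded above, with an "Urban Agglomeration" column and one column per year. The function names and the `start`/`end` defaults are illustrative, and the y-axis unit (thousands) is an assumption to check against the file's own header notes.

```python
import pandas as pd


def city_trajectories(df, cities, start=1950, end=2035):
    """Return a year-indexed frame with one column per requested city."""
    wide = df.copy()
    # Year headers may load as integers; compare them as strings
    wide.columns = wide.columns.astype(str)
    years = [str(y) for y in range(start, end + 1) if str(y) in wide.columns]
    subset = wide[wide["Urban Agglomeration"].isin(cities)]
    # Transpose so rows are years and columns are cities
    out = subset.set_index("Urban Agglomeration")[years].T
    out.index = out.index.astype(int)
    return out.apply(pd.to_numeric, errors="coerce")


def plot_trajectories(trajectories, path="urban_growth.png"):
    """Draw one line per city and save the figure to `path`."""
    import matplotlib.pyplot as plt  # imported lazily; needs `pip install matplotlib`

    ax = trajectories.plot(figsize=(8, 5))
    ax.set_xlabel("Year")
    ax.set_ylabel("Population (thousands, assumed unit)")
    ax.set_title("Urban agglomeration population over time")
    ax.figure.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(ax.figure)
```

For example, `plot_trajectories(city_trajectories(df, ["Tokyo", "Delhi"]))` would write `urban_growth.png`, provided those names match the spelling used in the file's "Urban Agglomeration" column.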
More datasets used by Python data engineers.
Eurostat, the statistical office of the European Union, offers a comprehensive database of statistical data covering various domains such as economy, population, employment, environment and social issues.
New York City's open data portal provides 3,000+ datasets covering taxi trips, 311 complaints, crime statistics, building permits, health inspections, and transit data. Used in urban data engineering pipelines for city analytics, transportation modelling, and building geospatial dashboards in Python.
The United Nations Development Programme publishes datasets on the Human Development Index, poverty rates, gender equality, and Sustainable Development Goal progress across 190+ countries. Used in data engineering for global development analytics, SDG monitoring pipelines, and country comparison dashboards in Python.