The Global Human Settlement Layer (GHSL) provides satellite-derived datasets on population density, built-up areas, urban extent, and settlement patterns from 1975 to the present. It is used in data engineering for urban analytics pipelines, population distribution modelling, smart city research, and geospatial data workflows in Python.
GHSL grids are distributed as GeoTIFF rasters. Engineers use `rasterio` to open and read the grids, `numpy` for raster arithmetic, and `geopandas` to load administrative boundaries so that pixel values such as population counts can be aggregated per region. Fine-resolution global GeoTIFFs can be too large to load into memory at once, so read them block by block with windowed I/O.
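A minimal sketch of windowed reading, assuming the population values sit in band 1 and that negative or NaN cells are nodata. The helper name `windowed_population_total` is hypothetical; it relies only on the `block_windows`, `read(..., window=...)` and `nodata` members of an open rasterio dataset, so at most one block is held in memory at a time.

```python
import numpy as np


def windowed_population_total(src, band=1):
    """Sum a population band block by block (hypothetical helper).

    `src` is an open rasterio dataset. Only `block_windows`, `read` and
    `nodata` are used, so memory use is bounded by a single block.
    """
    total = 0.0
    for _, window in src.block_windows(band):
        block = src.read(band, window=window)
        clean = np.nan_to_num(block, nan=-1.0)  # NaN cells become negative
        valid = clean >= 0  # assumption: negative values mark nodata
        if src.nodata is not None:
            valid &= clean != src.nodata  # also drop a non-negative nodata value
        # Accumulate in float64 so rounding error stays small across blocks.
        total += float(clean[valid].sum(dtype=np.float64))
    return total
```

With a real file this would be called as `with rasterio.open(path) as src: print(windowed_population_total(src))`. The block shape comes from the file's internal layout: tiled files yield square tiles, striped files yield row strips, and both keep memory bounded.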
GHSL built-up area and population grids are key inputs for AI urban analysis models. Train semantic segmentation models on GHSL labels to detect urban expansion in new satellite imagery, or use population density grids as features in AI models predicting disaster exposure and emergency response needs.
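One way to turn a population grid into per-region model features is a zonal sum. This sketch assumes the administrative boundaries have already been rasterized onto the population grid as integer zone ids (for example with `rasterio.features.rasterize` over `zip(gdf.geometry, range(len(gdf)))`, using a signed integer dtype and `fill=-1`) and that a boolean hazard layer on the same grid is available; the hazard layer and the helper `zone_exposure_features` are hypothetical.

```python
import numpy as np


def zone_exposure_features(pop, zones, hazard, n_zones):
    """Per-zone population and hazard exposure (hypothetical helper).

    pop     -- people per cell; NaN or negative values are treated as nodata
    zones   -- integer zone ids 0..n_zones-1 on the same grid, -1 outside all zones
    hazard  -- boolean grid, True where a cell is exposed to the hazard
    Returns an (n_zones, 3) array: total, exposed population, exposed share.
    """
    clean = np.nan_to_num(pop.astype(np.float64), nan=-1.0)
    valid = (clean >= 0) & (zones >= 0)
    z = zones[valid]
    p = clean[valid]
    # bincount with weights sums the population of every cell per zone id.
    total = np.bincount(z, weights=p, minlength=n_zones)
    exposed = np.bincount(z, weights=p * hazard[valid], minlength=n_zones)
    # Exposed share per zone; zones with no population get 0 instead of NaN.
    share = np.divide(exposed, total, out=np.zeros_like(total), where=total > 0)
    return np.column_stack([total, exposed, share])
```

Row `i` of the result corresponds to zone id `i`, so the features can be attached to the `i`-th boundary polygon. `np.bincount` keeps the aggregation to a single vectorized pass instead of looping over polygons.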
```python
# pip install rasterio numpy geopandas
import numpy as np
import rasterio

# Download the GHSL population grid from https://ghsl.jrc.ec.europa.eu/download.php
with rasterio.open("GHS_POP_E2020_GLOBE_R2023A_4326_30ss_V1_0.tif") as src:
    # float32 uses half the memory of float64 for this large global grid
    pop = src.read(1).astype(np.float32)

pop[pop < 0] = np.nan  # mask nodata (negative values)
# Accumulate in float64 to limit rounding error over many cells
print(f"Total global population estimate: {np.nansum(pop, dtype=np.float64):,.0f}")
print(f"Raster shape: {pop.shape}")
```
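The GHSL population grid stores people per cell, and in EPSG:4326 the ground area of a 30 arcsec cell shrinks with latitude (roughly 0.86 km² at the equator). Converting counts to people per km² therefore needs a per-row cell area. This sketch assumes a spherical Earth with mean radius 6371.0088 km and square pixels in degrees; the helper names are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; assumes a spherical Earth


def cell_areas_km2(top_lat_deg, res_deg, n_rows):
    """Ground area (km^2) of one cell in each row of a north-up lat/lon grid.

    A cell spanning res_deg of longitude between latitudes south..north has
    area R^2 * dlon * (sin(north) - sin(south)) on a sphere.
    """
    north = top_lat_deg - res_deg * np.arange(n_rows)
    south = north - res_deg
    dlon = np.radians(res_deg)
    return EARTH_RADIUS_KM**2 * dlon * (np.sin(np.radians(north)) - np.sin(np.radians(south)))


def population_density(pop, top_lat_deg, res_deg):
    """People per km^2 from a people-per-cell grid with square degree pixels."""
    areas = cell_areas_km2(top_lat_deg, res_deg, pop.shape[0])
    return pop / areas[:, None]  # broadcast one area per row across all columns
```

For the 30 arcsec file, read `src.transform.f` (the top edge latitude) and `src.transform.a` (the pixel width in degrees, 1/120) inside the `with` block and pass them as `top_lat_deg` and `res_deg`. For the equal-area Mollweide (ESRI:54009) versions of GHSL products every cell has the same area, so a single constant divisor is enough.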
More datasets used by Python data engineers.
The WHO Global Health Observatory offers datasets on a wide range of health-related indicators, including disease prevalence, mortality rates, healthcare access and more.
The World Bank World Development Indicators provide 1,600+ time-series indicators covering poverty, health, education, infrastructure, and environment for 217 countries from 1960 onwards. They are used in data engineering for global development dashboards, longitudinal analysis pipelines, and economic research systems in Python.
CDC WONDER provides access to US public health datasets including mortality records, natality data, cancer statistics, vaccination rates, and disease surveillance. It is used in data engineering for public health analytics pipelines, epidemiological research systems, and building population health indicator dashboards in Python.