An open-source music encyclopaedia API providing structured data on artists, albums, recordings, labels, and relationships. Used in data engineering for building music catalogues, constructing artist graph datasets, enriching streaming data with metadata, and practising complex JSON ingestion in Python.
The `musicbrainzngs` Python library is a client for the MusicBrainz XML web service with built-in rate limiting (1 request per second, per MusicBrainz policy). Engineers look up recordings by ISRC, artists by MBID (MusicBrainz Identifier), and releases by catalog number to build master reference datasets for music applications.
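The three lookups mentioned above could be sketched roughly as follows. `get_recordings_by_isrc`, `get_artist_by_id`, and `search_releases` are functions exposed by `musicbrainzngs`; the helper name `build_reference_record`, its parameters, and the output field names are hypothetical. The module is passed in as `mb` so the helper can also be exercised offline with a stand-in object.

```python
def build_reference_record(mb, isrc=None, artist_mbid=None, catno=None):
    """Collect a small reference record from up to three MusicBrainz lookups.

    `mb` is expected to be the `musicbrainzngs` module (or a stand-in that
    offers the same three functions). The function name and the output
    layout are illustrative assumptions, not part of the library.
    """
    record = {"recordings": [], "artist": None, "releases": []}

    if isrc:
        # The ISRC lookup result holds an "isrc" key with a "recording-list".
        result = mb.get_recordings_by_isrc(isrc)
        for rec in result.get("isrc", {}).get("recording-list", []):
            record["recordings"].append(
                {"mbid": rec.get("id"), "title": rec.get("title")}
            )

    if artist_mbid:
        # Direct lookup of one artist by its MusicBrainz ID.
        artist = mb.get_artist_by_id(artist_mbid).get("artist", {})
        record["artist"] = {"mbid": artist.get("id"), "name": artist.get("name")}

    if catno:
        # `catno` is the release search field for catalog numbers.
        result = mb.search_releases(catno=catno, limit=5)
        for rel in result.get("release-list", []):
            record["releases"].append(
                {"mbid": rel.get("id"), "title": rel.get("title")}
            )

    return record


# Usage (needs network access and a configured user agent):
# import musicbrainzngs
# musicbrainzngs.set_useragent("my-app", "1.0", "my@email.com")
# record = build_reference_record(musicbrainzngs, isrc="<an ISRC>")
```

Using `.get()` throughout keeps the sketch tolerant of records that lack a field, at the cost of silently producing `None` values that a real pipeline would probably want to validate.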
MusicBrainz is a community-maintained knowledge graph that music AI systems can use as a source of canonical metadata. Use it to build a RAG pipeline that answers "Who produced Daft Punk's Random Access Memories?" with verified metadata, or to train entity-linking models that map informal artist mentions to canonical MusicBrainz IDs.
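As a rough illustration of the entity-linking idea, the sketch below picks a canonical MBID from an artist search response. It assumes each candidate carries its search score under the `ext:score` key, as `musicbrainzngs` search results typically do; the function name `pick_canonical_mbid` and the `min_score` threshold of 90 are hypothetical choices, not library defaults.

```python
def pick_canonical_mbid(search_result, min_score=90):
    """Return the MBID of the best-scoring candidate, or None if no
    candidate reaches `min_score`.

    `search_result` is assumed to be the dict returned by
    musicbrainzngs.search_artists(...). The threshold is an arbitrary
    illustration and would need tuning on real mentions.
    """
    best_id, best_score = None, -1
    for candidate in search_result.get("artist-list", []):
        # Scores arrive as strings, so convert before comparing.
        score = int(candidate.get("ext:score", 0))
        if score > best_score:
            best_id, best_score = candidate.get("id"), score
    return best_id if best_score >= min_score else None


# Hand-built stand-in for a search response; the IDs are placeholders,
# not real MBIDs.
sample = {
    "artist-list": [
        {"id": "placeholder-mbid-1", "name": "Candidate A", "ext:score": "100"},
        {"id": "placeholder-mbid-2", "name": "Candidate B", "ext:score": "62"},
    ]
}
print(pick_canonical_mbid(sample))
```

Returning `None` below the threshold lets a downstream model or reviewer handle ambiguous mentions instead of forcing a weak match.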
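# pip install musicbrainzngs
import musicbrainzngs

# MusicBrainz asks clients to identify themselves (app name, version, contact)
musicbrainzngs.set_useragent("my-app", "1.0", "my@email.com")

# Search artists by name and keep the top three matches
result = musicbrainzngs.search_artists(artist="Daft Punk", limit=3)
for artist in result["artist-list"]:
    # "country" is not present on every artist record
    print(artist["name"], "-", artist.get("country", "unknown"))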
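For ingestion into a table or DataFrame, the nested search response usually needs flattening first. The following is a minimal sketch, assuming the response shape used in the example above (an `artist-list` of dicts, some with a nested `life-span`); the function name `artists_to_rows` and the chosen column names are hypothetical.

```python
def artists_to_rows(result):
    """Flatten an artist search response into a list of flat dicts.

    Missing fields become None so every row has the same columns.
    The column selection is an illustrative assumption.
    """
    rows = []
    for artist in result.get("artist-list", []):
        life_span = artist.get("life-span", {})
        rows.append(
            {
                "mbid": artist.get("id"),
                "name": artist.get("name"),
                "country": artist.get("country"),
                "type": artist.get("type"),
                "begin": life_span.get("begin"),
            }
        )
    return rows
```

The resulting list can be handed to `csv.DictWriter` or `pandas.DataFrame` without further reshaping.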
More datasets used by Python data engineers.
Retrieve real-time and historical air quality measurements including PM2.5, PM10, ozone, NO2, and CO from monitoring stations worldwide. Used in environmental data engineering pipelines for pollution trend analysis, public health analytics, geospatial mapping of air quality, and time-series ingestion in Python.
Access open-source global map data including roads, buildings, points of interest, land use, and administrative boundaries from OpenStreetMap. Used in geospatial data engineering pipelines for routing analysis, map enrichment, address geocoding, and building location-aware datasets with Python using the OSMnx library.
The FoodData Central API provides REST access to FoodData Central (FDC), the USDA's food and nutrient database. It is intended primarily to assist application developers who want to incorporate nutrient data into their applications or websites.