Browse 26 datasets tagged with Machine Learning for Python data engineering.
Retrieve data from Reddit, including posts, comments, user information and subreddit details.
Access various natural language processing models and tools provided by OpenAI.
Provides access to Wolfram Alpha's computational knowledge engine, allowing developers to obtain concise answers to factual questions.
Generate random user profiles with realistic attributes such as names, addresses, phone numbers and email addresses.
An open-source database that collects information about music artists, releases, and tracks.
Explore a comprehensive database of breweries, including details like beer types, addresses, and contact information.
Retrieve real-time and historical air quality data from locations around the world.
A collection of databases, domain theories and data generators widely used by the machine learning community.
Many datasets are available on GitHub, covering diverse topics such as social media, finance and healthcare.
Various governments and organizations maintain open data portals, offering access to government statistics, geospatial data and more.
Various governments and organizations maintain open data portals, offering access to government statistics, geospatial data and more.
Wikipedia Dumps provide comprehensive snapshots of Wikipedia articles and other content in XML format.
NOAA provides access to a vast collection of climate-related datasets, including historical weather data, climate observations, satellite imagery and climate model outputs.
The Federal Election Commission (FEC) provides access to campaign finance data, including political contributions, campaign expenditures, fundraising activities and financial disclosures filed by political candidates, parties and committees in the United States.
Millions of product reviews submitted by Amazon customers, hosted on the AWS Open Data Registry.
The Federal Aviation Administration (FAA) provides datasets on aviation, air traffic, airports and safety regulations in the United States.
The Kaggle COVID-19 Dataset, curated by the Allen Institute for AI, aggregates a comprehensive collection of research articles, datasets and other resources related to the COVID-19 pandemic.
Google Cloud hosts a variety of public datasets covering domains such as genomics, geospatial data, financial markets and more.
The Global Terrorism Database (GTD), maintained by the National Consortium for the Study of Terrorism and Responses to Terrorism (START) at the University of Maryland, provides detailed information on terrorist attacks worldwide.
The United Nations Development Programme (UNDP) offers datasets on global development indicators, human development indices and the Sustainable Development Goals (SDGs).
Eurostat, the statistical office of the European Union, offers a comprehensive database of statistical data covering various domains such as economy, population, employment, environment and social issues.
This is a collaborative database of food products from around the world, containing information on ingredients, nutritional values, labels and food additives.
A library that provides access to a wide range of datasets for natural language processing (NLP) tasks.
Natural Earth provides public domain map datasets at various scales, covering physical and cultural features such as coastlines, rivers, cities and political boundaries.
The World Food Programme (WFP) provides datasets on food security, hunger, malnutrition, food aid distribution, humanitarian assistance and other aspects of global food insecurity.
The Global Urban Observatory (GUO) offers datasets on urbanization, urban population growth, city demographics, slum populations, urban infrastructure and sustainability indicators worldwide.