Explore 1 tool and 76 datasets tagged with CSV for Python data engineering.
CSV (Comma-Separated Values) is one of the most widely used flat-file formats for exchanging tabular data between systems. In Python data engineering, CSV files are a common source and sink for ETL pipelines, readable and writable with pandas and the standard-library csv module, and loadable into most data warehouses. Tools and datasets tagged CSV are compatible with spreadsheet tools, SQL loaders, and streaming ingestion frameworks.
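As a rough illustration of CSV acting as both source and sink in a small ETL step, the sketch below uses only the standard-library csv module. The column names (`name`, `amount`), the sample rows, and the `transform_rows` helper are assumptions made up for this example, not part of any listed tool or dataset.

```python
import csv
import io


def transform_rows(source, sink):
    """Read CSV rows from `source`, drop rows with an empty name,
    normalise `amount` to two decimals, and write the result to `sink`.

    `source` and `sink` are text file-like objects (hypothetical inputs).
    Returns the number of rows written.
    """
    reader = csv.DictReader(source)
    writer = csv.DictWriter(sink, fieldnames=["name", "amount"], lineterminator="\n")
    writer.writeheader()
    kept = 0
    for row in reader:
        name = (row.get("name") or "").strip()
        if not name:
            continue  # skip rows without a name
        writer.writerow({"name": name, "amount": f"{float(row['amount']):.2f}"})
        kept += 1
    return kept


# In-memory sample data; a quoted field may itself contain the delimiter.
raw = 'name,amount\n"Smith, Ada",10\n,5\nLee,2.5\n'
out = io.StringIO()
kept = transform_rows(io.StringIO(raw), out)
print(kept)
print(out.getvalue())
```

With real files, the same function can be called on handles opened with `open(path, newline="")`, which is the mode the csv module documentation recommends.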
Delimited Data Preboarding
A delimited data preboarding framework that fills the gap between managed file transfer and the data lake. CsvPath provides a domain-specific language for validating, transforming, and routing CSV and other delimited files before ingestion.
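The general idea of validating a delimited file and routing it before ingestion can be sketched in plain Python. This is not CsvPath's language or API; the expected header, the rules, the status names, and the `preboard` function are all hypothetical and chosen only to show the shape of such a check.

```python
import csv
import io

# Hypothetical schema for the example; a real feed would define its own.
EXPECTED_HEADER = ["id", "email"]


def preboard(text, delimiter=","):
    """Validate delimited text and decide where it should be routed.

    Returns a (status, errors) tuple where status is one of
    "accepted", "quarantined", or "rejected" (names are assumptions).
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        return "rejected", [f"unexpected header: {header}"]

    errors = []
    # Record positions start at 2 because the header is record 1.
    for record_no, row in enumerate(reader, start=2):
        if len(row) != len(EXPECTED_HEADER):
            errors.append(
                f"record {record_no}: expected {len(EXPECTED_HEADER)} fields, got {len(row)}"
            )
        elif not row[0].isdigit():
            errors.append(f"record {record_no}: id is not an integer")
    return ("accepted" if not errors else "quarantined"), errors


good = preboard("id|email\n1|a@example.com\n", delimiter="|")
bad = preboard("id|email\n1|a@example.com\nx|b@example.com|extra\n", delimiter="|")
print(good)
print(bad)
```

Returning a status alongside the error list keeps the routing decision separate from the checks themselves, so a caller can move the file to a landing area or a quarantine area as it sees fit.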
A curated repository of 600+ datasets covering classification, regression, clustering, and time-seri
Kaggle hosts thousands of community-contributed datasets spanning economics, biology, computer visio
Google Dataset Search is a specialised search engine that indexes datasets stored across the web on
Thousands of publicly available datasets hosted on GitHub repositories covering social media, financ
Access 16,000+ development indicators from the World Bank covering GDP, poverty, health, education,
Data.gov hosts 300,000+ datasets from US federal agencies covering health, education, environment, a
Data.gov.uk provides datasets from UK central and local government covering crime, transport, planni
Explore relative search interest over time and across regions for any topic or keyword using Google
IMDb publishes regularly updated datasets covering movies, TV shows, episodes, cast, crew, ratings,
Access datasets on child well-being, education enrolment, nutrition, immunisation, child mortality,
The Federal Reserve Bank of St. Louis FRED database provides over 800,000 economic time series from
Regular XML snapshots of all Wikipedia articles, talk pages, and revision histories available for bu
The European Centre for Disease Prevention and Control publishes datasets on infectious disease surv
The NOAA platform provides access to a vast collection of climate-related datasets, including historical
Access demographic, economic, social, and geographic datasets from the US Census Bureau including th
Access global health indicators, disease surveillance data, mortality statistics, and health system
The NCEI, part of NOAA, provides access to a wide range of environmental datasets, including climate
The FEC provides access to campaign finance data, including information on political contributions,
FiveThirtyEight publishes the datasets behind its data journalism articles covering US politics, spo
The National Renewable Energy Laboratory provides datasets on solar irradiance, wind resources, buil
New York City's open data portal provides 3,000+ datasets covering taxi trips, 311 complaints, crime
Google's Open Images Dataset contains 9 million images annotated with object bounding boxes, segment
The BEA provides economic data and statistics for the United States, including measures of GDP, nati
The Amazon Customer Reviews dataset on AWS Open Data contains 130+ million product reviews across 40
The US Federal Aviation Administration publishes datasets on aircraft registrations, pilot certifica
The Bureau of Justice Statistics publishes datasets on crime rates, incarceration, court proceedings
Eurobarometer surveys measure European public opinion on EU policies, political trust, social values
The Kaggle COVID-19 Dataset, curated by the Allen Institute for AI, aggregates a comprehensive colle
Google Cloud hosts petabyte-scale public datasets including genomics, satellite imagery, financial m
The ECB Statistical Data Warehouse provides access to a wide range of statistical data and reports o
The National Library of Medicine hosts PubMed, MedlinePlus, GenBank, and other biomedical databases
The GTD, maintained by the National Consortium for the Study of Terrorism and Responses to Terrorism
Quandl (now Nasdaq Data Link) provides access to financial, economic, and alternative datasets inclu
The United Nations Development Programme publishes datasets on the Human Development Index, poverty
The World Economic Forum publishes datasets and reports covering global competitiveness, gender pari
The US Department of Agriculture publishes datasets on crop production, food prices, nutrition, soil
The EU Open Data Portal provides access to datasets from European institutions and agencies covering
Operated by the Institute for Health Metrics and Evaluation (IHME), provides access to a comprehensi
The Bureau of Transportation Statistics publishes datasets on airline on-time performance, freight s
The UN Comtrade database provides detailed international trade statistics including import and expor
Google Dataset Search indexes datasets published across the web on platforms like Kaggle, data.gov,
Eurostat, the statistical office of the European Union, offers a comprehensive database of statistic
The United States Department of Labor provides a wide range of datasets on labor market conditions,
Zillow Research offers datasets and reports on real estate market trends, home values, rental prices
This is a collaborative database of food products from around the world, containing information on ingr
The Hugging Face Datasets library provides programmatic access to 50,000+ NLP, computer vision, and
CDC WONDER provides access to US public health datasets including mortality records, natality data,
The European Social Survey collects cross-national data on social attitudes, political engagement, v
A longitudinal cross-national survey measuring social, political, moral, and religious values across
The OECD provides macroeconomic, social, and environmental datasets for 38 member countries includin
The UNESCO Institute for Statistics publishes global datasets on education enrolment, literacy rates
Data from the ninth round of the European Social Survey covering attitudes on health, climate change
Provides datasets on global and regional CO2 emissions from fossil fuels and land use, carbon sinks
The World Inequality Database provides long-run historical data on income and wealth distribution, t
Global Forest Watch provides satellite-derived datasets on forest cover change, deforestation alerts
GLDAS provides datasets on land surface conditions, including soil moisture, temperature, precipitat
Natural Earth provides public domain map datasets at various scales, covering physical and cultural
Google Earth Engine hosts a vast collection of geospatial datasets covering topics such as land cove
Gapminder provides clean, long-run historical datasets on 500+ global development indicators includi
The Global Terrorism Index provides annual datasets from the Institute for Economics and Peace on te
GDELT offers datasets on global events, including news articles, social media posts, protests, confl
UN Comtrade provides detailed bilateral trade statistics including import, export, and re-export flo
The WHO Global Health Observatory offers datasets on a wide range of health-related indicators, incl
The IMF provides datasets on global economic indicators, including GDP growth, inflation rates, exch
The World Trade Organisation publishes datasets on global merchandise trade, commercial services, ta
UNICEF's data portal provides child-focused indicators covering mortality, nutrition, education, imm
The GBD study offers datasets on disease burden, mortality, morbidity, risk factors and health-relat
The Humanitarian Data Exchange hosts datasets on crises, conflicts, refugee movements, food insecuri
The World Bank World Development Indicators provides 1,600+ time-series indicators covering poverty,
WFP provides datasets on food security, hunger, malnutrition, food aid distribution, humanitarian as
The Global Findex Database offers datasets on financial inclusion indicators, access to banking, usa
UNCTAD provides datasets on trade, investment, development, globalization, economic indicators and o
The GUO offers datasets on urbanization, urban population growth, city demographics, slum population
The Global Human Settlement Layer provides satellite-derived datasets on population density, built-u
The Global Entrepreneurship Monitor publishes annual datasets on startup activity, entrepreneurial a
EOSDIS provides access to a wide range of Earth observation datasets, including satellite imagery, c