How can I access Open Food Facts?

Open Food Facts is available as a downloadable dataset at https://world.openfoodfacts.org/data

What can I build with Open Food Facts?

Access nutritional information, ingredients, and additives for 3M+ food products. Build food scanner apps that identify products by barcode and show nutrition data. Train ML models to classify food products by health score or ingredient quality. Analyze food labeling patterns and additive usage across brands and countries.

Open Food Facts

About This Dataset

This is a collaborative database of food products from around the world, containing information on ingredients, nutritional values, labels, and food additives.

What You Can Build

1. Access nutritional information, ingredients, and additives for 3M+ food products
2. Build food scanner apps that identify products by barcode and show nutrition data
3. Train ML models to classify food products by health score or ingredient quality
4. Analyze food labeling patterns and additive usage across brands and countries
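
The barcode-scanner idea in the list above could be sketched roughly as follows. This is an illustration under assumptions, not the project's reference client: the endpoint path, the `status`/`product` response shape, and the field names (`product_name`, `nutriscore_grade`, `energy-kcal_100g`) are assumed here and should be checked against the current API documentation. The helper names `product_url`, `fetch_product`, and `summarize` are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint; verify against the Open Food Facts API documentation.
BASE = "https://world.openfoodfacts.org/api/v2/product"


def product_url(barcode, fields=("product_name", "nutriscore_grade", "nutriments")):
    """Build the lookup URL for one barcode, requesting only a few fields."""
    return f"{BASE}/{barcode}.json?fields={','.join(fields)}"


def fetch_product(barcode, user_agent="ExampleScanner/0.1"):
    """Fetch one product over HTTP (needs network access; not called below)."""
    req = urllib.request.Request(product_url(barcode), headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def summarize(payload):
    """Flatten a response dict into a small summary, or None if no product was found."""
    if payload.get("status") != 1:  # assumed convention: 1 means "found"
        return None
    product = payload.get("product", {})
    nutriments = product.get("nutriments", {})
    return {
        "name": product.get("product_name"),
        "nutriscore": product.get("nutriscore_grade"),
        "energy_kcal_100g": nutriments.get("energy-kcal_100g"),
    }


# Hand-written payload in the assumed shape, so the sketch runs offline.
example = {
    "status": 1,
    "product": {
        "product_name": "Oat Flakes",
        "nutriscore_grade": "a",
        "nutriments": {"energy-kcal_100g": 370},
    },
}
summary = summarize(example)
print(summary)
```

In a real app, `summarize(fetch_product(scanned_code))` would replace the hand-written payload, with `scanned_code` coming from the barcode reader.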

How Python Data Engineers Use Open Food Facts

Open Food Facts provides a complete database dump as CSV (downloadable from the website) and a REST API. Engineers typically load the CSV with `pandas.read_csv()`, passing `sep="\t"` because the export is tab-separated despite its `.csv` extension, and use column selection (`usecols`) to avoid reading the full 180+ column schema into memory. The `openfoodfacts` Python library provides API access for individual product lookups.
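
A minimal sketch of the column-selection approach, assuming a tab-separated export and assuming the columns `code`, `product_name`, `nutriscore_grade`, and `energy_100g` are present (check the header of the file you actually download). The helper `load_products` is a hypothetical name, and the in-memory sample stands in for the real file so the example runs offline.

```python
import io

import pandas as pd

# Assumed column names; confirm them against the real file's header row.
WANTED = ["code", "product_name", "nutriscore_grade", "energy_100g"]


def load_products(source, columns=WANTED, sep="\t", nrows=None):
    """Read only the selected columns from an Open Food Facts style export."""
    return pd.read_csv(
        source,
        sep=sep,
        usecols=columns,
        # Read barcodes as strings so leading zeros are preserved.
        dtype={c: str for c in columns if c == "code"},
        nrows=nrows,
        low_memory=False,
    )


# Tiny in-memory stand-in for the downloaded dump.
sample = io.StringIO(
    "code\tproduct_name\tbrands\tnutriscore_grade\tenergy_100g\n"
    "0001234567890\tOat Flakes\tAcme\ta\t1500\n"
    "0009876543210\tChoco Puffs\tAcme\td\t1700\n"
)
df = load_products(sample)
print(df)
```

For the full dump, passing a file path and a modest `nrows` first is a cheap way to confirm the column names before committing to a full read.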

Open Food Facts for LLM Fine-Tuning and RAG Pipelines

Open Food Facts data can be used to train AI nutrition analysis models that take a scanned product barcode and explain ingredient risks. RAG systems built on this dataset can answer questions such as 'Does this product contain palm oil or high-fructose corn syrup?' A model fine-tuned on Nutri-Score grades can classify food healthiness.
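
As a rough illustration of the retrieval side, the sketch below renders product records as short text chunks and filters them by ingredient terms. Plain substring matching is swapped in here for the embedding search a real RAG pipeline would use, and the field name `ingredients_text` and the helpers `to_document` and `mentions` are assumptions made for this example.

```python
def to_document(product):
    """Render one product record as a short text chunk suitable for indexing."""
    name = product.get("product_name", "Unknown product")
    grade = str(product.get("nutriscore_grade", "unknown")).upper()
    ingredients = product.get("ingredients_text", "not listed")
    return f"{name}. Nutri-Score: {grade}. Ingredients: {ingredients}."


def mentions(text, terms):
    """Return the terms that occur in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return [term for term in terms if term.lower() in lowered]


# Made-up records in the assumed shape.
products = [
    {
        "product_name": "Hazelnut Spread",
        "nutriscore_grade": "e",
        "ingredients_text": "sugar, palm oil, hazelnuts, cocoa",
    },
    {
        "product_name": "Oat Flakes",
        "nutriscore_grade": "a",
        "ingredients_text": "whole grain oats",
    },
]

docs = [to_document(p) for p in products]
terms = ["palm oil", "high-fructose corn syrup"]
hits = [doc for doc in docs if mentions(doc, terms)]
print(hits)
```

The matching chunks would then be passed to the language model as context for the user's question.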

Python Example

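# pip install openfoodfacts pandas
import openfoodfacts
import pandas as pd

# Recent SDK releases expose an API client; a User-Agent string is required
api = openfoodfacts.API(user_agent="MyApp/1.0")

# Full-text search for cereal products (returns a dict with a "products" list)
results = api.product.text_search("cereals")

# Flatten nested fields so energy appears as "nutriments.energy_100g"
df = pd.json_normalize(results["products"])
cols = ["product_name", "nutriscore_grade", "nutriments.energy_100g"]
df = df.reindex(columns=cols)  # tolerate results that lack a field
print(df.dropna(subset=["nutriscore_grade"]).head(10))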

Access Dataset

Official dataset source: https://world.openfoodfacts.org/data

Dataset Info

Category: Dataset Downloads

Type: Direct Download

Tags: #csv #batch-processing #machine-learning

Related Datasets

More datasets used by Python data engineers.

UCI Machine Learning Repository

A curated repository of 600+ datasets covering classification, regression, clustering, and time-series tasks, widely used as machine learning benchmarks. Used in data engineering for building ML training pipelines, practising data preprocessing workflows, and loading tabular datasets into model training systems in Python.

GitHub Datasets

Thousands of publicly available datasets hosted on GitHub repositories covering social media, finance, healthcare, sports, and scientific domains. Accessible directly via the GitHub API or raw download URLs, making them ideal for practising version-controlled data ingestion and automated dataset pipelines in Python.

NOAA Climate Data Online

The NOAA platform provides access to a vast collection of climate-related datasets, including historical weather data, climate observations, satellite imagery, and climate model outputs.