Learn how to use Data Load Tool (dlt) to extract weather data from a REST API and load it into DuckDB. This beginner-friendly project demonstrates a simple yet effective data loading pattern perfect for API integration workflows.
This document explains the dlt example provided in dlt_example.py.
The example demonstrates how to:
- Use dlt to extract data from a REST API.
- Load the extracted data into a local DuckDB database.

Before running the example, ensure you have installed the required packages. You can install dlt and duckdb using the following command:
```bash
pip install dlt duckdb
```
```python
import dlt
import requests
```
In this section, we import dlt, the Data Load Tool, which helps with data extraction and loading, and requests, a common library for making HTTP requests to APIs.
```python
@dlt.resource(write_disposition="append")
def weather_data():
    # Request hourly temperature readings for Tokyo from the Open-Meteo API
    api_url = "https://api.open-meteo.com/v1/forecast?latitude=35.6895&longitude=139.6917&hourly=temperature_2m"
    response = requests.get(api_url)
    data = response.json()
    # Yield the hourly temperature data one record at a time
    for timestamp, temperature in zip(data['hourly']['time'], data['hourly']['temperature_2m']):
        yield {
            "timestamp": timestamp,
            "temperature": temperature
        }
```
In this block, we define the weather_data function, which extracts hourly temperature data from the Open-Meteo API.
- `@dlt.resource(write_disposition="append")`: This decorator tells dlt to handle the function as a resource. `write_disposition="append"` ensures that new data is appended to the table instead of overwriting existing data (a variant that avoids duplicate rows on re-runs is sketched after the pipeline explanation below).
- The `yield` statement provides records one at a time, which is essential for loading large datasets efficiently.

```python
pipeline = dlt.pipeline(
    pipeline_name="weather_pipeline",
    destination="duckdb",
    dataset_name="weather_data",
    credentials={"database": "weather_data.duckdb"}
)

load_info = pipeline.run(weather_data)
print(load_info)
```
This block sets up the data pipeline using dlt:
pipeline_name="weather_pipeline": The name of the pipeline.destination="duckdb": Specifies DuckDB as the destination database.dataset_name="weather_data": This is the name of the dataset inside the DuckDB database.credentials={"database": "weather_data.duckdb"}: Specifies that the data should be saved in a local DuckDB file called weather_data.duckdb.The pipeline is executed using pipeline.run(weather_data), which loads the data into DuckDB.
After the pipeline has run, you can query the data from the DuckDB database like this:
```python
import duckdb

con = duckdb.connect("weather_data.duckdb")
# dlt creates the table inside the "weather_data" schema (the dataset_name),
# so the table is addressed as weather_data.weather_data
df = con.execute("SELECT * FROM weather_data.weather_data").fetchdf()
print(df)
```
This connects to the weather_data.duckdb file and runs a query to fetch the data stored in the weather_data table, which dlt created inside the weather_data schema (the dataset_name). The result is printed as a Pandas DataFrame.
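You can of course run any SQL that DuckDB supports. As a follow-up sketch (the double cast is an assumption to handle the timestamp column whether dlt stored it as TIMESTAMP or as ISO text), this aggregates the hourly readings into daily averages:

```python
import duckdb

con = duckdb.connect("weather_data.duckdb")
# Average temperature per calendar day, computed inside DuckDB.
# CAST(... AS TIMESTAMP) works for either a TIMESTAMP or an ISO-text column.
df = con.execute("""
    SELECT CAST(CAST(timestamp AS TIMESTAMP) AS DATE) AS day,
           AVG(temperature) AS avg_temperature
    FROM weather_data.weather_data
    GROUP BY day
    ORDER BY day
""").fetchdf()
print(df)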
This example shows how to use dlt to automate the process of extracting, transforming, and loading data from a REST API into a DuckDB database. The pipeline can be extended with more complex data transformations or additional data sources.
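For instance, dlt resources support `add_map`, which applies a function to every record before it is loaded. Here is a minimal sketch of such a transformation, reusing the pipeline and weather_data defined above (the Fahrenheit conversion and the add_fahrenheit helper are illustrative, not part of the original example):

```python
# Hypothetical transformation: enrich each record with a derived field.
def add_fahrenheit(item):
    item["temperature_f"] = item["temperature"] * 9 / 5 + 32
    return item

# add_map runs the function on every yielded record before loading
load_info = pipeline.run(weather_data.add_map(add_fahrenheit))
print(load_info)
```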