Alembic Migration Example Explanation

This document explains the Alembic migration example we've set up.

Overview

The example demonstrates how to:

Set up an Alembic environment for a SQLite database
Define models using SQLAlchemy
Create and run database migrations
Add new columns and tables to an existing database schema

Key features demonstrated:

Declarative model definition
Relationship between tables
Adding columns to existing tables
Creating new tables
Providing both upgrade and downgrade paths

This setup allows for:

Tracking various data sources
Storing raw data with timestamps
Adding a data quality metric later
Creating a separate table for processed data

These scenarios reflect real-world data engineering tasks like expanding data models, improving data quality tracking, and separating raw and processed data.

Project Structure

alembic_example/
├── alembic/
│   ├── versions/
│   │   ├── <revision_id>_add_data_quality_column.py
│   │   └── <revision_id>_create_processed_data_table.py
│   └── env.py
├── alembic.ini
└── models.py

Code Explanation

Model Definition (models.py)

from sqlalchemy import Column, Integer, String, DateTime, ForeignKey
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()

class DataSource(Base):
    __tablename__ = 'data_sources'
    id = Column(Integer, primary_key=True)
    name = Column(String(100), nullable=False)
    url = Column(String(500), nullable=False)
    type = Column(String(50), nullable=False)

class RawData(Base):
    __tablename__ = 'raw_data'
    id = Column(Integer, primary_key=True)
    source_id = Column(Integer, ForeignKey('data_sources.id'))
    timestamp = Column(DateTime, nullable=False)
    content = Column(String, nullable=False)

    source = relationship("DataSource")

This file defines two models: DataSource and RawData, representing a basic data ingestion system.

Migration Scripts

Two migration scripts are created:

Add data quality column:

def upgrade():
    op.add_column('raw_data', sa.Column('data_quality', sa.Float))

def downgrade():
    op.drop_column('raw_data', 'data_quality')

Create processed data table:

def upgrade():
    op.create_table(
        'processed_data',
        sa.Column('id', sa.Integer, primary_key=True),
        sa.Column('raw_data_id', sa.Integer, sa.ForeignKey('raw_data.id')),
        sa.Column('processed_content', sa.String),
        sa.Column('processing_timestamp', sa.DateTime)
    )

def downgrade():
    op.drop_table('processed_data')

Running the Example

To run this example:

Ensure you have Alembic and SQLAlchemy installed:
```
pip install alembic sqlalchemy
```
Initialize the Alembic environment (if not already done):
```
alembic init alembic
```
Edit alembic.ini to set the correct database URL:
```
sqlalchemy.url = sqlite:///./data.db
```
Edit alembic/env.py to import your models and set the target metadata:
```
from models import Base
target_metadata = Base.metadata
```

Create a new migration:

alembic revision --autogenerate -m "Initial migration"

Apply the migration:
```
alembic upgrade head
```

Create and apply additional migrations:

alembic revision -m "add data quality column"
alembic revision -m "create processed data table"
alembic upgrade head

To revert migrations:

alembic downgrade -1  # Revert one step
alembic downgrade base  # Revert all migrations

Key points about configuration in alembic.ini:

The script_location is set to alembic, which is the directory where your migration scripts will be stored.
The sqlalchemy.url is set to sqlite:///./data.db. This specifies a SQLite database file named data.db in the current directory. Modify this line if you're using a different database.
The file_template determines how migration file names are generated. The current setting will create files named like 20230615_123456_add_new_column.py.
The logging configuration at the bottom of the file sets up basic logging for Alembic and SQLAlchemy.

This example showcases how Alembic can manage database schema changes over time, allowing for version control of your database structure and smooth collaboration in team environments.

Alembic Migration Example Explanation

This document explains the Alembic migration example we've set up.

Overview

The example demonstrates how to:

Set up an Alembic environment for a SQLite database
Define models using SQLAlchemy
Create and run database migrations
Add new columns and tables to an existing database schema

Key features demonstrated:

Declarative model definition
Relationship between tables
Adding columns to existing tables
Creating new tables
Providing both upgrade and downgrade paths

This setup allows for:

Tracking various data sources
Storing raw data with timestamps
Adding a data quality metric later
Creating a separate table for processed data

These scenarios reflect real-world data engineering tasks like expanding data models, improving data quality tracking, and separating raw and processed data.

Project Structure

alembic_example/
├── alembic/
│   ├── versions/
│   │   ├── <revision_id>_add_data_quality_column.py
│   │   └── <revision_id>_create_processed_data_table.py
│   └── env.py
├── alembic.ini
└── models.py

Code Explanation

Model Definition (models.py)

from sqlalchemy import Column, Integer, String, DateTime, ForeignKey
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()

class DataSource(Base):
    __tablename__ = 'data_sources'
    id = Column(Integer, primary_key=True)
    name = Column(String(100), nullable=False)
    url = Column(String(500), nullable=False)
    type = Column(String(50), nullable=False)

class RawData(Base):
    __tablename__ = 'raw_data'
    id = Column(Integer, primary_key=True)
    source_id = Column(Integer, ForeignKey('data_sources.id'))
    timestamp = Column(DateTime, nullable=False)
    content = Column(String, nullable=False)

    source = relationship("DataSource")

This file defines two models: DataSource and RawData, representing a basic data ingestion system.

Migration Scripts

Two migration scripts are created:

Add data quality column:

def upgrade():
    op.add_column('raw_data', sa.Column('data_quality', sa.Float))

def downgrade():
    op.drop_column('raw_data', 'data_quality')

Create processed data table:

def upgrade():
    op.create_table(
        'processed_data',
        sa.Column('id', sa.Integer, primary_key=True),
        sa.Column('raw_data_id', sa.Integer, sa.ForeignKey('raw_data.id')),
        sa.Column('processed_content', sa.String),
        sa.Column('processing_timestamp', sa.DateTime)
    )

def downgrade():
    op.drop_table('processed_data')

Running the Example

To run this example:

Ensure you have Alembic and SQLAlchemy installed:
```
pip install alembic sqlalchemy
```
Initialize the Alembic environment (if not already done):
```
alembic init alembic
```
Edit alembic.ini to set the correct database URL:
```
sqlalchemy.url = sqlite:///./data.db
```
Edit alembic/env.py to import your models and set the target metadata:
```
from models import Base
target_metadata = Base.metadata
```

Create a new migration:

alembic revision --autogenerate -m "Initial migration"

Apply the migration:
```
alembic upgrade head
```

Create and apply additional migrations:

alembic revision -m "add data quality column"
alembic revision -m "create processed data table"
alembic upgrade head

To revert migrations:

alembic downgrade -1  # Revert one step
alembic downgrade base  # Revert all migrations

Key points about configuration in alembic.ini:

The script_location is set to alembic, which is the directory where your migration scripts will be stored.
The sqlalchemy.url is set to sqlite:///./data.db. This specifies a SQLite database file named data.db in the current directory. Modify this line if you're using a different database.
The file_template determines how migration file names are generated. The current setting will create files named like 20230615_123456_add_new_column.py.
The logging configuration at the bottom of the file sets up basic logging for Alembic and SQLAlchemy.

This example showcases how Alembic can manage database schema changes over time, allowing for version control of your database structure and smooth collaboration in team environments.

Database Schema Migrations with Alembic

Prerequisites

What You'll Learn

Alembic Migration Example Explanation

Overview

Project Structure

Code Explanation

Model Definition (models.py)

Migration Scripts

Running the Example

Category

Tools Used

Database Schema Migrations with Alembic

Prerequisites

What You'll Learn

Alembic Migration Example Explanation

Overview

Project Structure

Code Explanation

Model Definition (models.py)

Migration Scripts

Running the Example

Category

Tools Used