Learn how to use Alembic, the standalone database migration tool for SQLAlchemy, to manage schema changes over time. This project demonstrates creating initial migrations, adding columns, creating new tables, and rolling back changes - essential skills for maintaining database schemas in production.
This document explains the Alembic migration example we've set up.
The example demonstrates how to:
Key features demonstrated:
This setup allows for:
These scenarios reflect real-world data engineering tasks like expanding data models, improving data quality tracking, and separating raw and processed data.
alembic_example/
├── alembic/
│ ├── versions/
│ │ ├── <revision_id>_add_data_quality_column.py
│ │ └── <revision_id>_create_processed_data_table.py
│ └── env.py
├── alembic.ini
└── models.py
from sqlalchemy import Column, Integer, String, DateTime, ForeignKey from sqlalchemy.ext.declarative import declarative_base from sqlalchemy.orm import relationship Base = declarative_base() class DataSource(Base): __tablename__ = 'data_sources' id = Column(Integer, primary_key=True) name = Column(String(100), nullable=False) url = Column(String(500), nullable=False) type = Column(String(50), nullable=False) class RawData(Base): __tablename__ = 'raw_data' id = Column(Integer, primary_key=True) source_id = Column(Integer, ForeignKey('data_sources.id')) timestamp = Column(DateTime, nullable=False) content = Column(String, nullable=False) source = relationship("DataSource")
This file defines two models: DataSource and RawData, representing a basic data ingestion system.
Two migration scripts are created:
def upgrade(): op.add_column('raw_data', sa.Column('data_quality', sa.Float)) def downgrade(): op.drop_column('raw_data', 'data_quality')
def upgrade(): op.create_table( 'processed_data', sa.Column('id', sa.Integer, primary_key=True), sa.Column('raw_data_id', sa.Integer, sa.ForeignKey('raw_data.id')), sa.Column('processed_content', sa.String), sa.Column('processing_timestamp', sa.DateTime) ) def downgrade(): op.drop_table('processed_data')
To run this example:
Ensure you have Alembic and SQLAlchemy installed:
pip install alembic sqlalchemy
Initialize the Alembic environment (if not already done):
alembic init alembic
Edit alembic.ini to set the correct database URL:
sqlalchemy.url = sqlite:///./data.db
Edit alembic/env.py to import your models and set the target metadata:
from models import Base target_metadata = Base.metadata
Create a new migration:
alembic revision --autogenerate -m "Initial migration"
Apply the migration:
alembic upgrade head
Create and apply additional migrations:
alembic revision -m "add data quality column"
alembic revision -m "create processed data table"
alembic upgrade head
To revert migrations:
alembic downgrade -1 # Revert one step
alembic downgrade base # Revert all migrations
Key points about configuration in alembic.ini:
This example showcases how Alembic can manage database schema changes over time, allowing for version control of your database structure and smooth collaboration in team environments.