I have been running PostgreSQL migrations by hand for years and it was fine, until it wasn't. One missed ALTER TABLE on production and suddenly your app is throwing 500s at 2am. Here is how I do it now, properly.
Why Migrations Matter
If you are still running SQL files manually against your database, stop. Migrations give you version control for your schema. You can roll back, you can see what changed, and you can reproduce your database state on any machine. It is not optional anymore; it is basic hygiene.
The Tool: Plain SQL Files and a Runner Script
There are a bunch of migration tools out there: Flyway, Alembic, Django migrations, whatever your framework ships with. I tend to reach for something lightweight. Lately I have been using plain SQL files with a simple runner script. No ORM, no framework lock-in.
Here is the directory structure I use:
```
migrations/
  001_create_users.sql
  002_add_email_column.sql
  003_create_posts.sql
  schema_migrations.sql
```

Each file runs in order. The `schema_migrations` table keeps track of what has been applied.
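To keep the numbering consistent, I find it handy to compute the next version number instead of eyeballing the directory. Here is a small sketch of such a helper (the function name `next_migration_name` is mine, not part of any tool mentioned here):

```python
import os
import re

def next_migration_name(migrations_dir, title):
    """Return the next zero-padded migration filename, e.g. '004_add_created_at.sql'."""
    versions = [
        int(m.group(1))
        for f in os.listdir(migrations_dir)
        if (m := re.match(r'^(\d+)_', f))  # only numbered migration files count
    ]
    return f"{max(versions, default=0) + 1:03d}_{title}.sql"
```

Files without a numeric prefix (like `schema_migrations.sql`) are ignored, so they never affect the numbering.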
The Migration Tracker Table
First, you need a table to track which migrations ran. Here is the SQL:
```sql
CREATE TABLE IF NOT EXISTS schema_migrations (
    version INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    applied_at TIMESTAMP DEFAULT NOW()
);
```

The Runner Script
Here is the Python script that runs pending migrations:
```python
import os
import re
import sys

import psycopg2

DB_URL = os.environ.get('DATABASE_URL', 'postgresql://localhost/mydb')
MIGRATIONS_DIR = 'migrations'

def get_applied_versions(cur):
    cur.execute('SELECT version FROM schema_migrations')
    return {row[0] for row in cur.fetchall()}

def get_pending_migrations(applied):
    pending = []
    for f in sorted(os.listdir(MIGRATIONS_DIR)):
        if not f.endswith('.sql'):
            continue
        match = re.match(r'^(\d+)_', f)
        if match and int(match.group(1)) not in applied:
            pending.append(f)
    return pending

def run_migration(cur, filename):
    version = int(re.match(r'^(\d+)', filename).group(1))
    filepath = os.path.join(MIGRATIONS_DIR, filename)
    with open(filepath) as f:
        sql = f.read()
    cur.execute(sql)
    cur.execute(
        'INSERT INTO schema_migrations (version, name) VALUES (%s, %s)',
        (version, filename)
    )

def migrate():
    conn = psycopg2.connect(DB_URL)
    conn.autocommit = False
    cur = conn.cursor()
    try:
        applied = get_applied_versions(cur)
        pending = get_pending_migrations(applied)
        if not pending:
            print('Nothing to migrate.')
            return
        for f in pending:
            print(f'Running {f}...')
            try:
                run_migration(cur, f)
                conn.commit()
                print('  Done.')
            except Exception as e:
                conn.rollback()
                print(f'  FAILED: {e}')
                sys.exit(1)  # non-zero exit so a CI pipeline halts here
    finally:
        cur.close()
        conn.close()

if __name__ == '__main__':
    migrate()
```

Key things here: each migration runs in its own transaction. If one fails, it rolls back, stops, and exits non-zero. You do not want half-applied migrations; that is a nightmare to clean up.
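The selection logic is easy to unit-test if you pull it out as a pure function that takes the list of filenames instead of reading the directory. A sketch (`pending_migrations` is my name for it, not part of the script above):

```python
import re

def pending_migrations(filenames, applied):
    """Given directory filenames and the set of already-applied version
    numbers, return the .sql files still to run, in version order."""
    pending = []
    for f in sorted(filenames):
        if not f.endswith('.sql'):
            continue
        m = re.match(r'^(\d+)_', f)
        if m and int(m.group(1)) not in applied:
            pending.append(f)
    return pending
```

Non-SQL files and files without a numeric prefix (including `schema_migrations.sql`) are skipped, which is what lets them live in the same directory.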
Writing a Migration
Example migration file (`004_add_created_at.sql`):

```sql
-- Add created_at timestamp to posts table
ALTER TABLE posts ADD COLUMN created_at TIMESTAMP DEFAULT NOW();

-- Backfill existing rows
UPDATE posts SET created_at = NOW() WHERE created_at IS NULL;
```

A few rules I follow when writing migrations:
- Never edit a migration that has already been applied. Create a new one.
- Always include a default value when adding a column to a table with existing data.
- Test migrations on a copy of production data before running them for real (I use pg_dump and restore to a local db).
- Avoid long-running ALTER TABLEs on big tables during peak hours. PostgreSQL locks the table for most ALTER operations.

Rolling Back
This script does not do automatic rollbacks. I intentionally skipped that. In my experience, down migrations are a trap: you write them once, they sit there untested, and when you actually need them they break because your data has diverged. Instead, I write a reverse migration as a new forward migration (`005_revert_add_created_at.sql`). It is more honest about what is happening.
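Such a revert migration might look like this, a sketch assuming the created_at column from the earlier example:

```sql
-- 005_revert_add_created_at.sql
-- Forward migration that undoes 004: drop the column we added.
-- Note: any data in created_at is lost, which is exactly the kind of
-- consequence a forward revert forces you to think about.
ALTER TABLE posts DROP COLUMN created_at;
```

It gets a new version number, runs through the same runner, and shows up in `schema_migrations` like any other migration.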
Running in Production
In CI/CD I add a step that runs the migration script before deploying the new code:
```shell
# In your deploy script or CI pipeline
python3 migrate.py
# Then deploy the app
systemctl restart myapp
```

Schema migrations first, code deploy after. If the migration fails, the deploy stops and you investigate. No half-states.
It is not fancy but it works and I can reason about it at 2am when something goes wrong. That is the real test of any infra tool.