Writing Migrations

What is a Migration?

A migration is a Python class that should be placed in a file called data_migration_spec.py in one of your app-directories. django-data-migrations searches in each app, included in INSTALLED_APPS, for this file and imports all from it automatically.

Your migration normally specifies the following things:

  • A database connection to your legacy data (whereever this is)

  • The model class, the migration should create instances for

  • A corresponding SQL-Query, which maps the old DB-schema to the new Django-model-schema

    • You can specify what should be done with special columns, returned by the query (Many2Many-, ForeignKey-, One2One-Relations). With minimal configuration, these things can be migrated automatically.
  • Dependencies to other models can be specified. This is used, to determine the order each migration can be applied. e.g. If a migration specifies a model as dependency, his migration will be executed before our migration will be processed.

  • You can implement different hooks, where you normally manipulate the data returned by the query or do some things which are not possible by SQL itself.

  • You can specify, if your migration should look for new instances on a second run. This is not the default case.

A complete Migration example

To give you an overview, how a common migration looks, the following listing shows a migration for a Post model. This is an excerpt from a data_migration_spec.py which can be found in a testing app, which is used by django-data-migration itself.

The complete app can be found here ...

class PostMigration(BaseMigration):
    query = """
    SELECT id,
        Title as title,
        Body as body,
        Posted as posted,
        Author as author,
        (
            SELECT
                GROUP_CONCAT(id)
            FROM comments c
            WHERE c.Post = p.id
        ) as comments
    FROM posts p;
    """
    model = Post
    depends_on = [ Author, Comment ]
    column_description = {
        'author': is_a(Author, search_attr="id", fk=True, prefetch=False),
        'comments': is_a(Comment, search_attr="id", m2m=True,
                         delimiter=",", prefetch=False)
    }

    @classmethod
    def hook_after_save(self, instance, row):
        # because of the auto_now_add flag, we have to set it hard to this value
        instance.posted = row['posted']
        instance.save()

As you can see, PostMigration inherits from a class called BaseMigration. This is one of the classes which is listed here Setup Database Connection.

Migration details

Setup Database Connection

django-data-migration should support as many databases as possible, so the connection part is not implemented directly for each database. You have to override the open_db_connection classmethod in your migration.

Tip

The connection handling should be implemented once in a BaseMigration where all other Migrations inherit from.

Important

django-data-migration requires that the database returns a DictCursor, where each row is a dict with column names as keys and the row as corresponding values.

SQLite

The following code implements an example database connection for SQLite:

import sqlite3

class BaseMigration(Migration):

    @classmethod
    def open_db_connection(self):
        conn = sqlite3.connect(......)

        def dict_factory(cursor, row):
            d = {}
            for idx, col in enumerate(cursor.description):
                d[col[0]] = row[idx]
            return d

        conn.row_factory = dict_factory
        return conn

MySQL

You have to install the corresponding MySQL-Python-driver by executing:

pip install MySQL-python

The following code implements an example database connection for MySQL.

import MySQLdb

class BaseMigration(Migration):

    @classmethod
    def open_db_connection(self):
        return MySQLdb.connect(......,
            cursorclass=MySQLdb.cursors.DictCursor
        )

PostgreSQL

You have to install the corresponding PostgreSQL-Python-driver by executing:

pip install psycopg2

Important

a version of psycopg >= 2.5 is required as it allows the cursor_factory to be specified through connect() instead of get_cursor().

The following code implements an example database connection for PostgreSQL.

import psycopg2
import psycopg2.extras

class BaseMigration(Migration):

    @classmethod
    def open_db_connection(self):
        return psycopg2.connect(......,
            cursor_factory=psycopg2.extras.RealDictCursor
        )

MS-SQL

@aidanlister contributed a sample DB connection for MS-SQL using pyodbc, which has to be installed first:

pip install pyodbc

The following code implements an example database connection for MS-SQL.

import pyodbc

class ConnectionWrapper(object):
    def __init__(self, cnxn):
        self.cnxn = cnxn

    def __getattr__(self, attr):
        return getattr(self.cnxn, attr)

    def cursor(self):
        return CursorWrapper(self.cnxn.cursor())


class CursorWrapper(object):
    def __init__(self, cursor):
        self.cursor = cursor

    def __getattr__(self, attr):
        return getattr(self.cursor, attr)

    def fetchone(self):
        row = self.cursor.fetchone()
        if not row:
            return None
        return dict((t[0], value) for t, value in zip(self.cursor.description, row))

    def fetchall(self):
        rows = self.cursor.fetchall()
        if not rows:
            return None

        dictrows = []
        for row in rows:
            row = dict((t[0], value) for t, value in zip(self.cursor.description, row))
            dictrows.append(row)
        return dictrows


class BaseMigration(Migration):
    @classmethod
    def open_db_connection(self):
        dsn = "DRIVER={SQL Server Native Client 11.0};SERVER=X;DATABASE=X;UID=X;PWD=X"

        cnxn = pyodbc.connect(dsn)
        wrapped_connection = ConnectionWrapper(cnxn)
        return wrapped_connection

What can be configured in every migration

In your migration classes you have several configuration options, which are listed below with a short description. For an in-depth explanation you can consult the paragraphs below.

Writing effective Migration-queries

Important

TODO

Define dependencies

Important

TODO

Describe special columns

Your query can include special columns, that are represented as special Django-relations (ForeignKey-, Many2Many- or One2One-Relations). Or you can exclude specific columns from automatic processing. You will normally define these settings with an invocation of the is_a-function, which does some tests and returns the required settings. This will then be used by django-data-migration in different places.

Some examples for is_a can be found here: A complete Migration example.

Using Migration Hooks

data_migration.migration.Migration defines a number of different hook-functions which will be called at different places allowing you to customize the migration work at different levels.

Error-Handling

In case of an exception when creating the instances, a default error handler will be called, to print the current row to stderr and than reraise the exception.

You can override this hook in your migration if it requires special handling of errors. When this method returns without an exception, the next row from the query will be processed.

Hook-Flowchart

The following graphic shows each Hook-method and when it is called in contrast to the model handling which is done by django-data-migration.

+------------------+
|hook_before_all() |
+--------------+---+
               |
    +-----+    |
    |     |    |
    |  +--v----v--------------------+
    |  |hook_before_transformation()|
    |  +-------+--------------------+
    |          |
    |      +---v--------------------+
    |      |instance = model(**data)|
    |      +---+--------------------+
    |          |
    |  +-------v----------+
    |  |hook_before_save()|
    |  +-------+----------+
    |          |
    |      +---v-----------+
    |      |instance.save()|
    |      +---+-----------+
    |          |
    |  +-------v---------+
    |  |hook_after_save()|
    |  +-------+---------+
    |          |
    +-------+--+
            |
            |
+-----------v-----+
|hook_after_all() |
+-----------------+

Implement updateable Migrations

Important

TODO