Database Migration

A Python dump of a database is not only a Backup; it is also the basis of every database migration à la Lino.

When you upgrade to a newer version of the application you are running, or when you change certain parameters in your settings.py, then this might require changes in the database structure. This is called database migration.

  • Before upgrading or applying configuration changes, create a Backup.

  • After upgrading or applying configuration changes, restore your database from that backup. The restore.py script will automatically detect version changes and apply any necessary changes to your data.

For example, here is an upgrade with data migration of a Lino Voga site:

$ python manage.py dump2py 20130827
$ pip install -U lino_voga
$ python manage.py run 20130827/restore.py

It is of course recommended to stop any other processes which might access your database during the whole procedure.

Double Dump Test (DDT)

A Double Dump Test is a method to test for possible problems, e.g. after a Database Migration: we make a first dump of the database to a Python fixture a.py, then we load that fixture into the database, then we make a second dump to a fixture b.py. Finally we run diff a.py b.py to verify that both fixtures are identical.

Background:

When restore.py has terminated successfully without any warnings or error messages, chances are good that your database has been migrated successfully.

But here is one more automated test that you may run when everything seems okay: a Double Dump Test (DDT).

This consists of the following steps:

  • Make another dump of the freshly migrated database to a directory a.

  • Restore this dump to the database.

  • Make a third dump to a directory b.

  • Compare the directories a and b: if there is no difference, the double dump test succeeded.

In other words:

$ python manage.py dump2py a
$ python manage.py run a/restore.py
$ python manage.py dump2py b
$ diff a b

If there's no difference between the two dumps, then the test succeeded!
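The final comparison step can also be scripted. The following is a minimal sketch, assuming each dump is a flat directory of files. The helper name compare_dumps is hypothetical and not part of Lino. It compares file contents rather than relying on file timestamps.

```python
import filecmp
from pathlib import Path


def compare_dumps(dir_a, dir_b):
    """Return the sorted names of files that differ between two dump directories.

    Hypothetical helper, not part of Lino. Assumes each dump is a flat
    directory; subdirectories are ignored.
    """
    names_a = {p.name for p in Path(dir_a).iterdir() if p.is_file()}
    names_b = {p.name for p in Path(dir_b).iterdir() if p.is_file()}
    # Files present in only one of the two dumps count as differences.
    one_sided = names_a ^ names_b
    common = sorted(names_a & names_b)
    # shallow=False forces a content comparison rather than a stat() comparison.
    _, mismatch, errors = filecmp.cmpfiles(dir_a, dir_b, common, shallow=False)
    return sorted(one_sided | set(mismatch) | set(errors))
```

An empty result from compare_dumps("a", "b") corresponds to diff a b printing nothing, i.e. the test succeeded.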

Designing data migrations for your application

Designing data migrations for your application is easy but not yet well documented.

The main trick is that any restore.py file generated by dump2py contains the following line:

settings.SITE.install_migrations(globals())

This means that the script itself will call the install_migrations method of your application before actually starting to load any database object. It passes its globals() dict, which means that you can potentially change everything.
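To illustrate why receiving the globals() dict is so powerful, here is a self-contained sketch in plain Python. It does not use Lino and does not show Lino's actual implementation: the function create_person, the field names, and the plain-function form of install_migrations are all assumptions made for this example. The idea is that a hook which receives a script's namespace can replace any function in it before the script starts loading objects.

```python
def create_person(id, name):
    """Stand-in for a creator function that a generated restore.py might define."""
    return {"id": id, "name": name}


def install_migrations(globals_dict):
    """Hypothetical hook: replace a creator so that old dump rows still load.

    Assumption for this sketch: the old dump stored first_name and last_name
    separately, while the new schema has a single name field.
    """
    new_create_person = globals_dict["create_person"]

    def create_person_from_old(id, first_name, last_name):
        # Convert the old field layout to the new one, then delegate.
        return new_create_person(id, f"{first_name} {last_name}")

    globals_dict["create_person"] = create_person_from_old


# A plain dict stands in for the globals() of restore.py.
script_globals = {"create_person": create_person}
install_migrations(script_globals)

# The rest of the script now calls the replaced function without knowing it.
person = script_globals["create_person"](1, "Ada", "Lovelace")
```

Because the replacement happens in the namespace itself, the code that loads the data does not need to change; only the hook knows about the old field layout.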

To see a real-life example, look at the source code of lino_welfare.migrate and lino_welfare.old_migrate.

A magical before_dumpy_save attribute may contain custom code to apply inside the try...except block. If that code fails, the deserializer will simply defer the save operation and try it again.
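The "defer and try again" behaviour can be pictured with a generic retry loop. This is only an illustration of the pattern, not the code of Lino's deserializer; the names save_with_deferral and save, and the parent/child rule, are hypothetical.

```python
def save_with_deferral(objects, save):
    """Save every object, retrying those whose save raised an exception.

    Generic illustration, not Lino's deserializer. Returns the number of
    passes needed; raises RuntimeError when a pass makes no progress.
    """
    pending = list(objects)
    passes = 0
    while pending:
        passes += 1
        deferred = []
        for obj in pending:
            try:
                save(obj)
            except Exception:
                # Typically a dependency is not in the database yet.
                deferred.append(obj)
        if len(deferred) == len(pending):
            raise RuntimeError(f"{len(deferred)} object(s) could not be saved")
        pending = deferred
    return passes


saved = {}


def save(obj):
    # Hypothetical rule: a child can only be saved after its parent.
    name, parent = obj
    if parent is not None and parent not in saved:
        raise ValueError(f"{parent} is not saved yet")
    saved[name] = parent


# The child comes first in the list, so its save is deferred to a second pass.
passes = save_with_deferral([("child", "root"), ("root", None)], save)
```

Stopping when a pass saves nothing guards against looping forever on objects that can never be saved.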

Models that get special handling

  • ContentType objects aren't stored in a dump because they can always be recreated.

  • Site and Permission objects must be stored and must not be re-created.

  • Session objects can get lost in a dump and are not stored.

Note about django-extensions

django-extensions has a command "dumpscript" which is comparable. Differences:

  • dumpy produces fixtures to be restored with loaddata, whereas dumpscript produces a simple Python script to be restored with runscript.

  • The fixtures generated by dumpy are designed to make it possible to write automated data migrations.