Safer (Soft) Deletion in Django

The Problem

The default behaviour in Django is to cascade deletes, such that when a record is deleted any records that are associated with this record (via a foreign key) are also deleted.
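
To make the reach of a single delete concrete, here is a toy, framework-free sketch that walks reverse foreign-key links and collects everything that would go along with one record. The `ROWS` data, the table names and `collect_cascade` are all hypothetical; Django's own deletion logic is considerably more involved, so treat this as an illustration of the idea only.

```python
from collections import defaultdict

# Hypothetical data: (table, pk) -> the (table, pk) it references, or None.
ROWS = {
    ("customer", 1): None,
    ("order", 10): ("customer", 1),
    ("order", 11): ("customer", 1),
    ("invoice", 100): ("order", 10),
    ("customer", 2): None,
    ("order", 12): ("customer", 2),
}


def collect_cascade(target, rows):
    """Return target plus every row that references it, directly or indirectly."""
    # Invert the foreign keys so we can look up the dependants of any row.
    children = defaultdict(list)
    for row, parent in rows.items():
        if parent is not None:
            children[parent].append(row)

    doomed, stack = [], [target]
    while stack:
        row = stack.pop()
        doomed.append(row)
        stack.extend(children[row])
    return sorted(doomed)


doomed = collect_cascade(("customer", 1), ROWS)
print(doomed)
```

Deleting one customer here takes two orders and an invoice with it, while the second customer and its order are untouched.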

This preserves referential integrity within the database, which is what we want. The problem is that one careless or mistaken deletion in the admin can take any number of related records with it, with no neat way to undo the operation (at least with Django out of the box).

The idea behind the Django admin site is that you let trusted users manage the data through it: they are trusted to be careful, and so need less protection from their own actions. When you build the public, user-facing part of the site you naturally become much more careful (even paranoid), both to protect the system from tampering and to give users a safe environment in which they can interact with the system without fear of doing something terrible to their data and having to contact support.

Such extra care comes at a price though: this kind of development takes significantly longer than setting up and customising the admin site for staff to use.

Desired Outcome

While it is possible to extract the missing data from backups, it would be far better if recovery from accidental deletions didn’t require programmer/sysadmin intervention and was instead baked into the system.

Django doesn’t support this kind of recovery out of the box, so we will use a combination of available libraries to achieve it.

In particular, the solution should meet two requirements:

  • It should be possible for staff to recover records that they have deleted via the admin interface.
  • Any related records present at time of deletion (and so deleted, unless the on_delete was changed) should also be restored, with all objects linked as they were before.

Possible Solutions

The cascading delete behaviour of Django can be changed, via the on_delete argument of a foreign key, to one of:

  • Set the referencing field to NULL (SET_NULL)
  • Protect the related records and raise an exception (PROTECT)
  • Do nothing (DO_NOTHING), if for example you will set up your RDBMS to maintain referential integrity
  • Set the field to its default value (SET_DEFAULT)
  • Set the field to a particular value (SET(...))

We can switch to the SET_NULL policy and avoid deleting our related records, but even if the deleted record is later restored (for example via django-reversion) the links that pointed to it won’t be: the related records will still hold NULL. The other options provided by Django are also not appropriate for implementing soft/safe deletion, so we look to third-party libraries.
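
The lost link is easy to reproduce outside Django. The sketch below uses the standard-library sqlite3 module and the SQL equivalent of that policy (ON DELETE SET NULL); the author/book tables are hypothetical. Django applies its on_delete handling itself rather than delegating to a database constraint, so this is an analogy for the effect on the data, not a picture of what Django executes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript(
    """
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        author_id INTEGER REFERENCES author(id) ON DELETE SET NULL
    );
    INSERT INTO author (id, name) VALUES (1, 'Ann');
    INSERT INTO book (id, title, author_id) VALUES (1, 'First', 1);
    """
)

# Delete the author: the book survives but its foreign key is nulled.
conn.execute("DELETE FROM author WHERE id = 1")
after_delete = conn.execute("SELECT author_id FROM book WHERE id = 1").fetchone()[0]

# "Restore" the author under the same primary key, as a version restore would.
conn.execute("INSERT INTO author (id, name) VALUES (1, 'Ann')")
after_restore = conn.execute("SELECT author_id FROM book WHERE id = 1").fetchone()[0]

print(after_delete, after_restore)
```

Both values are NULL: the author is back, but nothing remembers which books used to point at it.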

Django-reversion

One option would be to allow the cascading delete but serialise the records on the other end of each foreign key using the follow feature of django-reversion. This should preserve all related records and links, but at the cost of significantly greater storage and processing resource usage.

This approach (using django-reversion) has the advantage though of requiring no changes to the models themselves (no additional fields, no replacement of the default manager), minimal code to register the appropriate signals and simple integration with the Django admin site.

With hundreds of thousands of records though, each of which must be serialised to JSON and stored as a Version, this approach, while conceptually simple, is rather inefficient when dealing with records with deep relations. Worse, these relations will end up being specified twice, once in the Django model and then again when setting up django-reversion to follow the relationship and serialise the related objects, which may have related objects, down to several levels deep.
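
The two costs described here (one serialised version per related record, and a second, hand-maintained list of relations) can be sketched without any library. This is not django-reversion's API: the `snapshot` function, the `follow` mapping and the project/task/comment data are hypothetical stand-ins used only to show the shape of the problem.

```python
import json

# Hypothetical records: a project with tasks, each task with comments.
PROJECT = {
    "id": 1,
    "name": "Website",
    "tasks": [
        {"id": 10, "title": "Design", "comments": [{"id": 100, "body": "Looks good"}]},
        {"id": 11, "title": "Build", "comments": []},
    ],
}


def snapshot(record, kind, follow, versions):
    """Append one JSON version for record, then for each followed relation."""
    # Plain fields only; related records are stored as versions of their own.
    fields = {k: v for k, v in record.items() if not isinstance(v, list)}
    versions.append(json.dumps({"kind": kind, "fields": fields}))
    for relation in follow.get(kind, []):
        for related in record[relation]:
            snapshot(related, relation, follow, versions)


# The relations must be listed again here, separately from the data model.
full = []
snapshot(PROJECT, "project", {"project": ["tasks"], "tasks": ["comments"]}, full)

# A follow list that was not updated when comments were added to the schema.
stale = []
snapshot(PROJECT, "project", {"project": ["tasks"]}, stale)

print(len(full), len(stale))
```

With the complete follow list every level is captured; with the stale one the comment is silently missing from the snapshot, which is only discovered when someone tries to restore it.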

It is also worth noting that django-reversion only works via the admin unless you are prepared to incorporate it into your system using its low-level API. As noted before, it doesn’t modify your models, which is a good thing, but only if you are primarily interested in managing (un)deletes via the admin interface.

Django-softdelete

Django-softdelete adds a deleted_at field to any model by way of an abstract base model class. Deletion of a record via its delete() method will now soft-delete it. Any related records which would normally have been deleted under default Django behaviour will also be soft-deleted. At this point a Changeset record will be saved, referencing all affected objects. Restoring any of these objects will cause the whole structure of related objects to be restored.
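
As a rough mental model of that behaviour, here is a framework-free sketch: a soft delete stamps a record and its descendants, remembers exactly which records it touched, and an undelete reverses only those. The `Record` and `ChangeSet` classes and both functions are hypothetical and written for this illustration; they are not the library's code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Record:
    name: str
    children: List["Record"] = field(default_factory=list)
    deleted_at: Optional[datetime] = None


@dataclass
class ChangeSet:
    """Remembers which records a single soft delete touched."""
    records: List[Record] = field(default_factory=list)


def soft_delete(record, changeset=None):
    """Stamp record and its live descendants; return the change set."""
    changeset = changeset if changeset is not None else ChangeSet()
    if record.deleted_at is not None:
        return changeset  # already deleted earlier: not part of this change
    record.deleted_at = datetime.now(timezone.utc)
    changeset.records.append(record)
    for child in record.children:
        soft_delete(child, changeset)
    return changeset


def undelete(changeset):
    """Restore exactly the records that this change set soft-deleted."""
    for record in changeset.records:
        record.deleted_at = None
    changeset.records.clear()


comment = Record("comment")
old_task = Record("old task")
task = Record("task", children=[comment])
project = Record("project", children=[task, old_task])

soft_delete(old_task)             # removed on its own, some time earlier
changeset = soft_delete(project)  # cascades to task and comment
touched = [r.name for r in changeset.records]
undelete(changeset)
print(touched, old_task.deleted_at is not None)
```

Because the change set records only what that one deletion touched, restoring the project brings back the task and comment but leaves the separately deleted task alone, so the structure returns to how it was just before the delete.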

This provides something closer to what we are after, but it does require modification of the code and behaviour of our models, beyond the simple signals used in django-reversion.

All models which should be soft-deleteable need to extend the abstract model SoftDeleteObject. This gives them an extra DateTimeField, deleted_at, and replaces their default manager with one that does not include soft-deleted objects.

Integrating this with an existing project required schema migrations to add the field, and all custom managers had to extend SoftDeleteManager rather than the default manager class.

One design element that did not fit well with my own use case was that in the admin interface staff could see all records, soft-deleted and otherwise, and could also (un)delete them just by (un)checking a deleted checkbox. This is in line with the Django ethos that users of the admin are trusted to take care with the power they have over the database.

In my case, however, I am using a separate instance of the admin site that is pared down and used by less technical users. What I would like is for soft-deleted records not to appear at all, and for undelete to be available only to me and/or certain users.

django-softdelete does not directly support this but with a few simple modifications it can. These changes could also be broken out into settings to make it configurable.

The fork with these changes, and a few more notes on installation and integration, is available here.

Conclusion

Looking at the Django Packages page for deletion-related libraries, django-reversion is used by 41 sites, is compatible with Python 3 and Django 1.6, and receives regular updates. As far as projects generally go I would say it is extremely well maintained, and so from that perspective I would really rather be using it than django-softdelete, which is used by zero sites (apparently!), has an unknown development status and has had no commits for a year now.

That said, it goes to show that sometimes a very well-implemented and maintained system can lose out to a pretty scrappy one if the assumptions of the scrappy one are closer to what is really wanted.

In implementing soft delete I would still recommend checking out django-reversion first as it is clearly the better supported library and offers many more features in terms of versioning (which I didn’t happen to need). For me, having to specify all the relationships to follow explicitly seemed brittle as there were then two places where relationships would need to be specified. django-reversion could perhaps be patched to follow all relations automatically but given the complexity and interrelatedness of the models (and so records) in this system it seemed like a lot of overhead for what only needed to be simple masking.

There is no one-size-fits-all solution unfortunately, as there will be tradeoffs with each. In particular, if any of the following are true you are unlikely to be able to use django-softdelete in the long run and may be better off with django-reversion:

  • You want soft-deleted records to be ignored when checking uniqueness constraints (with django-softdelete they still count, so creating a record that clashes with a soft-deleted one gives a duplicate key error, which will be confusing for the user).
  • You want not only soft-delete but also versioning (django-reversion provides this).
  • You want the Django User model (or other 3rd party model) to be soft-deleteable (django-reversion can patch a 3rd party model; this could be implemented for django-softdelete but you will need to do that yourself, adding extra work).
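
The uniqueness caveat in the first point is a property of masking rows rather than removing them, and can be reproduced with the standard-library sqlite3 module. The account tables and the live_email index below are hypothetical. The partial unique index in the second half is one possible database-level workaround, shown as an assumption of mine; it is not something either library sets up for you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " email TEXT NOT NULL UNIQUE,"
    " deleted_at TEXT)"
)
conn.execute("INSERT INTO account (email) VALUES ('ann@example.com')")
# Soft delete: the row is only masked, so it still occupies the unique slot.
conn.execute(
    "UPDATE account SET deleted_at = '2014-01-01T00:00:00'"
    " WHERE email = 'ann@example.com'"
)

try:
    conn.execute("INSERT INTO account (email) VALUES ('ann@example.com')")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True  # the "invisible" row causes a duplicate key error

# Possible workaround: enforce uniqueness among live rows only.
conn.execute(
    "CREATE TABLE account2 (id INTEGER PRIMARY KEY, email TEXT NOT NULL, deleted_at TEXT)"
)
conn.execute(
    "CREATE UNIQUE INDEX live_email ON account2 (email) WHERE deleted_at IS NULL"
)
conn.execute(
    "INSERT INTO account2 (email, deleted_at)"
    " VALUES ('ann@example.com', '2014-01-01T00:00:00')"
)
conn.execute("INSERT INTO account2 (email) VALUES ('ann@example.com')")  # accepted
live = conn.execute(
    "SELECT COUNT(*) FROM account2 WHERE deleted_at IS NULL"
).fetchone()[0]

print(blocked, live)
```

Note that the workaround moves the problem rather than removing it: undeleting the old row would now clash with the new live one, so the restore path needs its own handling.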

Likewise if any of the following are instead true then you may be better off with django-softdelete:

  • You want soft-deletion to work system-wide, not just in the admin, with no extra coding (django-reversion works well via the admin but outside of the admin site you will need to use the low-level API to implement handling of versions and (un)deletion).
  • You have hundreds of thousands of records which are densely interrelated; this would cause django-reversion to serialise the same records again and again.
  • The system is structurally in flux, with models and relationships being added (and perhaps removed) on a regular basis. In this case you will need to ensure that you keep your follow definitions up to date or else you may go to restore data and discover that not all related records were persisted.

There could be many more potential caveats, some of which may not become apparent until later, unless you have a solid test suite (I hope you do!). Given the myriad options I found it helpful to keep going back to my definition of the problem for the system in hand, to ensure that I understood exactly what flavour of soft-deletion I needed. This led me to choose a library that on the surface is much scrappier than its more professional cousin but in the end was quicker to integrate and more closely fit the desired behaviour.