Custom Data Migrations in Core Data with RubyMotion

Overview

Data vs. Schema Migrations

If you are only making relatively simple changes to your schema, such as adding or removing a model attribute or relationship, then it is likely that you can simply create a new version of your schema and let Core Data’s automatic lightweight migration process take care of it for you, generating the necessary SQL and migrating the data store in place in a fraction of a second.

If, however, you need to modify the data that is stored by Core Data, and not just its structure, then you need a “heavyweight” data migration.

Using Xcode you would create an entity mapping file using its visual tools and specify the necessary data transformations in the form of expressions (via NSExpression).

If the transformations you are applying are simple, such as adjusting a value for inflation, then you can use expressions to achieve this, meaning you would just need to put the expression string in the appropriate box in Xcode.

For example, if we have a Product entity with a price property then we can create an NSPropertyMapping with name set to "price" and a valueExpression of $source.price * 1.03.

During the migration Core Data will evaluate the expression on a per-record basis and apply the result to the price attribute on Product in the destination data store. As the property name is the same in both cases, this updates the value.
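
To make the per-record evaluation concrete, here is a rough plain-Ruby model of what such a property mapping does. It is only an illustration: the array of hashes stands in for the source store, apply_price_mapping is a hypothetical helper rather than anything Core Data provides, and the rounding to two decimal places is an assumption made for the sake of a tidy currency value.

```ruby
# Plain-Ruby model of a property mapping whose value expression is
# "$source.price * 1.03". Each hash stands in for one source record.
def apply_price_mapping(source_records)
  source_records.map do |source|
    # The property name is the same on both sides, so the destination
    # record simply receives the recomputed value. Rounding is an
    # assumption for this sketch, not something the expression does.
    source.merge("price" => (source["price"] * 1.03).round(2))
  end
end

products = [{ "name" => "Tea", "price" => 100.0 },
            { "name" => "Coffee", "price" => 250.0 }]
migrated = apply_price_mapping(products)
```

Note that the source records are left untouched, just as the source store is only read from during a real migration.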

We can also use expressions to do more complex things such as migrate the data from one attribute to another (as part of an attribute rename operation), or even migrate the data for a relationship to another entity.

For cases which expressions cannot handle, or for finer-grained control, you can specify custom NSEntityMigrationPolicy classes where you can express the necessary transformations directly in code.

The Problem

When developing in RubyMotion, libraries such as ruby-xcdm and its partner library CDQ make handling lightweight/schema migrations straightforward, perhaps even more straightforward than the Xcode-based process that they replace.

When it comes to data/heavyweight migrations, however, the lack of visual tools and, worse, the lack of any real documentation on how to set up data migrations programmatically leaves us in a difficult situation.

The Solution

The ideal solution would be a library, perhaps an extension to ruby-xcdm that could translate a nice Ruby DSL into entity mapping files in the format that Core Data expects.

Until such a tool exists, we can set up our data migrations by using the Core Data API directly and use a little Ruby to make it easier to manage. This approach also carries the advantage that knowledge of the lowest levels of the Core Data migration API will stand you in good stead should you encounter bugs in any helper libraries that you do use, or you need to go beyond the functionality that they offer.

For example, let’s say that we have a Person entity that has a name attribute and we want to:

  1. Split the name out into two new fields, first_name and last_name.
  2. If only one name has been specified, that name should be put into the first_name field.

This is a very minor model change but it will serve to demonstrate how to mix schema and data migrations such that they are handled seamlessly by the same migration process.

The Migration Process

In order to fully understand data migrations we first need to understand the steps that Core Data takes when we initiate what is termed a heavyweight migration. As we have covered previously, a lightweight migration is performed in place and can involve basic schema changes. A heavyweight migration is not performed in place but instead involves reading from the source data store and building up a copy of the store with the migration transformations applied. Once complete, the original store is replaced by the migrated store.

Here is an overview of the process, as executed by Core Data on our behalf:

  1. Create two Core Data stacks (contexts), the source stack and the destination stack.
  2. Load the current records into the source stack.
  3. Create the equivalent records in the destination stack (after applying the relevant transforms of the migration). Note that validation is not performed at this stage.
  4. Create any relationships between records that should exist in the destination stack, again without running validation.
  5. Finally, apply validation constraints. If validation passes then signify success, otherwise report any errors.
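
As a rough mental model of these stages, the following plain-Ruby sketch treats a store as an array of hashes. Everything in it is hypothetical (heavyweight_migrate and the validator lambda are made up for illustration) and it skips relationships entirely, but it shows the two properties that matter: the source is never modified, and validation only runs once the whole copy has been built.

```ruby
# Toy model of a heavyweight migration; arrays of hashes stand in for stores.
def heavyweight_migrate(source_store, validator)
  # Stages 1-2: the source records are read, never modified.
  source_records = source_store.map(&:dup)

  # Stage 3: build the destination copy by applying the transform (the
  # block passed to this method). No validation happens yet.
  destination_store = source_records.map { |record| yield(record) }

  # Stage 4 (relationships) is omitted; this toy model has none.

  # Stage 5: validate everything at the very end.
  invalid = destination_store.reject { |record| validator.call(record) }
  unless invalid.empty?
    raise ArgumentError, "validation failed for #{invalid.size} record(s)"
  end

  # On success the caller would replace the original store with this copy.
  destination_store
end

source = [{ "name" => "Jane" }]
has_first_name = ->(record) { !record["first_name"].to_s.empty? }
migrated = heavyweight_migrate(source, has_first_name) do |record|
  { "first_name" => record["name"] }
end
```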

Our data migration code is going to be executed at stages 3 and 4, where we will have a chance to override exactly how the destination records and relationships are created in the destination store. As we will see later, during the process our migration policy class (of type NSEntityMigrationPolicy) will be handed each source record in turn and we will have total control over how it is created in the destination store.

Getting Started

Create a new app:

$ motion create SimpleHeavyweightMigration
    Create SimpleHeavyweightMigration
    Create SimpleHeavyweightMigration/.gitignore
    Create SimpleHeavyweightMigration/app/app_delegate.rb
    Create SimpleHeavyweightMigration/Gemfile
    Create SimpleHeavyweightMigration/Rakefile
    Create SimpleHeavyweightMigration/resources/Default-568h@2x.png
    Create SimpleHeavyweightMigration/spec/main_spec.rb

Add the Core Data framework to the app, in the Rakefile:

Motion::Project::App.setup do |app|
  # Use `rake config' to see complete project settings.
  app.name = 'SimpleHeavyweightMigration'
  app.frameworks += [ 'CoreData' ]
end

Install Core Data Query (CDQ):

$ echo "gem 'cdq'" >> Gemfile
$ bundle install
Fetching gem metadata from https://rubygems.org/............
Resolving dependencies...
...
Installing cdq (0.1.10)

Initialise CDQ:

$ cdq init

This creates the schemas directory and also handily adds in CDQ’s spec helpers for when we’re writing tests.

Set up the Person entity in the first version of our schema, in schemas/0001_initial.rb:

schema "0001 initial" do
  entity "Person" do
    string :name, optional: false
    string :address, optional: true
  end
end

Check that the schema builds successfully:

$ rake schema:build
Generating Data Model SimpleHeavyweightMigration
   Loading schemas/0001_initial.rb
   Writing resources/SimpleHeavyweightMigration.xcdatamodeld/0001
           initial.xcdatamodel/contents

For convenience, we create a CDQ model class to wrap our Person entity:

$ cdq create model Person

     Creating model: Person

  Δ  Creating directory: app/models
  Δ  Creating file: app/models/Person.rb
  Δ  Creating directory: spec/models
  Δ  Creating file: spec/models/Person.rb

Finally we ask CDQ to set up our Core Data stack on startup, in app/app_delegate.rb:

class AppDelegate
  include CDQ

  def application(application, didFinishLaunchingWithOptions:launchOptions)
    cdq.init
    true
  end
end

We are now ready to build and run the app, which should set up a Core Data stack for us, initialised using the first schema version that we defined earlier.

Build and run the app, then create some Person records that we will later migrate:

$ rake
...
(main)> Person.count
=> 0
(main)> Person.create(name: "John Doe", address: "12-14-207 Osaka")
=> <Person: 0xd8b7b30> (entity: Person; id: ... ; data: {
    address = "12-14-207 Osaka";
    name = "John Doe";
})
(main)> Person.create(name: "Jane", address: "24 Bond Street, London")
=> <Person: 0x91a3e50> (entity: Person; id: ... ; data: {
    address = "24 Bond Street, London";
    name = "Jane";
})
(main)> cdq.save
=> true
(main)> Person.count
=> 2

Next Steps

We now have a version 0001 database initialised with two Person records. While we experiment, we will disable CDQ’s initialisation of the datastore on app startup and instead set up our own migration machinery. Once complete, this machinery will perform all necessary migrations so that when CDQ initialises, the data store is already up to date and CDQ only needs to set up the context.

On app startup we will step through each schema version and attempt a migration as necessary. For schema migrations the datastore will be migrated in place; for data migrations a copy of the datastore will be migrated and will replace the principal datastore if the migration completes successfully.

Disable CDQ’s initialisation for now (in app/app_delegate.rb):

class AppDelegate
  include CDQ

  def application(application, didFinishLaunchingWithOptions:launchOptions)
    # cdq.init
    true
  end
end

We are now going to set up a loop which will step through each new schema in turn and migrate the datastore to that schema. This differs from the standard Core Data migration process which attempts to jump from the current datastore version straight to the latest schema version available. To make our lives easier we will force migrations to occur one after the other, in order, so that we don’t need to test every schema version against every later schema version in order to be sure that our migrations won’t break anything.
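
The stepping logic itself is independent of Core Data and can be sketched in plain Ruby. In this sketch migration_steps is a hypothetical helper and the version identifiers are placeholders; given the ordered list of versions and the version of the current store, it returns the source/destination pairs that the loop would work through.

```ruby
# Given model versions in order and the store's current version, return the
# [source, destination] pairs to migrate through, one step at a time.
def migration_steps(ordered_versions, current_version)
  index = ordered_versions.index(current_version)
  raise ArgumentError, "unknown store version: #{current_version}" if index.nil?

  # Keep the current version and everything after it, then pair neighbours.
  ordered_versions.drop(index).each_cons(2).to_a
end

versions = ["0001", "0002", "0003", "0004"]
steps = migration_steps(versions, "0002")
# steps is [["0002", "0003"], ["0003", "0004"]]
```

Because each step only ever pairs neighbouring versions, a store that is already at the latest version yields an empty list and nothing is migrated.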

In order to demonstrate not just how we will perform data migrations, but also how we will seamlessly handle schema migrations using this new setup, we will migrate the Person record such that the name field is split into two fields, first_name and last_name.

To achieve this we will create three migrations, two schema and one data:

  • [Schema migration] Create two new optional attributes on the Person entity, first_name and last_name.
  • [Data migration] If the name attribute contains a full name, split the name into two parts and store in the two new fields. If the attribute contains just one name then place it in the first_name field.
  • [Schema migration] Drop the name attribute, which is now no longer required, and make first_name be a required field.

Setting up the Migration Process

Iterative Migration in Detail

Core Data will by default attempt to migrate directly from the datastore’s current version to the latest-available version. This may work without issue, particularly if the changes are to the schema only, but with each additional version being skipped the chance of errors occurring increases significantly.

If we were to use the standard tools for versioning as provided by Xcode then what we could do is to set up iterative migration, where we step through each version in order and migrate from one version to the next. Anyone with a background in web development with frameworks such as Rails and Django will recognise this pattern.

For each model version we would first check the bundle for a mapping model file (created in Xcode) which is designated as being for the current source and destination model versions in question. If this is found, it will be loaded and used in the migration process. The mapping model will be expected to contain all schema changes, and to specify any cases where a custom migration policy is to be loaded and executed.

If no such mapping file is found, then we would ask Core Data to infer one. For pure schema changes which involve adding/removing attributes or relationships, this will in most cases be possible. If a mapping model is inferred successfully then we will use that to perform the migration and will be ready to move on to the next version, repeating the same checks.

If no mapping model has been explicitly provided in the bundle, and the mapping model cannot be inferred, then we would need to give up and report this failure as an error. By thoroughly testing migrations and ensuring that all relevant files are present in the build that is submitted to Apple, this should in general not happen.

Why this process is insufficient

If we happened to have a library that we could use to easily create mapping model files then we would be able to use this exact process. As a long-term goal for RubyMotion and Core Data this could be a worthy extension to a library such as ruby-xcdm, which already handles the easy creation of versioned managed object model schemas.

As this tool does not yet exist (to my knowledge), we need to take a different approach.

If we don’t find a mapping model for the particular migration we are looking at, it doesn’t mean that it is a pure schema migration and not a custom migration. We will instead need to signal explicitly when a migration is a data migration.

To do so, we will explicitly store a list of migration names which are data migrations, along with a reference to a custom migration handler method which will provide the necessary mapping model. Presence in this list will indicate that the migration in question is a data-only migration.

It is generally good practice to separate schema migrations from data migrations, and this is the approach that we will take here. If the migration is not in the data migration list then we will assume that the migration is a schema-only migration and that the mapping model can be inferred automatically.
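
The decision can be modelled in a few lines of plain Ruby. The class, the version name and the handler below are all hypothetical stand-ins (in the app the lookup lives in the app delegate and the handler returns a real mapping model); the point is only that membership of the list, and nothing else, decides which path a migration takes.

```ruby
# Hypothetical stand-in for the app delegate.
class MigrationPlanner
  # Version identifier => name of the handler that builds the mapping model.
  DATA_MIGRATIONS = { "0003 example data migration" => :example_handler }.freeze

  def mapping_for(version_name)
    handler = DATA_MIGRATIONS[version_name]
    # Listed: invoke the handler. Not listed: fall back to inference.
    handler ? send(handler) : :inferred_mapping
  end

  private

  def example_handler
    :custom_mapping # a real handler would return an NSMappingModel
  end
end

planner = MigrationPlanner.new
custom = planner.mapping_for("0003 example data migration")
inferred = planner.mapping_for("0002 example schema migration")
```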

Setting up the Datastore for Testing

When testing migrations manually, follow these steps to run the migrations from a known starting point:

  1. Delete the app from the simulator.
  2. Place only the initial (0001) schema in the schemas directory.
  3. Run rake clean to start with a clean environment.
  4. Run rake schema:build. Run the app, calling only cdq.init in the app delegate, to set up the database.
  5. Create one or more Person records, call cdq.save to commit the changes.
  6. Add in the schema versions that you would like to migrate to, run rake schema:build.
  7. Comment out cdq.init and add in a call to our migration routine; run the app, thus kicking off the migration process.
  8. Re-enable CDQ, inspect the database and check that the migration was successful.

The First Schema Migration

Here we will create a new schema version with the first_name and last_name fields added.

Copy the original schema version to a new file to create the base for our second schema version:

$ cp schemas/0001_initial.rb schemas/0002_add_name_fields.rb

And add the two new fields so that the schema is as follows:

schema "0002 add split name fields" do
  entity "Person" do
    string :name, optional: false

    # The new fields will be optional so that they
    # can be initially blank after the schema migration.
    # We will then populate them with data and remove
    # the original name attribute.
    string :first_name, optional: true
    string :last_name, optional: true

    string :address, optional: true
  end
end

Note that:

  • We have updated the name of the schema on the first line, which we must do each time, following the same naming/numbering conventions to keep all the migrations in order.
  • The first_name and last_name fields have been added in as optional fields since at this point, when the schema migration runs, they will have no data. Once the data has been migrated we will remove the name field and make first_name be a required attribute.

If we run the app now the new schema version will be built but nothing further will happen—CDQ has been disabled and we haven’t yet replaced it with anything.

In the app delegate, before our commented-out call to cdq.init, let’s put in a call to our own migration routine:

def application(application, didFinishLaunchingWithOptions:launchOptions)
  self.perform_migrations
  # cdq.init
  true
end

And then beneath that add in the implementation of perform_migrations, which will handle our first schema migration for us:

def perform_migrations
  # Load the latest model version, as found in the main app bundle
  model = NSManagedObjectModel.mergedModelFromBundles(nil)
  latest_model_version_string = model.versionIdentifiers.anyObject
  puts "[INFO] Latest model version is #{latest_model_version_string}"

  # Core Data Query by default sets the filename of the data store based on
  # the name of the app; we follow this convention so that our migration
  # process can work transparently alongside CDQ's initialisation process.
  app_name = NSBundle.mainBundle\
                     .objectForInfoDictionaryKey("CFBundleExecutable")
  store_path = File.join(NSHomeDirectory(), 'Documents', "#{app_name}.sqlite")
  store_url = NSURL.fileURLWithPath(store_path)

  puts "[INFO] Database file path: \"#{store_path}\""

  error_ptr = Pointer.new(:object)

  # Fetch the metadata for the current data store.
  metadata = NSPersistentStoreCoordinator\
                        .metadataForPersistentStoreOfType(NSSQLiteStoreType,
                                                          URL:store_url,
                                                          error:error_ptr)

  # Check whether the metadata matches the current schema version or not.
  # Metadata will be nil if this is the first run of the app.
  if not metadata.nil?
    curr_schema_version_string = metadata["NSStoreModelVersionIdentifiers"]\
                                                                        .first

    if curr_schema_version_string\
                                .isEqualToString(latest_model_version_string)
      puts "[INFO] Store schema version matches latest schema version"
    end

    # Fetch all managed object models
    # NOTE: Result will be properly ordered if a suitable naming convention
    #       has been used, e.g. 0001, 0002 etc. for the schema version
    #       identifier.
    mom_paths = NSBundle.mainBundle\
                     .pathsForResourcesOfType(".mom",
                                              inDirectory:"#{app_name}.momd")

    moms = []

    mom_paths.each do |path|
      moms << NSManagedObjectModel.alloc\
                          .initWithContentsOfURL(NSURL.fileURLWithPath(path))
    end

    puts "[INFO] #{moms.count} managed object model(s) found"

    mom_index_for_current_datastore = -1

    # Find the managed object model for the current datastore
    moms.each_with_index do |mom, index|
      mom_version_identifier = mom.versionIdentifiers.allObjects.first

      if mom_version_identifier == curr_schema_version_string
        mom_index_for_current_datastore = index
      end
    end

    source_model = nil

    if mom_index_for_current_datastore == -1
      puts "[ERROR] Failed to find managed object model for current version" +
           " of datastore"
      return
    else
      puts "[INFO] Current datastore version is managed object model" +
           " #{mom_index_for_current_datastore+1} of #{moms.count}"
      source_model = moms[mom_index_for_current_datastore]
    end

    # All data migrations will appear here in the form
    # { "000X schema description" => :migration_handler_method }
    # For now we have no data migrations
    data_migrations = { }

    # Working forwards one version at a time, migrate from that version to
    # the next-latest version.
    moms.slice(mom_index_for_current_datastore + 1, moms.count - 1)\
                                                      .each do |model_version|
      destination_model = model_version
      destination_model_version_name = destination_model.versionIdentifiers\
                                                        .allObjects.first

      puts "[INFO] Examining version '#{destination_model_version_name}'"

      # The goal here is to get a mapping model for our migration, which will
      # either be programmatically configured via a data migration or will be
      # inferred automatically for a schema migration.
      mapping_model = nil

      error_ptr = Pointer.new(:object)

      if data_migrations.keys.include? destination_model_version_name
        puts "[INFO] Migrating forwards to model version" +
             " '#{destination_model_version_name}' (Custom)"

        # Take the name of the handler method for the current migration and
        # 'send' it to this class, the app delegate. The send method will
        # cause the handler to be invoked, which we expect to return a mapping
        # model which should contain all the necessary information that Core
        # Data needs in order to execute it.
        handler_method_name = data_migrations[destination_model_version_name]
        mapping_model = self.send(handler_method_name)
      else
        puts "[INFO] Migrating forwards to model version " +
             "'#{destination_model_version_name}' (Automatic/Inferred)"


        # The migration policy does not exist, so we will perform an
        # automatic migration, where Core Data will infer automatically how
        # to bring the schema up to date.
        mapping_model = NSMappingModel\
          .inferredMappingModelForSourceModel(source_model,
                                           destinationModel:destination_model,
                                           error:error_ptr)
      end

      unless mapping_model
        raise "[ERROR] Failed to infer mapping: " +
              "#{error_ptr[0] and error_ptr[0].description}"
      end

      manager = NSMigrationManager.alloc\
                      .initWithSourceModel(source_model,
                                           destinationModel:destination_model)

      destination_store_url = store_url.URLByAppendingPathExtension('tmp')

      file_manager = NSFileManager.defaultManager

      error_ptr = Pointer.new(:object)

      # Remove the temporary destination store if it already exists from a
      # previous migration. If the file exists, the copying process which
      # occurs later will fail (it won't overwrite files).
      if file_manager.fileExistsAtPath(destination_store_url.path)
        error_ptr = Pointer.new(:object)
        file_manager.removeItemAtPath(destination_store_url.path,
                                      error:error_ptr)
      end

      error_ptr = Pointer.new(:object)

      # Migrate the data store, passing in a mapping model which has either
      # been automatically inferred (and is therefore equivalent to asking
      # Core Data to handle the migration automatically), or has been created
      # by a migration handler method which we have written.
      result = manager\
                  .migrateStoreFromURL(store_url,
                                       type:NSSQLiteStoreType,
                                       options:nil,
                                       withMappingModel:mapping_model,
                                       toDestinationURL:destination_store_url,
                                       destinationType:NSSQLiteStoreType,
                                       destinationOptions:nil,
                                       error:error_ptr)

      if result
        # Migration complete, we can now move the new store in place of the
        # original

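        # Open the migrated store with the model that it was migrated to.
        # (Using moms.last here would only work when the destination happens
        # to be the latest version.)
        coordinator = NSPersistentStoreCoordinator.alloc\
                                 .initWithManagedObjectModel(destination_model)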

        store = coordinator\
                        .addPersistentStoreWithType(NSSQLiteStoreType,
                                                    configuration:nil,
                                                    URL:destination_store_url,
                                                    options:nil,
                                                    error:error_ptr)

        file_manager.removeItemAtPath(store_url.path, error: error_ptr)

        coordinator.migratePersistentStore(store,
                                           toURL:store_url,
                                           options:nil,
                                           withType:NSSQLiteStoreType,
                                           error:error_ptr)
      else
        raise "[ERROR] Failed to migrate store: #{error_ptr[0].description}"
      end

      # On the next run through this loop (if there is one) we will migrate
      # _from_ the version that we have just migrated _to_, and continue until
      # there are no more versions left; we are then at the latest version.
      source_model = destination_model
    end
  end
end
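
As noted in the comments, the routine relies on the models coming back from the bundle in version order. If you would rather not depend on that, one defensive option is to sort explicitly by the numeric prefix of the version identifier. This is a sketch under the assumption that every identifier starts with a number, as ours do; sort_versions is a hypothetical helper and works on plain strings.

```ruby
# Sort version identifiers by their leading number, e.g. "0002 add fields".
# Identifiers without a leading number sort first (nil.to_i is 0).
def sort_versions(identifiers)
  identifiers.sort_by { |identifier| identifier[/\A\d+/].to_i }
end

sorted = sort_versions(["0010 third", "0002 second", "0001 first"])
```

Inside perform_migrations the same idea would apply to moms, sorted by each model's version identifier.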

If we run the app now then we should see the following output:

[INFO] Latest model version is 0002 add split name fields
[INFO] Database file path: ".../SimpleHeavyweightMigration.sqlite"
[INFO] 2 managed object model(s) found
[INFO] Current datastore version is managed object model 1 of 2
[INFO] Examining version '0002 add split name fields'
[INFO] Migrating forwards to model version '0002 add split name fields'
       (Automatic/Inferred)

And with that, the data store has been brought up to date and the Person entity should now have two additional columns. We can confirm this outside of the app by connecting to (a copy of) the SQLite database directly:

$ sqlite3 <database file path as reported by the app output>.tmp
SQLite version 3.7.14.1 2012-10-04 19:37:12
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> .tables
ZPERSON       Z_METADATA    Z_PRIMARYKEY
sqlite> .schema ZPERSON
CREATE TABLE ZPERSON ( Z_PK INTEGER PRIMARY KEY,
                       Z_ENT INTEGER,
                       Z_OPT INTEGER,
                       ZADDRESS VARCHAR,
                       ZFIRST_NAME VARCHAR,
                       ZLAST_NAME VARCHAR,
                       ZNAME VARCHAR );

Issues with sqlite3

Opening a Core Data database with sqlite3 may cause Core Data to reject the database as being corrupt on subsequent runs of the app, so be sure to open the .tmp version of the database that we now no longer need, or take a copy of the database file and inspect that (though take care to copy the WAL log file if it is present).
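
If you take the copying route from your development machine, a small plain-Ruby helper can make sure the companion files travel with the database. This is a sketch that assumes SQLite's usual naming, where a store in WAL mode keeps "-wal" and "-shm" files alongside the main database file; copy_store_for_inspection is a hypothetical name.

```ruby
require "fileutils"

# Copy a SQLite store, plus any WAL companion files that exist, into
# destination_dir. Returns the paths of the copies.
def copy_store_for_inspection(store_path, destination_dir)
  FileUtils.mkdir_p(destination_dir)

  candidates = ["", "-wal", "-shm"].map { |suffix| store_path + suffix }
  existing = candidates.select { |path| File.exist?(path) }

  existing.map do |path|
    target = File.join(destination_dir, File.basename(path))
    FileUtils.cp(path, target)
    target
  end
end
```

You would then point sqlite3 at the copy and leave the original store alone.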

Behind the Scenes

If something goes wrong or you would like (much) more detail then run the app with the following arguments:

rake args="-com.apple.CoreData.SQLDebug 3 -com.apple.CoreData.MigrationDebug 3"

Replacing the original store

We start by removing the original store file. If the migration has completed successfully then it should be safe to do so, though we might want to rename it instead (e.g. add .bak). If we don’t remove the store file before the next step then Core Data will attempt to append the data to the store, and will refuse to do so as the store is not at the correct schema version (which is exactly why we have been migrating it).

With the original store (re)moved, we can use the migratePersistentStore method of the persistent store coordinator:

store = coordinator.addPersistentStoreWithType(NSSQLiteStoreType,
                                               configuration:nil,
                                               URL:destination_store_url,
                                               options:nil,
                                               error:error_ptr)

file_manager.removeItemAtPath(store_url.path, error:error_ptr)

coordinator.migratePersistentStore(store,
                                   toURL:store_url,
                                   options:nil,
                                   withType:NSSQLiteStoreType,
                                   error:error_ptr)

SQLite and WAL Journaling

Since iOS 7 and OS X 10.9 the default journal mode for SQLite-based Core Data stores is WAL (Write-Ahead Logging). This causes a WAL log file to be created alongside the database file, which is managed by Core Data for us.

Before iOS 7, to replace the store with the newly-migrated store you would simply replace the store file with the new SQLite database file that was created in the migration process.

If we do this on iOS 7 and above, however, we will end up with a freshly-created database sitting alongside a WAL log for the previous store, which is a near-guaranteed recipe for inconsistency in the form of the database being unreadable.

There is more than one way to solve this problem, but the one I would recommend would be to stick with the Core Data API and let it do the work for you.

The Data Migration

Now that we have our two new fields our task is to take the names stored in the name attribute of each Person record and split them out into their first and last names respectively.

For our test data above we have two records, “John Doe” and “Jane”. For the former we want to split this into the two names and populate each field. As Jane has no last name specified we just want her record to have a first_name of “Jane”.

We start by creating a new schema version which is simply a copy of the previous version. It will have a new name (thus distinguishing it from other migrations) but as the schema will be identical it will essentially be a no-op in terms of the schema, which is what we want. We will then specify our data transformation separately.

Let’s start by copying the current schema to a new file to create a new (identical) version:

$ cp schemas/0002_add_name_fields.rb schemas/0003_migrate_name_data.rb

And give it a new name which reflects what it will do:

schema "0003 migrate name data" do
  entity "Person" do
    string :name, optional: false

    string :first_name, optional: true
    string :last_name, optional: true

    string :address, optional: true
  end
end

Now, in our migration routine, add it in to the data_migrations hash:
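
Going by the version name and handler method named in the next paragraph, the entry would presumably look like the following. Treat it as a sketch: it replaces the empty hash that we defined earlier in perform_migrations.

```ruby
# Version identifier => name of the handler method that builds the mapping
# model for that migration.
data_migrations = {
  "0003 migrate name data" => :migration_0003_migrate_name_data
}
```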

When the migration routine encounters a migration with the name "0003 migrate name data" it will get the mapping model for the migration by calling self.migration_0003_migrate_name_data.

Let’s now define this custom migration method, which should be a private instance method of AppDelegate:

def migration_0003_migrate_name_data
  # We will create our own mapping model from scratch
  mapping_model = NSMappingModel.alloc.init

  # Create an entity mapping which will map from the Person
  # entity in the source datastore to the Person entity
  # in the destination store, i.e. this is an operation
  # to transform the Person entity.
  entity_mapping = NSEntityMapping.alloc.init
  entity_mapping.setName("PersonToPerson")
  entity_mapping.setSourceEntityName("Person")
  entity_mapping.setDestinationEntityName("Person")

  # The source expression should evaluate to a fetch request
  # which returns all of the source records that should be
  # migrated. In our case we wish to migrate all of them,
  # but we could modify this to return only certain Person
  # records based on whatever criteria we like.
  expr_str = "FETCH(FUNCTION($manager, " +
             "\"fetchRequestForSourceEntityNamed:predicateString:\" , " +
             "\"Person\", \"TRUEPREDICATE\"), $manager.sourceContext, NO)"
  source_expression = NSExpression.expressionWithFormat(expr_str)
  entity_mapping.setSourceExpression(source_expression)

  # We do not perform the actual data transformation here in
  # this migration method, instead we delegate this to a
  # dedicated class which we will provide. Core Data will
  # instantiate the class and delegate the task of
  # creating all Person entities in the destination store
  # to it.
  entity_mapping.setEntityMigrationPolicyClassName("Migration_0003")

  # The custom entity mapping type is for a mapping which
  # delegates the mapping procedure to a custom class, as
  # specified above.
  entity_mapping.setMappingType(NSCustomEntityMappingType)

  mapping_model.setEntityMappings([ entity_mapping ])

  mapping_model
end

The migration handler method sets up an entity mapping that defines which records will be migrated (all Person records, as determined by the source expression), and points to the custom migration policy class which will handle the data mapping, which is a class named Migration_0003.

This class does not yet exist, so create a new directory app/migrations and create a new file app/migrations/0003_migrate_name_data.rb:

class Migration_0003 < NSEntityMigrationPolicy
  def beginEntityMapping(mapping, manager:manager, error:error)
    puts "[INFO] Initialise custom migration"

    true
  end

  def createDestinationInstancesForSourceInstance(source,
                                                  entityMapping:mapping,
                                                  manager:manager,
                                                  error:error)
    puts "[INFO] Migrating entity: #{source}"

    source_keys = source.entity.attributesByName.allKeys.mutableCopy
    source_values = source.dictionaryWithValuesForKeys(source_keys)
    destination = NSEntityDescription\
      .insertNewObjectForEntityForName(mapping.destinationEntityName,
                                       inManagedObjectContext:manager\
                                                          .destinationContext)

    destination_keys = destination.entity.attributesByName.allKeys

    # Populate destination instance with data from source instance
    destination_keys.each do |key|
      value = source_values.valueForKey(key)
      # Avoid NULL values
      if (value and !value.isEqual(NSNull.null))
        destination.setValue(value, forKey:key)
      end
    end

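    # Perform the migration, which is to populate the first_name and
    # last_name fields from the contents of the name field. Guard
    # against a missing name so that split is never called on nil.
    name = source.valueForKey("name")
    name_parts = name ? name.split(" ") : []
    if name_parts.count > 1
      destination.setValue(name_parts[0], forKey:"first_name")
      destination.setValue(name_parts.slice(1, name_parts.count).join(" "),
                           forKey:"last_name")
    elsif name
      destination.setValue(name, forKey:"first_name")
    end

    # Record the source/destination pairing with the migration manager,
    # which is expected when this method is overridden without calling super.
    manager.associateSourceInstance(source,
                                    withDestinationInstance:destination,
                                    forEntityMapping:mapping)

    true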
  end
end

The class Migration_0003 is subclassed from NSEntityMigrationPolicy and implements two methods, one which will be called once when the migration process is initialised and the other which will be called for each record returned by the fetch expression that we set up earlier (the source expression for the entity mapping).

In this particular case we don’t need to initialise anything but if we did, the beginEntityMapping method would be the place to do it, rather than repeat the initialisation for every record.

The createDestinationInstancesForSourceInstance method will be called for every Person record (in this case). The task of the method is to create a Person record in the destination store for each Person record in the source store, with its attributes modified appropriately.

In this case we begin by creating a new Person instance and then copy all attribute values over from the source. We then split the name field and populate both fields if more than one name is present, or only the first_name field otherwise [2].
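The splitting rule itself can be tried out in plain Ruby, away from Core Data. The following is a minimal sketch that mirrors the logic in the policy class; the method name split_name is hypothetical and the nil handling is an assumption added for illustration:

```ruby
# Hypothetical helper mirroring the splitting rule used in Migration_0003.
# Returns [first_name, last_name]; last_name is nil when only one name
# is present, and both are nil when there is no name at all.
def split_name(name)
  return [nil, nil] if name.nil?

  parts = name.split(" ")
  if parts.count > 1
    # The first word becomes the first name, the rest the last name
    [parts[0], parts.drop(1).join(" ")]
  else
    [name, nil]
  end
end

first, last = split_name("John Doe")
puts "first: '#{first}' last: '#{last}'"
```

Extracting the rule like this makes it easy to check edge cases, such as names with more than two words, before running the migration against real data.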

With our entity mapping being provided by our custom migration handler, and our custom migration policy class in place, we are now ready to run our app and perform our first data migration:

$ rake
[INFO] Latest model version is 0003 migrate name data
[INFO] Database file path: ".../SimpleHeavyweightMigration.sqlite"
[INFO] 3 managed object model(s) found
[INFO] Current datastore version is managed object model 2 of 3
[INFO] Examining model version '0003 migrate name data'
[INFO] Migrating forwards to model version '0003 migrate name data'
       (Custom)
[INFO] Initialise custom migration
[INFO] Migrating entity: #<NSManagedObject_Person_:0x8f99f00>
[INFO] Migrating entity: #<NSManagedObject_Person_:0x8f9a180>

If we now re-enable CDQ by uncommenting cdq.init and re-run our app we can confirm that the data migration completed successfully. The migration process will recognise that the data store is already at the latest version and take no action; CDQ will then initialise our Core Data stack as before:

$ rake
[INFO] Latest model version is 0003 migrate name data
[INFO] Database file path: ".../Documents/SimpleHeavyweightMigration.sqlite"
[INFO] Store schema version matches latest schema version
[INFO] 3 managed object model(s) found
[INFO] Current datastore version is managed object model 3 of 3

(main)> Person.all.count
=> 2
(main)> Person.all.each do |p|\
  puts "'#{p.name}' -> first: '#{p.first_name}' last: '#{p.last_name}'"\
end
'Jane' -> first: 'Jane' last: ''
'John Doe' -> first: 'John' last: 'Doe'

The Final Schema Migration

The last step is to remove the name field. This step is actually optional, and there are arguments for not removing the field: if you later discover an issue with the data migration, the source data (in this case the name field) is still present, which would allow you to re-attempt the migration at a later date.

We do, however, at the very least need to make the name field optional; otherwise new records will fail validation because they will no longer have a name value set.

In either case, the migration is set up by again copying the latest version, making changes to the schema, and letting the automatic mapping model inference process figure out what SQL to execute in order to update our data store:

$ cp schemas/0003_migrate_name_data.rb schemas/0004_remove_name_field.rb

We then modify the schema to remove the name field and make first_name a required field, with last_name remaining optional:

schema "0004 remove name field" do
  entity "Person" do
    string :first_name, optional: false
    string :last_name, optional: true

    string :address, optional: true
  end
end

Running the app once more, we should see it automatically migrate forward to this latest version. We then perform a quick test of the new validation rules:

[INFO] Latest model version is 0004 remove name field
[INFO] Database file path: ".../Documents/SimpleHeavyweightMigration.sqlite"
[INFO] 4 managed object model(s) found
[INFO] Current datastore version is managed object model 3 of 4
[INFO] Examining model version '0004 remove name field'
[INFO] Migrating forwards to model version '0004 remove name field'
       (Automatic/Inferred)

(main)> Person.create(first_name: "Dave")
=> <Person: 0xa1aefa0> (entity: Person; id: ... ; data: {
    address = nil;
    "first_name" = Dave;
    "last_name" = nil;
})
(main)> cdq.save
=> true
(main)> Person.create(last_name: "Thomas")
=> <Person: 0x976dfc0> (entity: Person; id: ... ; data: {
    address = nil;
    "first_name" = nil;
    "last_name" = Thomas;
})
(main)> cdq.save
Error while fetching: The operation couldn't be completed. (Cocoa error 1570.)
# ... Further error output

The data store has been migrated and then a fresh Core Data stack initialised for us by CDQ. We are able to create a new record with just a first name, but validation fails if we try to create a record with only a last name specified, which is what we would expect: Cocoa error 1570 is NSValidationMissingMandatoryPropertyError, reported here because the required first_name attribute is missing.
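For the alternative mentioned at the start of this section, where the name field is kept as a safety net but made optional, the schema version might instead look like the sketch below. The version name "0004 make name optional" is hypothetical, and the attribute list simply extends the schema shown above with the retained name field:

```ruby
# Hypothetical alternative to "0004 remove name field": keep the
# original name attribute, but make it optional so that new records
# which only set first_name/last_name still pass validation.
schema "0004 make name optional" do
  entity "Person" do
    string :name, optional: true
    string :first_name, optional: false
    string :last_name, optional: true

    string :address, optional: true
  end
end
```

As this is still only a schema change, it should be handled by the same automatic/inferred migration path as the removal.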

Wrapping Up

Using the techniques in this post we now have a mechanism built into our app that allows us to take advantage of automatic migrations for most schema changes while also making it easy to mix in data migrations. The process has been further simplified by making the migration process iterative, removing the need to test the migration process from each model version to each future version. Finally, we employ ruby-xcdm as in previous posts so that we can use a convenient schema notation to express our schema versions.
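The iterative idea can be illustrated in plain Ruby without touching Core Data. The sketch below is not the migration code from this post: the method name migration_plan and the first two version names are hypothetical placeholders. It shows only how a lookup table of data migrations can decide, one version at a time, whether a step is custom or inferred:

```ruby
# Hypothetical sketch: given the ordered schema versions, the version
# the store is currently at, and a lookup table of data migrations,
# list the steps needed to reach the latest version.
def migration_plan(versions, current_version, data_migrations)
  current_index = versions.index(current_version)
  raise ArgumentError, "unknown version: #{current_version}" if current_index.nil?

  # Each later version is one step; a step is custom only when a
  # handler is registered for that version, otherwise it is inferred.
  versions.drop(current_index + 1).map do |version|
    handler = data_migrations[version]
    { version: version, type: handler ? :custom : :inferred, handler: handler }
  end
end

# The first two version names are placeholders for illustration only.
versions = [
  "0001 placeholder",
  "0002 placeholder",
  "0003 migrate name data",
  "0004 remove name field"
]
data_migrations = {
  "0003 migrate name data" => :migration_0003_migrate_name_data
}

migration_plan(versions, "0002 placeholder", data_migrations).each do |step|
  puts "#{step[:version]} (#{step[:type]})"
end
```

Because each step only ever moves the store from one version to the next, a store at any older version reaches the latest by replaying the same steps in order.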

Regarding data transformations, another approach which does not require a separate migration policy class is to use property mappings where the transformation is encoded as NSExpression-compatible expression strings. This approach is perhaps advantageous when configuring the migration using a GUI, as would normally be done in Xcode, but as we are not constrained by a GUI we have opted to encode the transformations directly in code.

One aspect of data migrations that we have not explored yet is how to alter the relationships between entities as part of a data migration. This will be covered in a later post.

Where to next?

This post is part of a Leanpub book that I’m writing on using Core Data in RubyMotion projects. If you would like to learn more about using Core Data effectively in your RubyMotion apps please take a look.

Also coming up will be more blog posts where we will delve into some of the thornier issues that you might encounter when using Core Data. If such posts would be of interest to you then you might want to follow me on Twitter.


  1. A hash is naturally unordered, but we are not using it to preserve the order of the migrations; it is used here only as a lookup table for the migration handler method.

    data_migrations = { "0003 migrate name data" => :migration_0003_migrate_name_data }

  2. This is naturally an over-simplistic approach to the problem of name splitting, but it is sufficient for the purpose of demonstrating the concept of the data migration.