Migrating from TFS to VSTS, Part 4 – Work Items

In the last post we migrated the areas, iterations and queries. With that out of the way we can now migrate the work items. This is by far the most difficult and time-consuming part of the process. For our migration we had some pretty stringent requirements which made the migration harder. Things to consider include the items to migrate, the links to include, the changes in the IDs, the user identities, attachments and images, and whether history needs to be retained.

Part 1 – TfsMigration Tool
Part 2 – Migrating from TFVC to Git
Part 3 – Migrating Areas and Iterations
Part 4 – Migrating Work Items
Part 5 – Migrating Build Components

As mentioned last time, the same processor (WorkItemTrackingProcessor) is used for migrating both areas/iterations and work items. The reason for this is really just to share data. In order to migrate a work item the corresponding area/iteration has to already be migrated. Since the migration tool allows for migrating arbitrary areas/iterations it already has the list of areas/iterations migrated. This information is needed for migrating work items later. The processor will migrate the areas/iterations first and then the work items.

All the settings to control work item migration are under the WorkItemTracking element in the settings file. The individual settings will be discussed as they are needed.

Migration Process

The migration process for work items follows the same standard algorithm as the other processors. This is handled by the MigrateWorkItemsAsync method.

  • Get the work items to migrate from TFS and add them to a queue
  • While the queue is not empty
      • Get the next item and remove it from the queue
      • If the item has already been migrated then skip it
      • Migrate the work item
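
The loop above can be sketched in a few lines. This is only an illustration in Python, not the tool's actual code (the tool is .NET); `migrate_work_items` and the `migrate_one` callback are hypothetical names. It assumes that migrating one item may discover related items that get appended to the same queue.

```python
from collections import deque


def migrate_work_items(source_ids, migrate_one):
    """Drain a queue of source IDs, skipping anything already migrated.

    migrate_one(source_id) is a hypothetical callback that returns
    (target_id, extra_ids), where extra_ids are related items discovered
    during the migration that should be queued as well.
    """
    queue = deque(source_ids)
    migrated = {}  # source ID -> target ID
    while queue:
        source_id = queue.popleft()
        if source_id in migrated:
            continue  # already handled, possibly via an earlier relation
        target_id, extra_ids = migrate_one(source_id)
        migrated[source_id] = target_id
        queue.extend(extra_ids)
    return migrated
```

Keeping the source-to-target map next to the queue is what makes the "already migrated" check cheap, and the same map is what later steps need to translate references.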

Identifying What to Migrate

Before anything can be migrated you have to identify which items should be migrated. There are several different ways to approach this and each migration might have different needs. For a newer TFS collection that does not have many items it might be useful to move everything. But for larger collections with thousands of items, migrating them all would be wasteful. Some things to consider:

  • Is the work item still “active”?
  • When was the work item last updated?
  • Does the work item have any historical significance (e.g. a feature supported through grant money)?
  • Does the work item have related items that are still important (e.g. features related to a larger, active epic)?
  • What related links should be migrated?

The migration process starts by running the queries defined under the queries element in the settings. This allows teams to migrate specific work items without any conditions. It is assumed that one of the queries will include all the active items in the existing system, but the queries can include (or exclude) anything. All the work items returned by these queries are guaranteed to be migrated regardless of any other rules in place.

"queries": [
    {
      "name": "Shared Queries/Items to Migrate"
    }
]

Note: For simplicity the queries must be shared queries.

The GetWorkItemsAsync method enumerates the queries defined in the settings and runs each one using the QueryWorkItemsAsync method. Each work item ID returned is added to the migration queue for later.

QueryWorkItemsAsync retrieves the query by name using the GetQueryAsync extension method off the WorkItemTrackingHttpClient. If the query does not exist then it raises an error. Otherwise the query is run using the QueryAsync extension method off the client. The extension method runs the query by its ID. The query returns a WorkItemQueryResult containing a reference to each work item. Since only a reference is returned the work item will have to be retrieved explicitly later.

One of the challenges with running queries is that a query may return either a flat list or a hierarchy, and the two return different data to the client. The method attempts to flatten the result. For a flat list the WorkItems property contains the work items so they are added to the list of returned items. For a hierarchy the WorkItemRelations property is used instead, and the target of each relation is added to the list of work items.
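
As a rough sketch of the flattening, assuming the query result has been reduced to a plain dictionary with `workItems` and `workItemRelations` keys (the function name is hypothetical, and dropping duplicates is my own assumption rather than something the tool is known to do):

```python
def flatten_query_result(result):
    """Return the distinct work item IDs referenced by a query result.

    `result` is a plain dict standing in for the query result: flat
    queries fill "workItems", hierarchical queries fill "workItemRelations".
    """
    ids = []
    for ref in result.get("workItems") or []:
        ids.append(ref["id"])
    for relation in result.get("workItemRelations") or []:
        target = relation.get("target")
        if target is not None:
            ids.append(target["id"])
    # Assumption: keep the first occurrence of each ID, preserving order
    return list(dict.fromkeys(ids))
```

In a hierarchy the same item can show up as the target of more than one relation, which is why the sketch removes duplicates before returning.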

Migrating a Work Item

Now that the root work items are identified they can be migrated. For a single work item the WorkItemTrackingProcessor.MigrateWorkItemAsync method does the following.

Set up a migration record to identify the work item (both the source ID and, eventually, its target ID)
Get the work item details from TFS
Migrate the work item history
Migrate the work item relationships
Optionally add a comment about the migration to the work item

An important aspect of migrating work items is that the work item ID cannot be set, so the ID will most likely differ between the source and target systems. The processor tracks both values so it can map source references to target references as needed. Additionally the mapping is, optionally, entered as a breadcrumb value to allow a user to get back to the original ID later, if needed.

The GetWorkItemAsync extension method retrieves the work item from TFS. This extension method calls the GetWorkItemsAsync method on the client to retrieve the work item. It requests information about any child links (but not the children) as well. The work item can now be migrated.

Migrating Fields

Before discussing how the item history is generated it is important to understand work item fields. A work item is really nothing more than a collection of key-value pairs known as fields. Many of the fields are visible when editing a work item but many more are under the hood. Some fields are required for TFS to work properly (most start with System. or Microsoft.). Other fields are either process customizations by the team or added by extensions. Identifying which fields to migrate (or ignore) is important. Even after removing an extension the fields it defines will still remain so the existence of a field does not indicate it is being used anymore. Each field has a display name (seen in the UI), reference name (unique to the system), data type and value (if any).

Field Handlers

Many fields can be copied from TFS to VSTS unchanged. Some fields will require their value be changed (e.g. users) while others may have to be calculated. The migration tool defines IFieldHandler to determine how a field is migrated. A field handler is responsible for looking at a field and determining whether it should be migrated, to what field and what the value should be. A single field can have multiple field handlers.

The tool ships with a few handlers defined but more can be created if needed.

AreaFieldHandler and IterationFieldHandler are provided to migrate area/iteration fields. This will be discussed later.
IgnoreFieldHandler can be used to ignore a field. This is useful when dealing with extension-defined fields that are no longer needed.
RenameFieldHandler is used to rename a field. This is especially important for process customizations. In TFS custom fields follow a different naming convention than is allowed in VSTS. In VSTS the field name is prefixed with the process it is associated with. So renaming a field is useful in this case.
UserFieldHandler is used to handle identity-based fields. This will be discussed more later.
ValueFieldHandler is used to change the value of a field based upon an expression. Under the hood it uses System.Linq.Dynamic to support evaluation of LINQ expressions. The expression is evaluated based upon the current field’s value and whatever the result is will be the new value of the field. This is useful for doing simple things like swapping boolean values.
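
To make the handler idea concrete, here is a minimal Python sketch of a handler chain. None of these names come from the tool; `FieldState`, `rename_to`, `ignore_field`, `map_value` and `apply_handlers` are hypothetical stand-ins for the rename, ignore and value handlers described above.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FieldState:
    name: str
    value: Any
    ignore: bool = False


def rename_to(target_name):
    """Handler that moves the value to a different target field."""
    def handle(field):
        return FieldState(target_name, field.value, field.ignore)
    return handle


def ignore_field(field):
    """Handler that marks the field so it is not migrated."""
    return FieldState(field.name, field.value, True)


def map_value(func):
    """Handler that computes the new value from the current one."""
    def handle(field):
        return FieldState(field.name, func(field.value), field.ignore)
    return handle


def apply_handlers(field, handlers):
    """Run each handler in order; return None once the field is ignored."""
    for handler in handlers:
        field = handler(field)
        if field.ignore:
            return None
    return field
```

For example, `apply_handlers(FieldState("Custom.IsBlocked", True), [rename_to("MyProcess.IsBlocked"), map_value(lambda v: not v)])` renames the field and flips its boolean value in one pass.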

Field Settings

In the settings file the includeAllFields element determines if all the fields are migrated (the default) or only those listed in the settings file. There are generally way more fields that should migrate than should not so the default is reasonable but a team may decide to migrate only certain fields instead.

The fields element identifies the fields to treat specially (either include, alter or exclude).

"fields": [
    {
        "name": "Reference name",
        "targetName": "Optional. If specified then the RenameFieldHandler is added to rename the field.",
        "handler": "Optional. Specifies a custom handler to use for handling the field migration - needed for area and iteration handling",
        "ignore": "Optional. If set to true then the IgnoreFieldHandler is added to the field.",
        "isUser": "Optional. If set to true then the UserFieldHandler is added to the field.",
        "value": "Optional. If specified then it is an expression that will be passed to the ValueFieldHandler to determine the field's value."
    }
]

Rebuilding History

There are several different approaches that could be used to migrate a work item. One option is to get the latest field values of the work item and create a new work item with that information. While fast this would lose the historical changes to the work item including the original creation date and user, the changes that had happened to the item as it has moved around and any changes that are no longer applicable. Therefore the migration tool attempts to recreate the history just like it occurred in TFS. This is a slow process and may cause some work items to not migrate properly.

Getting Source History

The MigrateWorkItemHistoryAsync method is responsible for rebuilding the history. It follows this algorithm.

  • Get the work item history
  • For each update
      • Create a patch document representing the changes to the work item
      • If the work item hasn’t been created yet then create it, otherwise update the existing one

The GetWorkItemHistoryAsync extension method is used to get the history of the work item. In VSTS each history entry is called an update (not to be confused with a revision). The GetUpdatesAsync client method returns a list of WorkItemUpdate objects, each representing one set of changes to the work item, and mostly matches what shows up in the history in the UI. Because this list can get really long the method has to page the updates from TFS.
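
The paging can be sketched as a simple top/skip loop. This is an illustration only: `get_all_updates` and the `fetch_page(top, skip)` callback are hypothetical, and the page size of 200 is an arbitrary choice, not a documented limit.

```python
def get_all_updates(fetch_page, page_size=200):
    """Collect every update by requesting pages until one comes back short.

    fetch_page(top, skip) is a hypothetical callable that returns a list
    of at most `top` updates, starting after the first `skip` updates.
    """
    updates = []
    skip = 0
    while True:
        page = fetch_page(page_size, skip)
        updates.extend(page)
        if len(page) < page_size:
            return updates  # a short (or empty) page means we reached the end
        skip += len(page)
```

The updates need to be replayed oldest first, so the sketch keeps them in the order the pages arrive rather than sorting or reversing them.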

Creating the Target History

The CreatePatchDocumentAsync method does the heavy lifting of converting the update to a change in the work item. In VSTS an HTTP PATCH document is used to update only those items that need to be changed. This is an efficient way to update a large object but if you include/exclude things that you shouldn’t then the patch will fail.

  • Ensure the ChangedBy and ChangedDate fields are set
  • For each updated field
      • Try to get the field handlers to use for the field
      • If handlers exist or the includeAllFields setting is true
          • Enumerate each handler and let it update the field
          • If at any point a handler marks the field as ignore then skip the field
          • Get the target fields that are available in VSTS (cached)
          • If the current field is not in VSTS then skip it
          • Add or update the field in the patch document based upon the existence of an old value

VSTS requires that the ChangedBy and ChangedDate fields always be set so the method starts out by setting them to the RevisedBy and RevisedDate values from the update. If these fields are changed later by the update itself then that is fine.

The GetTargetFieldsAsync method retrieves the available fields from VSTS by calling the GetFieldsAsync client method. If you attempt to add a field to a work item that VSTS does not know about it will fail so this method tries to prevent that. The fields defined in VSTS are static to the process being used and hence won’t change for a single VSTS team. Once the target fields are retrieved they are cached so it does not have to occur again.

To add or update the field in the patch document the AddField and UpdateField extension methods are used. The only difference between these two methods is the Operation that is specified. However the patch document does require that fields be prefixed so the extension method builds the field path from the field name as well.
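
A patch document is just a list of operations, each with an operation name, a path and a value. The sketch below shows the shape of that list in Python, under a few assumptions: the update has been reduced to a plain dictionary, "add" is used when there is no old value and "replace" otherwise, and the field path is the field's reference name prefixed with `/fields/`. The function names are hypothetical and the handler step is left out to keep it short.

```python
def field_operation(name, new_value, had_old_value):
    """Build one patch operation for a work item field."""
    return {
        "op": "replace" if had_old_value else "add",
        "path": "/fields/" + name,
        "value": new_value,
    }


def build_patch(update, target_fields):
    """Turn one update into a list of patch operations.

    update: {"revisedBy": ..., "revisedDate": ...,
             "fields": {name: {"oldValue": ..., "newValue": ...}}}
    target_fields: set of field reference names known to the target.
    """
    # Start with the required fields; the update itself may override them.
    ops = {
        "System.ChangedBy": field_operation(
            "System.ChangedBy", update["revisedBy"], False),
        "System.ChangedDate": field_operation(
            "System.ChangedDate", update["revisedDate"], False),
    }
    for name, change in update["fields"].items():
        if name not in target_fields:
            continue  # unknown fields would make the whole patch fail
        had_old = change.get("oldValue") is not None
        ops[name] = field_operation(name, change.get("newValue"), had_old)
    return list(ops.values())
```

Keying the operations by field name means a later change to ChangedBy or ChangedDate replaces the default instead of producing two operations on the same path.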

Creating and Updating the Work Item

Once the patch document is created the work item can be added or updated. The CreateWorkItemUnrestrictedAsync extension method will create the new work item. Before calling the CreateWorkItemAsync client method the document is enumerated and any fields with a null value are removed. While VSTS will return fields with no value when retrieving history, it does not allow empty field values when creating. So they are removed from the data.

Some of the fields being set cannot normally be modified, either because of process rules that are defined or because they are system-defined. But in this case we are modifying them anyway. To allow for this VSTS has an option to bypass the rules, which we set. While not all rules can be bypassed (e.g. setting the ID) many can.

There are a couple of interesting side effects of this. The first side effect is that when you look at the final work item it will have the same history (dates and users) as the original source. So even though you weren’t using VSTS last year it can still show create/update dates that were. Hence the history of the item is temporally correct as well. For the user, the migration is running under a special account but the history will say the changes were made by the original user. (In the current UI VSTS does show a special message with both users’ names.) Beyond that the history should be identical between the source and target systems.

For updates the UpdateWorkItemUnrestrictedAsync method is called. This method behaves similarly to the create but it has to modify the fields differently. Any field that has an empty value is considered a remove so the method enumerates the fields and changes any impacted field to a remove operation.
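
The difference between the two cleanup passes is easy to see side by side. In this sketch the patch document is a list of dictionaries and both function names are hypothetical. Treating an empty string the same as a missing value on update is my assumption about what "empty" means here.

```python
def prepare_for_create(ops):
    """Drop operations with no value; empty values are rejected on create."""
    return [op for op in ops if op.get("value") is not None]


def prepare_for_update(ops):
    """Turn operations with an empty value into removals of that field."""
    prepared = []
    for op in ops:
        # Assumption: None and "" both count as an empty value
        if op.get("value") in (None, ""):
            prepared.append({"op": "remove", "path": op["path"]})
        else:
            prepared.append(op)
    return prepared
```

On create there is nothing to clear, so the empty fields can simply be left out. On update the field may already hold a value in the target, so it has to be removed explicitly.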

Fixing Up Areas/Iterations

As mentioned earlier the AreaFieldHandler and IterationFieldHandler handlers handle areas and iterations. Unlike the other field handlers these handlers potentially interrupt the migration process to migrate a missing area/iteration. The handler looks to see if the source area/iteration has already been migrated (handling the possibility of renaming). If the area/iteration has not been migrated then it is migrated immediately and added to the migration list so it won’t be migrated again. The handler then returns the (potentially renamed) target area/iteration.

Note: For testing the custom handler syntax these handlers were not given specialized element names. It would not be hard to add isArea/isIteration and remove the type names if desired.

Fixing Up Users

All fields that are identity references need to use the UserFieldHandler. While VSTS will accept any identity reference value it will only properly match to the VSTS user if it is formatted correctly. This can be confusing because in the source system you may see John Doe but behind the scenes they would be represented by their user name (e.g. company\jdoe). If you migrate a field with this value directly across then you would still see John Doe but it isn’t the John Doe in VSTS. Searching for the user, attempting to save the work item, etc. would fail. In VSTS users need to be mapped to their email address (e.g. jdoe@company.com) and need to have a display name as well.

Under the work item settings is the users section where the mapping is defined.

"users": [
    {
        "source": "company\\jdoe",
        "target": "John Doe (jdoe@company.com)"
    }
]

The handler looks up the user by their source name and changes the field’s value to match the target. This will allow the identity to be resolved. If the mapping fails then the source value is used instead.
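
The lookup itself is a dictionary with a fallback. In this Python sketch the function names are hypothetical and the case-insensitive match is my assumption (Windows account names are not case-sensitive), not something the tool is known to do.

```python
def build_user_map(user_settings):
    """Index the users section of the settings by lower-cased source name."""
    return {entry["source"].lower(): entry["target"] for entry in user_settings}


def map_user(value, user_map):
    """Return the mapped target identity, or the source value if unmapped."""
    if value is None:
        return None
    return user_map.get(value.lower(), value)
```

Falling back to the source value keeps the migration moving, but it also means an unmapped user fails silently, so it is worth logging any value that passes through unchanged.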

In hindsight identity mapping is useful elsewhere and probably should have been its own processor. But since this was the only place it was needed it was added as part of work item migration. Additionally the process was completely influenced by how our TFS and VSTS systems were set up. This may not work at all for other companies. Yet another issue is that for a large team this can take a while to configure. Given more time it would be better to define a separate processor that either mapped each source user to the target user by looking up values in both systems or used heuristics to map users.

Attachments and Images

Attachments and images introduce an issue when it comes to migration. So much so that the migration tool does not attempt to migrate either. Images will likely continue to work until the source TFS system is brought offline. Attachments are simply dropped. If your team uses a lot of either of these then you will need to do some extra work.

Images are stored in the work item as simply an anchor element. The URL points to an image handler URL defined in TFS with a unique identifier. The processor would need to do the following.

For each field that could have an image (e.g. Description) search for all anchors
For each anchor determine if it is using the TFS image handler
Ensure the work item is created
Download the image from TFS
Attach the image to the work item as an attachment and save to get the unique ID
Update the URL to point to VSTS’s image handler with the given ID
Save the work item
Remove the attachment

While this may work when updating work items it would not work for new work items that have images because the work item has to exist before an attachment can be added to it. The existing APIs do not support doing this. The UI doesn’t use the API for this so it is able to work around the limitation. Perhaps this will be fixed later or perhaps you don’t care too much about it. If you really want the images then you could define a custom field handler to handle the logic.

Attachments have a similar issue when it comes to creating the work item first. Again, if you are fine with this then the attachments could be added after the work item is migrated with a little extra effort.

Migrating Related Items

Links in VSTS are called relations. Most relations are bidirectional as defined here. This makes it easier to migrate links because the order of the migration no longer matters. Given either side of the link you can still relate the items in any order.

Migrating a work item that has relations requires that the links potentially be migrated as well. For our purposes we identified a couple of common scenarios.

  • Open items should have all open children migrated
  • Closed items may or may not want their children migrated
  • Open items may or may not want their closed children migrated
  • Related items may or may not be migrated

To help control this the following settings are defined.

"includeRelatedLinksOnClosed": false,
"includeChildLinksOnClosed": false,
"includeParentLinksOnClosed": false

Basically all open items should be migrated. If a work item is closed then, with the settings shown, its related, parent and child links are not migrated. If there are any exceptions to this rule then they can always be added to a migration query. An exception does exist in the processing for migrated items. If a work item has a link to another work item and that work item was migrated then the relation is created regardless of other settings.

The MigrateWorkItemRelationsAsync method handles the migration of relations. Unfortunately the links are not rebuilt in the historically correct order. That would potentially migrate work items that are not needed and would greatly complicate the process.

  • For each relation on the work item
      • If the relation is not a child, parent or related then skip it
      • Get the related work item from the list of migrated items so far
      • If this is a child relation (we are migrating the parent)
          • If the child has already been migrated
              • Create the link between the parent and child
          • Else if this item is active or includeChildLinksOnClosed is true
              • Add the child to the list of items to migrate
      • Else if this is a parent relation (we are migrating the child)
          • If the parent has already been migrated
              • Create the link between the parent and child
          • Else if this item is active or includeParentLinksOnClosed is true
              • Add the parent to the list of items to migrate
      • Else if this is a related relation (we are migrating one side or the other)
          • If the related item has already been migrated
              • Create the link between the two items
          • Else if this item is active or includeRelatedLinksOnClosed is true
              • Add the other item to the list of items to migrate

Effectively this method is simply linking the relations that already exist. If the other side is missing but should be included then it is added to the migration list. If it is already there then it will be skipped later.
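
Because the three branches follow the same pattern, the decision can be sketched compactly. This Python version is an illustration only: `plan_relations` is a hypothetical name, relations are reduced to `(kind, other_source_id)` pairs, and the function only decides what to do rather than calling any API.

```python
SETTING_FOR_KIND = {
    "child": "includeChildLinksOnClosed",
    "parent": "includeParentLinksOnClosed",
    "related": "includeRelatedLinksOnClosed",
}


def plan_relations(item_is_active, relations, migrated, settings):
    """Decide, per relation, whether to link now or queue the other item.

    relations: list of (kind, other_source_id) pairs
    migrated:  dict of source ID -> target ID for items migrated so far
    Returns (links, to_queue): links are (kind, target_id) pairs to create,
    to_queue are source IDs to add to the migration list.
    """
    links, to_queue = [], []
    for kind, other_id in relations:
        setting = SETTING_FOR_KIND.get(kind)
        if setting is None:
            continue  # not a child, parent or related link
        if other_id in migrated:
            # The other side exists in the target, so always link it
            links.append((kind, migrated[other_id]))
        elif item_is_active or settings.get(setting, False):
            to_queue.append(other_id)
    return links, to_queue
```

Note that a queued item is not linked right away. The link is created when the queued item is migrated in turn and finds this item in the migrated list, which works because the relations can be created from either side.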

Breadcrumb Back to Original Item

As mentioned earlier, migrated work items will have a different ID from the source. It is generally useful to be able to identify work items that have been migrated and to get back to the original version. The processor tracks both the source and target IDs so the best solution is to create a custom field in your process to store the legacy ID (e.g. LegacyID). The process customization would need to occur before the work items are migrated. Then add a RenameFieldHandler for System.Id. If you try to set the ID on the work item then VSTS will simply ignore it, so specify the target name to match your custom field. You can then update the work item UI to show the legacy ID.

"fields": [
    {
        "source": "System.Id",
        "targetName": "MyProcess.Company_LegacyId"
    }
]

The AddMigrationCommentAsync method is responsible for defining a breadcrumb back to the source item. If the migrationTag setting has a value then a tag is added to each work item. This is useful for quickly identifying migrated items. Additionally the method adds a comment to the history of the work item that indicates that the work item was migrated, along with the old ID and URL.

Since the migration process was winding down at the time this code was written it is not as flexible as it should be. Ideally the message should be configurable. Additionally there is a bug in the URL being generated. The URL to the source is the URL for the API call and not the UI.

Final Thoughts

As you can see, migrating work items is quite a bit of work. But we’ve been happy with the results. We were able to move all the work items we cared about along with any related work items that we might need. The work item history came across mostly unchanged so we could go back and identify why items were being changed, when and by whom. While there are some gaps in the process this has been successful for us.

Download the code on GitHub