Migrating from TFS to VSTS, Part 2
The previous article in this series discussed how to use the migration tool to move from TFS to VSTS. In this article we will dig into the code: why things work the way they do, suggestions for improvements, and areas that can be modified to customize the tool for your specific needs.
Part 1 – TfsMigration Tool
Part 2 – Migrating from TFVC to Git
Part 3 – Migrating Areas and Iterations
Host Process
The host process is responsible for setting up the dependencies, configuring logging, parsing the command line, and running the processors. It starts by initializing the logging infrastructure so that everything is logged. Logging is written both to the console and to a log file specified on the command line; additional logging targets can be created if needed. The tool uses NLog. A custom logger is created to track the total number of errors and warnings generated, for reporting purposes.
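As a rough idea of what that setup looks like, here is a minimal sketch of wiring NLog to both a console and a file target. The class name is illustrative, not one of the tool's actual types, and the log file path stands in for the value taken from the command line.

```csharp
using NLog;
using NLog.Config;
using NLog.Targets;

// Minimal sketch: send log output to both the console and a log file.
// The file path would normally come from the parsed command line.
static class LogSetup
{
    public static Logger Configure(string logFilePath)
    {
        var config = new LoggingConfiguration();

        var console = new ColoredConsoleTarget("console");
        var file = new FileTarget("file") { FileName = logFilePath };

        config.AddRule(LogLevel.Info, LogLevel.Fatal, console);
        config.AddRule(LogLevel.Debug, LogLevel.Fatal, file);

        LogManager.Configuration = config;
        return LogManager.GetCurrentClassLogger();
    }
}
```

The custom error/warning counter described above would be layered on top of this, for example as an additional target that increments counters as messages flow through.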
After the logging is initialized, the command line is parsed. If any errors occur the process aborts. The command line settings are used to override the corresponding settings later, when the settings are loaded from the settings file.
Next the settings are loaded. The settings file is a standard JSON file. The reader simply maps JSON property names to the corresponding properties on a settings type defined by the caller; case does not matter. Each processor stores its settings in its own JSON object, and global settings are stored at the top. After the settings are read they may be overridden by command line arguments.
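The case-insensitive mapping can be approximated with Json.NET, which the following sketch assumes; the section and property names here are examples only, not the tool's actual settings.

```csharp
using System;
using Newtonsoft.Json.Linq;

// Caller-defined settings type; JSON properties are mapped to these
// properties regardless of case. The names are examples only.
class VersionControlSettings
{
    public string SourceCollectionUrl { get; set; }
    public string TargetCollectionUrl { get; set; }
    public bool CleanFolders { get; set; }
}

class Program
{
    static void Main()
    {
        // Global settings at the top, each processor in its own object.
        var json = @"{
            ""outputPath"": ""c:\\temp\\migrate"",
            ""versionControl"": {
                ""sourcecollectionurl"": ""https://tfs.example.com/tfs/DefaultCollection"",
                ""TargetCollectionUrl"": ""https://example.visualstudio.com"",
                ""cleanFolders"": true
            }
        }";

        var root = JObject.Parse(json);

        // Json.NET falls back to case-insensitive matching, so the mixed
        // casing above still lands on the settings properties.
        var settings = root["versionControl"].ToObject<VersionControlSettings>();

        Console.WriteLine(settings.SourceCollectionUrl);
    }
}
```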
Once the settings have been loaded, the host creates the processor that should run and runs it. After the processor finishes, the host prints some final information and terminates. Processors get most of their information from the IProcessorHost object passed to them during initialization. The ConsoleProcessRunner class is responsible for setting up the host object, wiring up cancellation support, and running the processor. Each processor is wrapped with a stopwatch to measure how long each migration takes.
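The run loop amounts to something like the following sketch, with a hypothetical IProcessor interface standing in for the tool's processor and host types.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical interface standing in for the tool's processor abstraction.
interface IProcessor
{
    void Run(CancellationToken cancellationToken);
}

static class ProcessorRunner
{
    // Sketch of the run loop: wire up Ctrl+C cancellation and time the processor.
    public static TimeSpan Run(IProcessor processor)
    {
        using (var cts = new CancellationTokenSource())
        {
            Console.CancelKeyPress += (s, e) => { e.Cancel = true; cts.Cancel(); };

            var watch = Stopwatch.StartNew();
            processor.Run(cts.Token);
            watch.Stop();

            Console.WriteLine($"Finished in {watch.Elapsed}");
            return watch.Elapsed;
        }
    }
}
```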
Core Logic
The shared logic needed by all processors resides in TfsMigrate.Core, along with the various interfaces the processors need. As more processors were added to the tool, a lot of code turned out to be needed across processors, so that code was moved into the core. As a result the core contains some logic for working with work items and other objects, even though the migration tool tries to remain agnostic.
Each processor basically works the same way. It initializes by loading its custom settings from the settings file. Then it creates a migration context to track the migration work; a migration context is just an object that contains shared data such as the source and target servers and any additional information needed for the migration. Finally the processor determines what to migrate and loops through the items, migrating each item of interest.
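In skeleton form that flow looks roughly like this; the base class, method names, and SettingsFile helper are hypothetical, shown only to illustrate the shared shape.

```csharp
using System.Collections.Generic;

// Placeholder types so the sketch compiles; the real tool's types differ.
class MigrationContext { }
class SettingsFile { public T Load<T>(string section) => default(T); }

// Hypothetical skeleton of the flow shared by the processors:
// load settings, build a context, then migrate each item of interest.
abstract class ProcessorBase<TSettings, TItem>
{
    protected TSettings Settings { get; private set; }

    public void Execute(SettingsFile settingsFile)
    {
        // 1. Load this processor's settings from its section of the settings file.
        Settings = settingsFile.Load<TSettings>(SectionName);

        // 2. Build the shared migration context (source server, target server, etc.).
        var context = CreateContext();

        // 3. Decide what to migrate and migrate each item.
        foreach (var item in GetItemsToMigrate(context))
            MigrateItem(context, item);
    }

    protected abstract string SectionName { get; }
    protected abstract MigrationContext CreateContext();
    protected abstract IEnumerable<TItem> GetItemsToMigrate(MigrationContext context);
    protected abstract void MigrateItem(MigrationContext context, TItem item);
}
```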
One common task all processors share is tracking the objects that have been migrated. Base support for this is in the MigratedObject class and the IMigratedObject interface. Generally a processor starts with a list of items to be migrated and enumerates them one by one. At the end of the loop each item has either been migrated, been skipped, or failed. The IMigratedObject interface gives the processor access to this information and allows the processors to handle the migration logic in basically the same way. Each processor will generally create a derived type that contains additional information for a migrated object.
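The tracking shape might look something like the sketch below; the exact members of the real MigratedObject and IMigratedObject types may differ.

```csharp
// Illustrative shape of the migration tracking described above.
enum MigrationStatus
{
    Pending,
    Migrated,
    Skipped,
    Failed
}

interface IMigratedObject
{
    MigrationStatus Status { get; }
    string Error { get; }
}

class MigratedObject : IMigratedObject
{
    public MigrationStatus Status { get; set; } = MigrationStatus.Pending;
    public string Error { get; set; }
}

// A processor would typically derive a type that adds its own data,
// e.g. the source path in TFVC and the URL of the object created on the target.
class MigratedRepository : MigratedObject
{
    public string SourcePath { get; set; }
    public string TargetUrl { get; set; }
}
```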
The TFS REST API is ultimately just a series of types deriving from the base HttpClient type. Other than state, each could easily have been implemented as an extension method. To simplify use of the API, each processor adds some additional extension methods to the client; the core implementation lives here, along with some standardized error handling. The clients tend to throw exceptions even when the case is not exceptional, so some of that logic is in the core to make the clients easier to work with.
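The error-handling helpers amount to wrappers along these lines. This is a simplified sketch that assumes the clients surface expected failures as VssServiceException; the TryGetAsync name and the usage shown in the comment are made up for illustration.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.VisualStudio.Services.Common;

// Sketch of the kind of helper the core provides: wrap a client call and
// treat "not found" style service exceptions as a normal (null) result
// instead of letting them bubble up as errors.
static class ClientExtensions
{
    public static async Task<T> TryGetAsync<T>(Func<Task<T>> getter) where T : class
    {
        try
        {
            return await getter().ConfigureAwait(false);
        }
        catch (VssServiceException)
        {
            // The REST clients throw even for expected cases such as a
            // missing repository; callers can test for null instead.
            return null;
        }
    }
}

// Hypothetical usage:
//   var repo = await ClientExtensions.TryGetAsync(
//       () => gitClient.GetRepositoryAsync(projectName, repoName));
```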
Migrating from TFVC to Git
The original article discussed the migration process using the tip migration approach; refer to it for the details. The TfsMigrate.Processors.VersionControl processor is responsible for this migration. Here's a refresher of the algorithm used.
- Create a Git repo in VSTS for the project
- If the project has branches
  - Find the baseline branch – this will either be the latest release branch or the baseline branch if no releases exist yet
  - Download the baseline branch
  - Clean up the folder (see below)
  - Commit the repository as the baseline version
  - If there was a release branch
    - Create a release branch from the baseline in the repo using the version of the release
    - Switch back to master
- Check out the master branch
- Download the latest version of the code – the development branch if the project supports branches
- Clean up the folder
- Copy the template files
- Set up the metadata file (if any)
- Commit the changes as the latest version
Note: This processor was the first to be written and therefore does not follow a lot of the process flow that later processors do. One of the biggest differences is this processor supports multiple team projects. In hindsight it should have just worked with one.
The processor starts by setting up a connection to the TFVC source server and the Git destination server. It also configures the Git command line. Because the processor needs to store files temporarily, it configures a temp directory for them. The processor then enumerates the projects to be migrated, as defined in the settings.
The processor creates a new Git repo for the target project. If the repo already exists then it is deleted so that the branches are set up properly. If the project supports branches then the processor needs to migrate the baseline branch first.
Migrating the Baseline Branch
The processor starts by looking for the branch to migrate. Because, at my company, we have a branch for each release, it starts by looking for a release branch. Each branch is assumed to be of the form v1.2.3.4, so if it finds multiple branches it converts them to a Version and uses the latest one. This becomes the version of the code used later. If no release branches can be found, it falls back to the baseline branch defined in the settings file. If neither branch exists, it skips the branching logic altogether and moves on to migrating the dev branch as discussed later.
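Selecting the latest release branch boils down to something like the following sketch; the branch names in the example are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class BranchSelector
{
    // Sketch: given the branch names found under the project, pick the
    // latest release branch by treating names like "v1.2.3.4" as versions.
    public static string FindLatestReleaseBranch(IEnumerable<string> branchNames)
    {
        return branchNames
            .Select(name => new { Name = name, Version = ParseVersion(name) })
            .Where(b => b.Version != null)
            .OrderByDescending(b => b.Version)
            .Select(b => b.Name)
            .FirstOrDefault();
    }

    private static Version ParseVersion(string branchName)
    {
        // Strip a leading "v" and let Version.TryParse do the rest.
        var text = branchName.TrimStart('v', 'V');
        return Version.TryParse(text, out var version) ? version : null;
    }
}

// Example: FindLatestReleaseBranch(new[] { "v1.2.0.0", "v1.10.0.0", "dev" })
// returns "v1.10.0.0". If nothing parses, the caller falls back to the
// baseline branch configured in the settings.
```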
The baseline branch is downloaded to a temp folder using the REST API, since we want all the files. Then the folder is cleaned. The cleaning process, configured by the settings, removes the folders and files that won't be needed in Git. These generally include files like .tfignore and .vspscc, but may also include .gitignore and .gitattributes so they can be reset in the new system. Folders to remove will often include packages and .nuget. If a project was using automatic package restore with TFVC, a nuget.config file was necessary to prevent the packages from being checked in; this isn't needed in Git and can be removed.
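A minimal sketch of such a cleanup step is shown below; the file patterns and folder names in the example are only illustrative settings, not the tool's actual configuration.

```csharp
using System.IO;

static class FolderCleaner
{
    // Sketch of the cleanup step: remove the configured files and folders
    // from the downloaded copy before it is committed to Git.
    public static void Clean(string rootPath, string[] filePatterns, string[] folderNames)
    {
        foreach (var pattern in filePatterns)
            foreach (var file in Directory.GetFiles(rootPath, pattern, SearchOption.AllDirectories))
                File.Delete(file);

        foreach (var folderName in folderNames)
            foreach (var dir in Directory.GetDirectories(rootPath, folderName, SearchOption.AllDirectories))
                if (Directory.Exists(dir))
                    Directory.Delete(dir, recursive: true);
    }
}

// Example (illustrative settings):
//   FolderCleaner.Clean(tempPath,
//       new[] { "*.vspscc", ".tfignore", "nuget.config" },
//       new[] { "packages", ".nuget" });
```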
After the folder is cleaned up it is checked to see if it is empty. Git only understands files and it throws an error if you try to commit changes with no files. If there are no files then this project is skipped.
The local files are then committed to the master branch (configurable) of the repo created earlier. This is where things differ from the rest of the tool: rather than using the REST API, we call out to Git on the command line. The REST API can be used to commit to a repo, but it has a limitation. When we commit changes we really only want to commit the adds, updates, and deletes that occurred. The REST API does not track that directly and instead requires that you specify the files for each operation. To get that information we would need to enumerate the files and determine the state of each one compared to the last commit we made, if any. This is exactly what Git already does, so while it takes a little extra code, it is easier to simply use the command line instead.
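Shelling out to Git can be as simple as the sketch below. The Git helper class is a simplified stand-in for the tool's wrapper, and the commit message and remote name in the usage comment are assumptions.

```csharp
using System;
using System.Diagnostics;

static class Git
{
    // Simplified stand-in for the tool's Git wrapper: run a git command in
    // the local repo folder and fail loudly on a non-zero exit code.
    public static void Run(string repoPath, string arguments)
    {
        var info = new ProcessStartInfo("git", arguments)
        {
            WorkingDirectory = repoPath,
            UseShellExecute = false
        };

        using (var process = Process.Start(info))
        {
            process.WaitForExit();
            if (process.ExitCode != 0)
                throw new InvalidOperationException($"git {arguments} failed ({process.ExitCode})");
        }
    }
}

// Committing the baseline: let Git work out the adds, updates and deletes.
//   Git.Run(repoPath, "add -A");
//   Git.Run(repoPath, "commit -m \"Baseline from TFVC\"");
//   Git.Run(repoPath, "push origin master");
```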
After the baseline branch is committed, if the branch was a versioned release, a new branch is created from master with the version from TFVC. This provides a snapshot of the last release that was in TFVC. To do this the processor builds the branch name (configurable) from the version, checks out the local repo to the new branch, and immediately commits the changes to the new branch. Lastly the processor checks out the master branch again so we are back in the state we were in before.
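Using the same hypothetical Git helper sketched above, the branch-and-return sequence might look like this; the release/v{version} format is only an example of the configurable branch name.

```csharp
static class ReleaseBrancher
{
    // Snapshot the release that was just committed to master, then return
    // to master. The "release/v{version}" name is illustrative only.
    public static void CreateReleaseBranch(string repoPath, System.Version version)
    {
        var branchName = $"release/v{version}";

        Git.Run(repoPath, $"checkout -b {branchName}");   // branch from the baseline commit
        Git.Run(repoPath, $"push origin {branchName}");   // publish the snapshot
        Git.Run(repoPath, "checkout master");             // back to where we were
    }
}
```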
Migrating the Dev Branch
Regardless of whether a project had branches, the processor now migrates the dev branch, which is assumed to be the actively modified code. For the most part this follows the same process described earlier for the baseline branch.
The processor starts by wiping the existing repo directory (except the .git folder) so Git can identify removed files. Then the dev branch is downloaded from TFS and the folder is cleaned as before. The new step here is to copy in any files contained in the Template folder. This allows files to be added to every repo; examples include .gitignore and .gitattributes so every repo has them.
One of the files that should be included in the templates is a breadcrumb file, generally readme.md but configurable. The breadcrumb file can contain anything you want, but it should contain some basic information for getting back to the corresponding TFVC folder if needed. The processor does simple text substitution to replace values delimited by {} in the file with values calculated at runtime (see the sketch after the list). The following “variables” are supported.
- Date – The date and time of the migration
- TfsCollectionUrl – The URL to the source TFS collection
- SourcePath – The source path in TFVC
- DestinationPath – The destination path in Git
- BaselinePath – The baseline branch that was used
- DevelopmentPath – The development branch that was used
- Version – The version that was base lined, if any
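The substitution itself can be sketched as a straight text replacement; the file name and the example values below are illustrative.

```csharp
using System.Collections.Generic;
using System.IO;

static class Breadcrumb
{
    // Sketch of the substitution: replace {Variable} tokens in the breadcrumb
    // file with the values calculated during the migration.
    public static void Expand(string filePath, IDictionary<string, string> values)
    {
        var text = File.ReadAllText(filePath);

        foreach (var pair in values)
            text = text.Replace("{" + pair.Key + "}", pair.Value);

        File.WriteAllText(filePath, text);
    }
}

// Example (values are illustrative):
//   Breadcrumb.Expand(Path.Combine(repoPath, "readme.md"), new Dictionary<string, string>
//   {
//       ["Date"] = System.DateTime.Now.ToString(),
//       ["TfsCollectionUrl"] = "https://tfs.example.com/tfs/DefaultCollection",
//       ["SourcePath"] = "$/MyProject/Dev",
//       ["DestinationPath"] = "MyProject",
//   });
```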
Finally the files are committed to the master branch. Since we are using the Git command line only changed files are committed. This makes it easy to look in the repo and see what files have and haven’t changed between your baseline and dev branches (in TFVC).
Lastly the local files are cleaned up, if configured, in preparation for the next project to migrate.
Improvements
Hindsight is 20/20 so there are some things that might be nice to add to the processor.
At least at our company we lock release branches so they cannot be changed. It would be nice if this were an option so it does not have to be done manually later.
Allow the configuration of a branch policy on each master branch. When we did our migration we hadn’t decided whether we would use PRs or continue checking into master as normal. Now that we are using PRs we have set up branch policies. Having to go back to each repo to do this is painful. The fact that VSTS does not support configuring a global branch policy, or even import/export, makes this a nightmare given the number of repos we have.
Allow the merging of folders into a single repo. This could have been useful for some of our TFVC items. We will have to do that manually now but it might have been nice to be able to do this during the migration. Of course it can be done before the migration but it does not seem like it would be that much harder to do during the migration provided some rules were set up for handling branches.
Download the code on GitHub.