Microsoft’s switch to using Git as the version control system for Windows’ development has resulted in many challenges. Git wasn’t really built for a 300GB repository with 3.5 million files, and the engineering effort to make Git scale in this way continues.
But in adopting and developing what the company calls One Engineering System (1ES), the Windows and Devices Group (WDG) has taken on more than just Git. The group has also adopted Visual Studio Team Services (VSTS), the company’s source control, work item tracking, integration, and testing system, and with VSTS a more integrated, devops-style approach to development. Git is an important part of this but far from the whole story. Microsoft wrote today about some of its experiences using VSTS, including some of the problems the scale of the operation has caused.
The adoption of VSTS features and devops practices isn’t uniform across WDG. Continuous integration and continuous delivery make sense for some parts of WDG—online services are an obvious example, and even some of the apps in the Microsoft Store could qualify—but they’re less applicable to the core Windows operating system itself. Nonetheless, the company has worked to standardize practices that are common to every component.
The Fall Creators Update (version 1709) is a good demonstration of just how big an operation Windows is. That update included some 4 million commits, grouped into half a million pull requests to get those changes incorporated into the main Windows code. Each pull request (a group of changes batched together, representing a new feature, a bug fix, or similar) is a request to merge those changes into the most recent version of the main branch. If two pull requests are accepted at the same time, they’ll both try to merge into the same current version, so one will succeed and the other will fail; the one that fails has to be retried against the new current version that includes the other request. Done naively, this wastes a lot of work, because every failed merge has to be redone.
On typical projects with normal numbers of developers, the number of pull requests is usually low enough that two requests are rarely merged at the same moment, so this scenario almost never happens. Indeed, if only one person accepts pull requests (a common situation on small open-source projects), it will never happen. But for Windows, the vast number of pull requests and changes means that the main branch is almost always being updated, making it much more likely that two pull requests will try to merge simultaneously. Pull requests would thus fail to merge even after they’ve been accepted. To handle this, Microsoft added a queuing system so that accepted pull requests are automatically merged one at a time; they don’t race against each other, and the system can cut out the wasted work that a naive approach would otherwise incur.
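The difference between racing merges and queued merges can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not how VSTS or Git works internally: the `Branch` class, the version counter, and both merge functions are hypothetical names invented for illustration, and a "merge" here is nothing more than appending an ID to a list.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Branch:
    """Toy stand-in for a main branch: a version counter plus merged PR ids."""
    version: int = 0
    merged: list = field(default_factory=list)

    def try_merge(self, pr_id, base_version):
        """Merge only if the PR was prepared against the current version."""
        if base_version != self.version:
            return False  # main has moved on, so this attempt is wasted
        self.merged.append(pr_id)
        self.version += 1
        return True


def merge_naively(branch, pr_ids):
    """Every pending PR targets the same snapshot; only one wins per round."""
    wasted_attempts = 0
    pending = list(pr_ids)
    while pending:
        snapshot = branch.version
        losers = []
        for pr_id in pending:
            if not branch.try_merge(pr_id, snapshot):
                wasted_attempts += 1
                losers.append(pr_id)  # must be retried against the new version
        pending = losers
    return wasted_attempts


def merge_with_queue(branch, pr_ids):
    """Accepted PRs are merged one at a time against the latest version."""
    wasted_attempts = 0
    queue = deque(pr_ids)
    while queue:
        pr_id = queue.popleft()
        if not branch.try_merge(pr_id, branch.version):
            wasted_attempts += 1
    return wasted_attempts


if __name__ == "__main__":
    prs = ["pr-1", "pr-2", "pr-3", "pr-4"]
    print("naive wasted attempts: ", merge_naively(Branch(), prs))
    print("queued wasted attempts:", merge_with_queue(Branch(), prs))
```

In this model the number of wasted attempts in the naive case grows roughly with the square of the number of simultaneous pull requests, while the queue never wastes one, which is the intuition behind serializing accepted merges.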
This represents a recurring issue with 1ES: practices that are fine for smaller teams and products become unusable with the 7,000 developers and 4,000 designers, program managers, and service engineers working within WDG. As another example, regular VSTS uses a dropdown box of usernames for assigning work items to people. That system works well when a project has only a few developers, but Microsoft has a total of some 80,000 accounts in VSTS, far too many to be listed in a single dropdown.
And the company has a lot of work items. Microsoft’s practice is to use work items for everything; bugs and new features, for example, are all work items. Historically, the company gave broad internal access to bug tracking, but tracking of new features was much more opaque, visible only to the teams or divisions working on a particular feature. With 1ES, these things are all recorded within VSTS, with company-wide visibility and a total of some 10 million work items.
With this improved visibility, cross-division dependencies can be created so that, for example, a Visual Studio or Office feature can be set to depend on a Windows feature. Progress of these items can also be tracked to ensure that both ship at the right time. Prior to 1ES, the company had five different ways of tracking these dependencies.
The work going into 1ES hasn’t just involved building a common system for Windows development but also common processes and naming for those processes. Before, different teams might use the term “bug” or “issue” or “defect,” and when addressed, those bugs might be “complete” or “completed” or “closed,” with different workflows for handling them. In bringing together the different groups, the terminology and processes are being standardized, enabling better reporting and easier communication between streams.
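To see why shared terminology makes reporting easier, consider a minimal sketch that maps each team's vocabulary onto a single set of terms before counting. The field names, the helper functions, and the choice of "bug" and "closed" as the canonical terms are all assumptions made for illustration; Microsoft has not said which terms it settled on or how the mapping is implemented.

```python
from collections import Counter

# Hypothetical alias tables: the canonical terms on the right are assumptions.
TYPE_ALIASES = {"bug": "bug", "issue": "bug", "defect": "bug"}
STATE_ALIASES = {"complete": "closed", "completed": "closed", "closed": "closed"}


def normalize_work_item(item):
    """Return a copy of the work item with canonical type and state names."""
    kind = item["type"].strip().lower()
    state = item["state"].strip().lower()
    return {
        **item,
        # Unknown terms pass through unchanged (lowercased) rather than being guessed.
        "type": TYPE_ALIASES.get(kind, kind),
        "state": STATE_ALIASES.get(state, state),
    }


def count_by_type_and_state(items):
    """Count work items per (type, state) pair after normalization."""
    normalized = (normalize_work_item(item) for item in items)
    return Counter((item["type"], item["state"]) for item in normalized)


if __name__ == "__main__":
    items = [
        {"team": "A", "type": "Defect", "state": "Completed"},
        {"team": "B", "type": "issue", "state": "complete"},
        {"team": "C", "type": "Bug", "state": "Closed"},
    ]
    print(count_by_type_and_state(items))
```

Without the normalization step, the three items above would land in three separate report rows even though they describe the same kind of thing in the same state.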
Microsoft used WDG’s experience with Git to propose changes to the open source software and has been working over the past year to have these changes incorporated into Git itself. The same is true of the scaling work applied to VSTS. As an example, WDG wanted the ability to create archives of VSTS data. This feature was found to be generally useful and was released as an open source VSTS extension in 2016; it is used for both archival and data migration between VSTS accounts.