JEP 357: Migrate from Mercurial to Git

AuthorsErik Duveblad, Joe Darcy
OwnerJoe Darcy
TypeInfrastructure
ScopeJDK
StatusClosed / Delivered
Release16
Componentinfrastructure
Discussiondiscuss at openjdk dot java dot net
EffortL
DurationM
Relates toJEP 369: Migrate to GitHub
Reviewed byMark Reinhold
Endorsed byMark Reinhold
Created2019/07/12 02:20
Updated2021/01/27 21:56
Issue8227614

Summary

Migrate the OpenJDK Community's source code repositories from Mercurial (hg) to Git.

Goals

Non-Goals

Motivation

There are three primary reasons for migrating to Git:

  1. Size of version control system metadata
  2. Available tooling
  3. Available hosting

Initial prototypes of converted repositories show a significant reduction in the size of the version control metadata. For example, the .git directory of the jdk/jdk repository is approximately 300 MB with Git and the .hg directory is around 1.2 GB with Mercurial, depending on the Mercurial version being used. The reduction in metadata preserves local disk space and reduces clone times, since fewer bits have to go over the wire. Git also features shallow clones that only clone parts of the history, resulting in even less metadata for those users who do not need the entire history.

There are many more tools for interacting with Git than Mercurial:

Lastly, there are many options available for hosting Git repositories, whether self-hosted or hosted as a service.

Description

We have already prototyped a program to convert a Mercurial repository to a Git repository. It uses the git-fast-import protocol to import Mercurial changesets into Git, and it adjusts existing commit messages to align with Git best practices. A commit message for the Mercurial jdk/jdk repository has this structure:

JdkHgCommitMessage : BugIdLine+ SummaryLine? ReviewersLine ContributedByLine?

BugIdLine : /[0-9]{8}/ ": " Text

SummaryLine : "Summary: " Text

ReviewersLine : "Reviewed-by: " Username (", " Username)* "\n"

ContributedByLine : "Contributed-by: " Text

Username : /[a-z]+/

Text : /[^\n]+/ "\n"

A commit message for the Git jdk/jdk repository will have a somewhat different structure:

JdkGitCommitMessage : BugIdLine+ Body? Trailers

BugIdLine : /[0-9]{8}/ ": " Text

Body : BlankLine Text*

Trailers : BlankLine Co-authors? Reviewers

Co-authors : (BlankLine Co-author )+

Co-author : "Co-authored-by: " Real-name <Email>

Reviewers : "Reviewed-by: " Username (", " Username)* "\n"

BlankLine = "\n"

Username : /[a-z]+/

Text : /[^\n]+/ "\n"

The reasons to change the message structure are:

Git uses an additional field in the commit metadata to denote the committer of a commit, as distinct from the author. We'll use the committer field to signify sponsorship: in the event of a sponsored commit, the author will be named in the author field of the commit and the sponsor will be named in the committer field. If the sponsor also is a co-author, then an appropriate Co-authored-by trailer will be added, which is a situation that we cannot capture in the existing Mercurial message structure.

Another possible use of the distinct committer field is to identify backport commits from feature releases to update releases, which are typically done by someone other than the author of the original commit.

The content of the author and committer fields will be of the form Real-name <Email>, since not every commit author will be an OpenJDK Author. A special email address will be used to show that an author or committer is also an OpenJDK Author: <openjdk-username>@openjdk.org.

Here is an example of a Git commit message:

76543210: Crash when starting the JVM

Fixed a tricky race condition when the JVM is starting up by using a Mutex.

Co-authored-by: Robin Westberg <rwestberg@openjdk.org>
Reviewed-by: darcy

The author and committer field for such a commit would be:

Author: Erik Duveblad <ehelin@openjdk.org>
Commit: Erik Duveblad <ehelin@openjdk.org>

In the conversion process a commit will be considered as sponsored when the author is not listed on the Contributed-by: line. In that case the first person on the Contributed-by: line will be taken as the author, the sponsor will be taken as the committer, and any additional contributors will be taken as co-authors.

Examples of converted repositories are available at https://github.com/openjdk/.

Tools

We've already prototyped backward compatible ports of the Mercurial tools jcheck, webrev and defpath.

We've also prototyped new tool, git-translate. This tool uses a file called .hgcommits that is generated by the conversion tools and committed to the Git repositories. This file contains a sequence of lines, each of which contains two hexadecimal hashes: the first is the hash of a Mercurial changeset and the second is the hash of the Git commit resulting from converting that Mercurial changeset. The tool git-translate simply queries the file .hgcommits:

$ git translate --to-hg 0f8927e8b5bf88e7e2c7c453b4cd75e01eeccaf4
bd613b97c7c88658801b0f0c603a55345dfef022
$

Alternatives

Keep using Mercurial.

Testing

The .hgcommits mapping file, described above, can be used to verify that all the metadata for a given commit is correctly converted and also that the source code trees for the two commits are identical.

Risks and Assumptions

The outstanding risk is that errors are introduced as part of the conversion. That risk will be mitigated through rigorous verification as described above.

Dependencies