Linear, rebase, and pull-request GitHub workflow
Problem
There are a variety of workflows that can be effectively used with both Git version control and GitHub. We should have a common and consistent workflow across as much of Carbon as possible. While some specific areas may end up needing specialized flows to suit their needs, these should be the exceptions and can be handled on a case-by-case basis when they arise.
Background
- Chapter 16 “Version Control and Branch Management” in the SWE book (Software Engineering at Google)
- Trunk Based Development
- GitHub documentation on “pull requests”
Proposal
Carbon repositories should all follow a common workflow for managing and landing changes. The document outlines the specifics of the proposed workflow as well as the core motivation.
Rename the default git branch to trunk
This achieves two goals:
-
Replaces the term
master
. This term, while only used in isolation in Git, was used in immediately preceding and related systems as part of extremely problematic “master/slave” terminology. That background associates the term with unacceptable historical and cultural meanings. The intent of those using or adopting the term isn’t relevant to this association. The less overtly problematic term being isolated from the rest doesn’t erase its history, and doesn’t completely avoid painful associations. -
It directly anchors and reinforces contributors on the trunk-based workflow.
Longer discussion of linear history
While Git can effectively bisect through complex history graphs, this is significantly harder than bisecting across linear history. Especially for any part of Carbon involving code, we expect bisection to be a pervasive tool and we can make it simpler and more effective by forcing a linear history.
A linear history also makes it much easier to ask questions about whether a particular change has landed yet, or when a bug was introduced. For some people, releases are more simply understood as branching from a specific snapshot of the linear history. While tools like git log
can provide similar functionality, it is less trivially understood.
Continuous integration is simplified for many of the same reasons as bisection: the set of potential deltas is reduced to a linear sequence. Reverting to green becomes easier to understand, and testing each incremental commit has a single obvious interpretation.
Requiring linear history also incentivises incremental development that is committed early to the main branch. This, in essence, ensures a single source of truth model even with the distributed version control system. Because works-in-progress are required to be rebased, they tend to merge early and often rather than forming long-lived development branches. This helps reduce the total merge resolution and testing costs as a project scales. For more details about the advantages of using a single source of truth, see the full text of the “Version Control and Branch Management” chapter in the SWE book.
One concern with linear history when rebasing a sequence of changes and merging them is that the pull request associated with that sequence might not be obvious from the main branch commit log. However, there is enough information in the repository to establish the relationship, and GitHub’s UI surfaces the pull request on each commit in the series.
Alternatives considered
LLVM model
LLVM allows directly pushing/submitting to their “trunk” with post-commit review. LLVM enforces linear history for day-to-day development. Merge commits are allowed for rare events like contributing a new subproject.
Advantages:
- Still has linear history.
- Incentivizes squashing for continuous integration and bisection.
- Very low overhead for fixing trivial mistakes.
Disadvantages:
- Creates extremely bad incentives around code review.
- Lots of patches don’t get pre-commit review, even if they would benefit from it.
- Very experienced contributors are much better at avoiding pre-commit review, so are rarely blocked waiting on review.
- Leads to the most experienced members of the community not doing enough code reviews, or being timely enough in code reviews.
- Lots of patches submitted with post-commit review are never reviewed in practice unless they break something.
- UI and basic support for code reviews entirely focused on pull requests.
Fork and merge model
Classically, Git and GitHub support merging entire complex graphs of development.
Advantages:
- Mostly supported by pull requests, so still able to use much of that functionality.
- Supports a model in which contributors do not communicate and can each develop a local, decentralized fork while still achieving overall reconciliation.
- Can model much more complex history of code evolution faithfully in the tooling.
- Most of the time these aren’t so complex that they create problems for humans.
Disadvantages:
- History is harder for humans to understand and reason about in complex cases.
- Bisection and continuous integration are more complex.
- May create difficulty for continuous integration against mainline, because unclear what “order” they should be applied / explored. While there are technical approaches to address this, they don’t seem to eliminate the complexity, merely provide a clear set of mechanics for handling it.
- Reduces incentives to land small, incremental changes by allowing non-linearity to reduce the effort required for large and/or complex merges.
- Makes review of the main branch’s history harder due to non-linearity.
Fork and merge, but branches can only have linear history
Imagine a fork and merge model, but PRs can only have linear history. That is, branching and merging within a PR, or merging trunk
into PR is not allowed. In this model, the only merge commits are merges of PRs into trunk
. PRs themselves don’t contain merge commits.
Advantages:
- Mostly supported by GitHub pull requests, so still able to use much of that functionality.
- Restricts non-linearity of history. The only non-linearity that is left is merge commits on the trunk branch. PRs themselves can’t contain merge commits.
Disadvantages:
- Requires a custom presubmit on GitHub that checks linearity of a PR.
- The disadvantages of the fork and merge strategy regarding the complexity of history remain, but to a smaller degree, since non-linearity is restricted.
Rationale
- Easy to follow single source of truth will help foster an open and inclusive community.
- Review requirements and focus on small, incremental changes match well established engineering practices for ensuring the project and codebase scale up both in size and time dimensions.
- Linear history seems easier for humans to reason about.