The Definitive Guide to Forks and Branches in Git

February 21, 2017

Open source patterns don’t always make the most sense for business.

Their goals are a bit different: open source favors experimentation, whereas business tends to want focus in order to reduce time to market.

Structural choices tend to mirror these goals. And the decision to use forks or branches to work on features can have an appreciable impact on a team’s overall productivity.

Forking tends to encourage a divergent evolution of the codebase, which makes it ideal for an environment that wants broad experimentation on a theme.

Branching, on the other hand, caters to convergent evolution. This tends to be a better fit for private repositories, which is why it is commonly used by enterprise companies and in most business contexts.

 

Divergent development vs. Convergent development

In the open source world, it’s very common for a codebase, once reaching a certain point, to split into two distinct projects, each with a different goal (though still a shared ancestry).

The most canonical example is the Linux codebase. Today, Linux has many forks and distributions (e.g., Red Hat), all stemming from a shared historical baseline.

What’s important to note is that these flavors aren’t just temporary development paths — they are codebases with distinct ‘identities,’ with no intention of re-integration with one another.

Branches, on the other hand, are almost always intended to be ‘convergent’ development paths. Branches are ephemeral by their very nature: a branch is really nothing more than a pointer at the head of a commit lineage. After the branch is merged into master, that pointer is usually deleted; the commits it pointed to stay in the Git history as part of master.
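
To make the "pointer" idea concrete, here is a minimal sketch run in a throwaway repository. The repository location, file name, and branch name are hypothetical, and the last step assumes git's default file-based ref storage, where a freshly created branch is a small text file under .git/refs/heads.

```shell
#!/bin/sh
# Sketch: a branch is a small ref that stores a single commit hash.
# All names below are hypothetical and used only for illustration.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch explicitly
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "hello" > README
git add README
git commit -q -m "Initial commit"

# Creating a branch copies no files; it only records the current commit.
git branch feature

master_sha=$(git rev-parse master)
feature_sha=$(git rev-parse feature)
# Assumes the default "files" ref backend, where a new branch is a loose ref.
ref_contents=$(cat .git/refs/heads/feature)

echo "master:   $master_sha"
echo "feature:  $feature_sha"
echo "ref file: $ref_contents"
```

All three printed values should be the same commit hash, which is why creating or deleting a branch is so cheap.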

 

Forking creates an entirely new repository

GitHub popularized “forking” with a convenient button. When you make a fork, you are duplicating the entire repository and its history up until that point in time.

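Clicking the fork button in GitHub, or on any other host that supports forking, does roughly what a git clone does, but on the server: it creates a new repository under your account with its own copy of master. When you clone that fork to your machine, it becomes your origin.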
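
The fork button itself is a feature of the hosting service, not a git command, but its effect can be approximated locally. The sketch below uses bare clones in a temporary directory as stand-ins for hosted repositories; the directory names and the remote name "upstream" are assumptions chosen for illustration (naming the parent remote "upstream" is a common convention, not a requirement).

```shell
#!/bin/sh
# Sketch: approximating a hosted fork with local bare repositories.
# upstream.git, fork.git, and my-fork are hypothetical names.
set -eu

work=$(mktemp -d)
cd "$work"

# Build a small project to act as the original codebase.
git init -q project
cd project
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"
cd "$work"

git clone -q --bare project upstream.git      # stand-in for the hosted original
git clone -q --bare upstream.git fork.git     # roughly what the fork button does

# Your working copy points at the fork, not at the original.
git clone -q fork.git my-fork
cd my-fork
git remote add upstream "$work/upstream.git"  # optional link back to the parent
git fetch -q upstream

fork_head=$(git rev-parse origin/master)
upstream_head=$(git rev-parse upstream/master)
remotes=$(git remote | sort | tr '\n' ' ')

git remote -v
```

At the moment of forking, origin/master and upstream/master point at the same commit; from then on the two repositories are free to diverge.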

Forks are best used: when the intent of the ‘split’ is to create a logically independent project, which may never reunite with its parent.

In the open source world, this happens all the time. A team sees a codebase that could be a good starting point for their project, and they have no intention of trying to merge this back into the root codebase. So they use a fork as a starting point.

 

Branching

Branches are more commonly used to act like ‘construction zones’ in a codebase.

Branches are best used: when they are created as temporary places to work through a feature, with the intent to merge the branch back into the main codebase.

Most branches are short-lived; once a feature is merged into master, the branch is deleted. Some branches, such as staging, are long-lived but are still intended to converge with master.
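
The short-lived lifecycle described above might look like the following sketch. The branch name feature-login and the file name are hypothetical. Note that deleting the branch removes only its name; the commit made on it remains reachable from master.

```shell
#!/bin/sh
# Sketch: create a feature branch, merge it, then delete it.
# feature-login and app.txt are hypothetical names.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Base"

# Do the work on a temporary branch.
git checkout -q -b feature-login
echo "login" >> app.txt
git commit -q -am "Add login"
feature_sha=$(git rev-parse HEAD)

# Converge: merge into master, then remove the branch name.
git checkout -q master
git merge -q --no-ff -m "Merge feature-login" feature-login
git branch -d feature-login

remaining=$(git branch --list feature-login)
if git merge-base --is-ancestor "$feature_sha" master; then
  still_in_history=yes
else
  still_in_history=no
fi

echo "branch still listed: ${remaining:-no}"
echo "commit still in master history: $still_in_history"
```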

To create a branch, use git checkout -b new-branch, which creates a new branch starting from the commit you currently have checked out and switches to it. Changes can still occur on master while you are working on new-branch.
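
As a rough illustration, the sketch below creates new-branch from master in a throwaway repository, commits on both lines, and then asks git where the two diverged. The file names are hypothetical.

```shell
#!/bin/sh
# Sketch: new-branch starts at the current commit while master keeps moving.
# File names are hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Base"
base_sha=$(git rev-parse HEAD)

# The new branch begins at whatever commit is checked out right now.
git checkout -q -b new-branch
start_sha=$(git rev-parse new-branch)

echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Work on new-branch"

# Meanwhile, master receives its own change.
git checkout -q master
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Change on master"

# The point where the two lines diverged is still the base commit.
fork_point=$(git merge-base master new-branch)

git log --oneline --graph --all
```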

 

Associated costs

When merging a branch, git only has to run a diff on the work that was changed.

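Forking is more expensive. A fork is a second full copy of the codebase, so before merging, git first has to fetch the fork’s commits from a separate repository. The merge itself still works from the common ancestor, but forks tend to drift further from their parent, which leaves more to reconcile.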
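
One way to see what a merge would have to reconcile is the three-dot form of git diff, which compares one side against the common ancestor of both. The sketch below applies it first to a local branch and then to commits fetched from a second repository standing in for a fork. The repository and file names are hypothetical.

```shell
#!/bin/sh
# Sketch: inspecting what a merge would bring in, for a branch and for a fork.
# main-repo, fork-repo, and the file names are hypothetical.
set -eu

work=$(mktemp -d)
cd "$work"
git init -q main-repo
cd main-repo
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

for f in a b c; do echo "$f" > "$f.txt"; done
git add .
git commit -q -m "Base with three files"

# A branch that touches one file.
git checkout -q -b feature
echo "changed" >> b.txt
git commit -q -am "Change b.txt on feature"
git checkout -q master

# Three-dot diff: changes on feature since its common ancestor with master.
branch_changes=$(git diff --name-only master...feature)

# A separate repository standing in for a fork, with its own change.
cd "$work"
git clone -q main-repo fork-repo
cd fork-repo
git config user.name "Fork Dev"
git config user.email "fork@example.com"
echo "forked" >> c.txt
git commit -q -am "Change c.txt in the fork"

# The extra step for a fork: its commits must be fetched before comparing.
cd "$work/main-repo"
git fetch -q "$work/fork-repo" master
fork_changes=$(git diff --name-only master...FETCH_HEAD)

echo "branch would bring in: $branch_changes"
echo "fork would bring in:   $fork_changes"
```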

Forking creates a full copy of your repository, whereas branching only adds a branch to your existing tree. A new branch adds almost nothing to the size of the repository; only the commits made on it take up space.

Under the hood, git reads the files and commits that belong to whichever branch you have checked out. Forks will generally take up more space on your server, though some hosts share storage between a fork and its parent.

 

Forking is also operationally more expensive

Less visibility: With fork-centric workflows, developers each have their own completely independent repository. This makes it difficult to see what everyone is working on unless you can see everyone’s fork in one place.

If you’re using forks, these changes live in different repos. This means that there is no true “collaboration space” for the team, just a canonical repo that everyone submits changes to when they are ready to change master.

With a branch-centric flow, all commits exist in one repository. If everyone pushes their branches, you have access to all changes happening within your codebase.
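
A small sketch of that visibility, using a local bare repository as a stand-in for the team's shared repo. The developer names alice and bob, the branch name feature-a, and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: one shared repository makes a teammate's branch visible after a fetch.
# shared.git, alice, bob, and feature-a are hypothetical names.
set -eu

work=$(mktemp -d)
cd "$work"

# Alice starts the project and publishes it to the shared repo.
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"
git config user.email "alice@example.com"
echo "base" > app.txt
git add app.txt
git commit -q -m "Base"
cd "$work"
git clone -q --bare alice shared.git
git -C alice remote add origin "$work/shared.git"

# Bob clones the shared repo before Alice starts her feature.
git clone -q shared.git bob

# Alice pushes work in progress on her own branch.
cd "$work/alice"
git checkout -q -b feature-a
echo "wip" > wip.txt
git add wip.txt
git commit -q -m "WIP on feature-a"
git push -q origin feature-a

# Bob fetches and can now see Alice's branch next to master.
cd "$work/bob"
git fetch -q origin
visible=$(git for-each-ref --format='%(refname:short)' refs/remotes/origin/feature-a)

git branch -r
```

With forks, Bob would instead need to know that Alice's repository exists and add it as a separate remote before he could see the same work.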

Increased Operational Risk: With a branch-centric workflow, developers can push their changes to a common repo frequently (say, at the end of every day). When other developers update their working copies, they will receive these branches.

A unique danger of a fork-centric workflow is a developer working in isolation in their own repo. Although this is not always the case, it presents a slightly increased risk in certain situations, such as when the developer leaves the company or is temporarily unavailable. The risk here is one of knowledge management (i.e., other developers won’t necessarily know where the work is happening).

TL;DR

A branch-centric workflow makes sense for most business settings. Forks can be a really good pattern for ‘public’ collaboration and experimentation, but when the intended use case is many people working toward a unified goal, branching tends to be a better fit.