Bitcoin Core Contributor Challenges

By Jameson Lopp

Posted March 14, 2020

Bitcoin Core is an open source project that serves as the Schelling point for development of the Bitcoin protocol. It is often referred to as the “reference implementation” because it is by far the most mature Bitcoin software, with more contributors and activity than any other implementation. While it has changed names and even platforms several times, it is the original organization started by Satoshi Nakamoto.

But Bitcoin Core isn’t like most open source projects. It’s mission-critical software that is relied upon by many individuals and enterprises in what has grown to be a $100+ billion network. Bitcoin Core is not developed in the same manner as your average consumer software; it’s more akin to aerospace engineering. The stakes are high and tolerance for even the most minor failure is incredibly low due to the potential for catastrophe. If you speak to Core contributors who have been participating for many years, they tend to agree that the quality control of code review and testing has increased enormously over the past decade. As such, the bar has been raised for anyone who wishes to have their code merged into the repository.

If you wish to better understand the dynamics of how Core operates as an organization, check out “Who Controls Bitcoin Core?” I also have several guides about contributing to Bitcoin Core linked in my educational resources.

Pull Request Stats

Note that in late 2011 the Bitcoin project migrated from SourceForge to GitHub. For simplicity I’m ignoring any rejected PRs from the SourceForge era; it’s not clear to me whether it’s even possible to find them. At the time of writing, Bitcoin Core has:

Thus 32% of pull requests end up abandoned or rejected (or re-proposed differently).

Merged Pull Request Stats

Thankfully itā€™s quite simple to query git to aggregate stats for the code that made it into the repository.

$ git log --no-merges --all --pretty=tformat: --numstat | awk '{inserted+=$1; deleted+=$2; delta+=$1-$2} END {ratio=deleted/inserted; printf "Commit stats:\n- Lines added (total): %s\n- Lines deleted (total): %s\n- Total lines (delta): %s\n- Add./Del. ratio (1:n): 1 : %s\n", inserted, deleted, delta, ratio}' -
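The awk one-liner can also be written as a small Python function, which may be easier to read and extend. This is only a sketch, not what produced the numbers below: `numstat_totals` and `report` are hypothetical names, and the code assumes it is fed the output of the same `git log --no-merges --all --pretty=tformat: --numstat` command one line at a time. Binary files, which `--numstat` reports as `-`, add nothing, just as awk treats `-` as zero.

```python
def numstat_totals(lines):
    """Aggregate `git log --numstat` output like the awk one-liner.

    Hypothetical helper, not from the original article. Each numstat line
    is "<added>\t<deleted>\t<path>"; blank lines separate commits, and
    binary files show "-" for both counts.
    """
    inserted = deleted = 0
    for line in lines:
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) < 3 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue  # commit separator or binary file: contributes nothing
        inserted += int(parts[0])
        deleted += int(parts[1])
    ratio = deleted / inserted if inserted else 0.0  # the "1 : n" value
    return inserted, deleted, inserted - deleted, ratio


def report(lines):
    """Format the totals in a layout similar to the awk printf."""
    added, removed, delta, ratio = numstat_totals(lines)
    return (
        "Commit stats:\n"
        f"- Lines added (total): {added:,}\n"
        f"- Lines deleted (total): {removed:,}\n"
        f"- Total lines (delta): {delta:,}\n"
        f"- Add./Del. ratio (1:n): 1 : {ratio:.4f}\n"
    )
```

Passing `sys.stdin` to `report` from a script run as `git log --no-merges --all --pretty=tformat: --numstat | python3 totals.py` (file name hypothetical) should reproduce the awk totals, with thousands separators added.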

Based upon all historical commits (excluding merge commits):

  • Lines added (total): 2,167,565
  • Lines deleted (total): 1,483,481
  • Total lines (delta): 684,084
  • Added / Deleted ratio (1:n): 1 : 0.6844

Unmerged Pull Request Stats

But now we need to figure out how much code was proposed in PRs that were closed without being merged!

We’ve already determined that there are a little over 4,000 unmerged PRs at the time of writing. But how many lines of code would these PRs have changed if merged? GitHub has an API for that, though perhaps there’s a library that can help us use it more easily! Let’s give PyGithub a shot…

After a bit of trial and error, here’s what I came up with: https://gist.github.com/jlopp/5aa87ed33e97ad58f54ace65e9b0ece3

Unfortunately, while GitHub’s UI allows us to filter out merged pull requests, it appears the API does not. So we need to iterate over all 12,000+ PRs to count the lines of code from the unmerged ones. It turns out that GitHub limits API calls to 5,000 per hour, so this operation requires us to throttle the script to spread itself out across 2+ hours.
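The author’s actual script is the gist linked above; what follows is only a rough sketch of the same idea with PyGithub, assuming PyGithub 1.59 or later (for `Auth.Token`) and a personal access token. `tally_unmerged`, `seconds_per_call` and `main` are hypothetical names. The sketch walks every closed PR, skips the merged ones, and sleeps between PRs so that, at roughly one API request per PR, it stays under 5,000 requests per hour. In PyGithub, reading fields such as `additions` on a PR returned by `get_pulls` may trigger an extra request for each PR, which is why the sketch paces itself per PR.

```python
import time


def tally_unmerged(pulls, pause_seconds=0.0):
    """Count closed-but-unmerged PRs and sum the lines they would have changed.

    `pulls` is any iterable of objects with `merged`, `additions` and
    `deletions` attributes (PyGithub's PullRequest exposes all three).
    `pause_seconds` is slept after each PR to pace API usage.
    """
    count = added = deleted = 0
    for pr in pulls:
        if not pr.merged:
            count += 1
            added += pr.additions
            deleted += pr.deletions
        if pause_seconds > 0:
            time.sleep(pause_seconds)
    return count, added, deleted


def seconds_per_call(calls_per_hour=5000):
    """Even spacing between requests that stays under an hourly quota."""
    return 3600 / calls_per_hour


def main(token):
    """Hypothetical driver; needs `pip install PyGithub` and network access."""
    # Imported here so the helpers above work without PyGithub installed.
    from github import Auth, Github

    gh = Github(auth=Auth.Token(token))
    repo = gh.get_repo("bitcoin/bitcoin")
    # state="closed" returns merged and unmerged PRs alike;
    # tally_unmerged separates them.
    pulls = repo.get_pulls(state="closed")
    count, added, deleted = tally_unmerged(pulls, seconds_per_call())
    print(f"{count:,} unmerged PRs: {added:,} lines added, {deleted:,} lines deleted")
```

At 0.72 seconds per PR, about 12,000 closed PRs take roughly 2.4 hours, in line with the 2+ hours mentioned above.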

The Results

After iterating over all rejected pull requests from Bitcoin Core, we find that there were:

  • 9,011,209 total rejected added lines of code
  • 6,279,435 total rejected deleted lines of code

That’s 15,290,644 rejected lines of code changed vs. 3,651,046 accepted!

This means that, at the time of writing, only 19% of proposed changed lines of code have been accepted into Bitcoin Core.
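The 19% figure counts added and deleted lines together on both sides: 3,651,046 accepted lines out of 3,651,046 + 15,290,644 = 18,941,690 proposed. A quick check of that arithmetic in Python (the function name `accepted_share` is illustrative, not from the original):

```python
def accepted_share(accepted_added, accepted_deleted, rejected_added, rejected_deleted):
    """Fraction of all proposed changed lines (added + deleted) that were merged."""
    accepted = accepted_added + accepted_deleted
    proposed = accepted + rejected_added + rejected_deleted
    return accepted / proposed


share = accepted_share(2_167_565, 1_483_481, 9_011_209, 6_279_435)
print(f"{share:.1%}")  # 19.3%
```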

Top Contributor Stats

What if we drill down a bit more to the individual level? Clearly some contributors are better than others at navigating the (often arduous) process of seeing a proposal through to completion.

  • Wladimir van der Laan - 88% PR merge rate (737 merged PRs, 104 unmerged PRs)
  • Pieter Wuille - 87% PR merge rate (600 merged PRs, 90 unmerged PRs)
  • Marco Falke - 85% PR merge rate (733 merged PRs, 133 unmerged PRs)
  • Matt Corallo - 77% PR merge rate (290 merged PRs, 88 unmerged PRs)
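A merge rate here is simply merged PRs divided by all closed PRs for that contributor. The sketch below assumes each closed PR has already been reduced to an (author login, merged?) pair, for example from PyGithub’s `pr.user.login` and `pr.merged`; `counts_by_author` and `merge_rate` are hypothetical helpers, not the author’s code. It then recomputes the four rates above from their merged and unmerged counts.

```python
from collections import Counter


def counts_by_author(records):
    """Map author -> (merged, unmerged) from (login, was_merged) pairs."""
    merged, unmerged = Counter(), Counter()
    for login, was_merged in records:
        if was_merged:
            merged[login] += 1
        else:
            unmerged[login] += 1
    return {a: (merged[a], unmerged[a]) for a in merged.keys() | unmerged.keys()}


def merge_rate(merged, unmerged):
    """Share of a contributor's closed PRs that were merged, as a rounded %."""
    return round(100 * merged / (merged + unmerged))


top = {
    "Wladimir van der Laan": (737, 104),
    "Pieter Wuille": (600, 90),
    "Marco Falke": (733, 133),
    "Matt Corallo": (290, 88),
}
for name, (merged, unmerged) in top.items():
    print(f"{name} - {merge_rate(merged, unmerged)}% PR merge rate")  # 88, 87, 85, 77
```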

What about notable contributors who stopped contributing after the scaling contention?

  • Jeff Garzik - 58% PR merge rate (88 merged, 63 unmerged)
  • Mike Hearn - 57% PR merge rate (8 merged, 6 unmerged)
  • Gavin Andresen - 80% PR merge rate (180 merged, 43 unmerged)

Gavin’s stats are high, but I wondered whether that was due to his early involvement and whether there was a noticeable trend leading up to his departure. It turns out there was indeed:

  • 2012: 91% PR merge rate (49 out of 54 PRs)
  • 2013: 86% PR merge rate (60 out of 70 PRs)
  • 2014: 81% PR merge rate (29 out of 36 PRs)
  • 2015: 59% PR merge rate (10 out of 17 PRs)
  • 2016: 0% PR merge rate (0 out of 5 PRs)
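A per-year breakdown like this one can be produced by grouping a contributor’s closed PRs by year. The article doesn’t say whether PRs were bucketed by the year they were opened or closed; the sketch below just assumes each PR has been reduced to a (year, merged?) pair, for instance via PyGithub’s `pr.created_at.year`. `yearly_merge_rates` is a hypothetical helper; the usage rebuilds records from the counts quoted above and recomputes the rates.

```python
from collections import defaultdict


def yearly_merge_rates(records):
    """Map year -> (merged, total, rounded merge-rate %) from (year, merged) pairs."""
    tally = defaultdict(lambda: [0, 0])  # year -> [merged, total]
    for year, was_merged in records:
        tally[year][1] += 1
        if was_merged:
            tally[year][0] += 1
    return {
        year: (merged, total, round(100 * merged / total))
        for year, (merged, total) in sorted(tally.items())
    }


# Rebuild (year, merged?) records from the per-year counts listed above.
gavin_counts = {2012: (49, 54), 2013: (60, 70), 2014: (29, 36), 2015: (10, 17), 2016: (0, 5)}
records = [
    (year, i < merged)
    for year, (merged, total) in gavin_counts.items()
    for i in range(total)
]
for year, (merged, total, pct) in yearly_merge_rates(records).items():
    print(f"{year}: {pct}% PR merge rate ({merged} out of {total} PRs)")
```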

Did Gavin become a worse programmer over the years? That seems pretty unlikely. Rather, I suspect that this is evidence of Bitcoin Core’s quality standards increasing in rigor.

Takeaways

  • Bitcoin Core has high standards; a significant portion of code changes are abandoned or rejected.
  • There’s evidence that supports the theory that code standards have increased over the life of the project.
  • It appears that a PR merge rate below 70% is a sign that a contributor will get frustrated and stop contributing.

There’s certainly opportunity to dig further into this phenomenon, but I think this is a good start!

