Bitcoin Core Contributor Challenges
By Jameson Lopp
Posted March 14, 2020
Bitcoin Core is an open source project that is the Schelling point for development of the Bitcoin protocol; it is often referred to as the "reference implementation" because it is the most mature Bitcoin software, with more contributors and activity than any other implementation by far. While it has changed names and even platforms several times, it is the original organization started by Satoshi Nakamoto.
But Bitcoin Core isn't like most open source projects. It's mission-critical software that is relied upon by many individuals and enterprises in what has grown to be a $100+ billion network. Bitcoin Core is not developed in the same manner as your average consumer software; it's more akin to aerospace engineering. The stakes are high and tolerance for even the most minor failure is incredibly low due to the potential for catastrophe. If you speak to Core contributors who have been participating for many years, they tend to agree that the quality control of code review and testing has increased enormously over the past decade. As such, the bar has been raised for anyone who wishes to have their code merged into the repository.
If you wish to better understand the dynamics of how Core operates as an organization, check out Who Controls Bitcoin Core? I also have several guides about contributing to Bitcoin Core linked in my educational resources.
Pull Request Stats
Note that in late 2011 the Bitcoin project migrated from SourceForge to GitHub; for simplicity I'm ignoring any rejected PRs that occurred during the SourceForge era. It's not clear to me if it's even possible to find them. At time of writing, Bitcoin Core has:
- 337 open PRs
- 8,431 closed merged PRs
- 4,014 closed unmerged PRs
Thus 32% of closed pull requests (4,014 of 12,445) end up being abandoned or rejected (or re-proposed differently).
Merged Pull Request Stats
Thankfully it's quite simple to query git to aggregate stats for the code that made it into the repository.
$ git log --no-merges --all --pretty=tformat: --numstat | awk '{inserted+=$1; deleted+=$2; delta+=$1-$2; ratio=deleted/inserted} END {printf "Commit stats:\n- Lines added (total):  %s\n- Lines deleted (total):  %s\n- Total lines (delta):  %s\n- Add./Del. ratio (1:n):  1 : %s\n", inserted, deleted, delta, ratio }' -
Based upon all historical commits (excluding merge commits):
- Lines added (total): 2,167,565
- Lines deleted (total): 1,483,481
- Total lines (delta): 684,084
- Added / Deleted ratio (1:n): 1 : 0.6844
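For readers who would rather not squint at awk, the same aggregation can be sketched in Python. This is a rough equivalent, not the command used to produce the numbers above; the function names `tally_numstat` and `repo_stats` are made up for illustration, and the sketch assumes `git` is on the PATH.

```python
import subprocess


def tally_numstat(numstat_text):
    """Sum added and deleted line counts from `git log --numstat` output."""
    inserted = deleted = 0
    for line in numstat_text.splitlines():
        parts = line.split("\t")
        # numstat rows look like "<added>\t<deleted>\t<path>"; skip anything else.
        if len(parts) < 3:
            continue
        added, removed = parts[0], parts[1]
        # Binary files are reported as "-\t-\t<path>"; count them as zero lines.
        if added.isdigit():
            inserted += int(added)
        if removed.isdigit():
            deleted += int(removed)
    return inserted, deleted


def repo_stats(repo_path="."):
    """Run the same git query as the one-liner above and tally its output."""
    result = subprocess.run(
        ["git", "log", "--no-merges", "--all", "--pretty=tformat:", "--numstat"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        errors="replace",  # tolerate file names that are not valid UTF-8
        check=True,
    )
    return tally_numstat(result.stdout)
```

Calling `repo_stats()` inside a clone of the repository returns the added and deleted totals as a tuple; the delta and the ratio follow from those two numbers.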
Unmerged Pull Request Stats
But now we need to figure out how much code was in the PRs that have been closed without being merged!
We've already determined that there are a little over 4,000 unmerged PRs at time of writing. But how many lines of code would these PRs have changed if merged? GitHub has an API for that, though perhaps there's a library that can help us use it more easily! Let's give PyGithub a shot...
After a bit of trial and error, here's what I came up with: https://gist.github.com/jlopp/5aa87ed33e97ad58f54ace65e9b0ece3
Unfortunately, while GitHub's UI allows us to filter out merged pull requests, it appears the API does not. So we need to iterate over all 12,000+ PRs to count the lines of code from the unmerged ones. It turns out that GitHub limits API calls to 5,000 per hour, so this operation requires us to throttle the script to spread itself out across 2+ hours.
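The general shape of such a script might look like the sketch below. It is not the gist linked above, just a minimal illustration under a few assumptions: the helper names `tally_unmerged` and `fetch_closed_pulls` are hypothetical, the fixed pause per PR is one simple way to throttle among several, and passing the token positionally to `Github(...)` may need adjusting for newer PyGithub releases.

```python
import time

# The hourly request limit mentioned above.
REQUESTS_PER_HOUR = 5000
# Assumption: a fixed pause per detail request keeps us under that limit.
PAUSE_SECONDS = 3600 / REQUESTS_PER_HOUR  # 0.72 seconds


def tally_unmerged(pulls, pause_seconds=0.0):
    """Sum additions and deletions over PRs that were closed without merging."""
    count = added = deleted = 0
    for pr in pulls:
        # merged_at is None when a PR was closed without being merged.
        if pr.merged_at is not None:
            continue
        # With PyGithub, reading these fields on a listed PR is expected to
        # trigger one extra API request for that PR's details.
        added += pr.additions
        deleted += pr.deletions
        count += 1
        time.sleep(pause_seconds)
    return count, added, deleted


def fetch_closed_pulls(token, repo_name="bitcoin/bitcoin"):
    """Return a lazily paginated list of closed PRs (requires PyGithub)."""
    # Imported here so the tally above can be used without PyGithub installed.
    from github import Github

    repo = Github(token).get_repo(repo_name)
    return repo.get_pulls(state="closed")
```

With a personal access token in hand, `tally_unmerged(fetch_closed_pulls(token), PAUSE_SECONDS)` would walk every closed PR and return the number of unmerged PRs along with their total added and deleted lines.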
The Results
After iterating over all rejected pull requests from Bitcoin Core, we find that there were:
- 9,011,209 total rejected added lines of code
- 6,279,435 total rejected deleted lines of code
That's 15,290,644 rejected lines of code changed vs. 3,651,046 accepted!
This means that, as of time of writing, only 19% of proposed changed lines of code have been accepted into Bitcoin Core.
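The 19% figure combines numbers from two different sections, so here is the arithmetic spelled out as a small sketch. The variable names are only for illustration; the inputs are the totals reported above.

```python
# Totals from the merged commit history (git log) section.
accepted_added, accepted_deleted = 2_167_565, 1_483_481
# Totals from the unmerged pull requests (GitHub API) section.
rejected_added, rejected_deleted = 9_011_209, 6_279_435

accepted = accepted_added + accepted_deleted
rejected = rejected_added + rejected_deleted

# Share of all proposed changed lines that ended up in the repository.
share = accepted / (accepted + rejected)
print(f"{accepted:,} accepted vs {rejected:,} rejected -> {share:.1%} accepted")
```

Note that a change that was closed and later re-proposed in a different PR counts toward the rejected total each time it was closed, so the share is best read as a rough indicator.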
Top Contributor Stats
What if we drill down a bit more to the individual level? Clearly some contributors are better than others at navigating the (often arduous) process of seeing a proposal through to completion.
- Wladimir van der Laan - 88% PR merge rate (737 merged PRs, 104 unmerged PRs)
- Pieter Wuille - 87% PR merge rate (600 merged PRs, 90 unmerged PRs)
- Marco Falke - 85% PR merge rate (733 merged PRs, 133 unmerged PRs)
- Matt Corallo - 77% PR merge rate (290 merged PRs, 88 unmerged PRs)
What about notable contributors who have stopped contributing after scaling contention?
- Jeff Garzik - 58% PR merge rate (88 merged PRs, 63 unmerged PRs)
- Mike Hearn - 57% PR merge rate (8 merged PRs, 6 unmerged PRs)
- Gavin Andresen - 80% PR merge rate (180 merged PRs, 43 unmerged PRs)
Gavin's stats are high but I wondered if that was due to his early involvement and if there was a noticeable trend leading toward his departure. It turns out there is indeed:
- 2012: 91% PR merge rate (49 out of 54 PRs)
- 2013: 86% PR merge rate (60 out of 70 PRs)
- 2014: 81% PR merge rate (29 out of 36 PRs)
- 2015: 59% PR merge rate (10 out of 17 PRs)
- 2016: 0% PR merge rate (0 out of 5 PRs)
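The yearly rates above are simply merged PRs divided by total PRs for each year. A short sketch that recomputes them from the listed counts and checks that the rate falls every year (the names `prs_by_year` and `merge_rate` are illustrative only):

```python
# (merged, total) PR counts per year, copied from the figures above.
prs_by_year = {
    2012: (49, 54),
    2013: (60, 70),
    2014: (29, 36),
    2015: (10, 17),
    2016: (0, 5),
}


def merge_rate(merged, total):
    """Merge rate as a whole-number percentage; 0 when there were no PRs."""
    return round(100 * merged / total) if total else 0


rates = {year: merge_rate(m, t) for year, (m, t) in prs_by_year.items()}

# True when each year's rate is lower than the previous year's.
ordered = [rates[year] for year in sorted(rates)]
declining = all(a > b for a, b in zip(ordered, ordered[1:]))

for year in sorted(rates):
    print(f"{year}: {rates[year]}% PR merge rate")
```

The same `merge_rate` helper applies to the per-contributor totals in the lists above.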
Did Gavin become a worse programmer over the years? That seems pretty unlikely. Rather, I suspect that this is evidence of Bitcoin Core's quality standards increasing in rigor.
Takeaways
- Bitcoin Core has high standards; a significant portion of code changes are abandoned or rejected.
- There's evidence that supports the theory that code standards have increased over the life of the project.
- It appears that a PR merge rate of under 70% is a sign that a contributor will get frustrated and stop contributing.
There's certainly opportunity to dig further into this phenomenon, but I think this is a good start!