Last night, developer and privacy activist Resynth1943 announced that GitHub’s source code had been leaked on GitHub itself, in GitHub’s own DMCA repository. It’s going to take some unpacking to talk about that, but first things first: this isn’t as big a deal as it might sound.
GitHub Enterprise Server != GitHub.com
Shortly after Resynth1943—who seems to have broken the news and described the code as having “just been leaked” by an unknown individual—reshared the announcement on Hacker News, GitHub CEO Nat Friedman showed up at HN to provide some context.
According to Friedman, the upload in question was actually of GitHub Enterprise Server, not the GitHub website itself. While the two share a considerable volume of code, the distinction is significant. Part of that significance is that GitHub itself was not actually hacked.
While neither GitHub nor GitHub Enterprise Server is open source, GitHub Enterprise Server source code is routinely shipped to customers, though usually in a stripped-down and obfuscated format. According to Friedman, GitHub accidentally supplied some customers a complete and non-obfuscated tarball of GHES a couple of months ago; this is the code which was dumped into GitHub’s public DMCA repository.
Grinding a DMCA-related axe
It seems likely that the “unknown individual” Resynth1943 referenced uploaded the leaked source code largely out of anger about the recent youtube-dl takedown.
The code itself was dumped into GitHub’s DMCA repository, which serves as a history of DMCA takedown requests that GitHub has received, as it receives them, similar to the Chilling Effects notices you may have seen on Google searches over the years.
What is this?
Inspired by Lumen (formerly Chilling Effects) and Google, this repo contains the text of DMCA takedown notices and counter-notices we’ve received here at GitHub. We publish them as they are received, with only personally identifiable information redacted.
Resynth1943’s announcement simultaneously criticizes Microsoft as hypocritical for not deliberately opening up GitHub’s source while suggesting that perhaps it will be less secure now that its code has been leaked.
How do I shot fake commit?
The commit itself was flagged as apparently being made by user Nat—aka Nat Friedman, the current CEO of GitHub. Much like the content of the commit, this is misleading—Git itself, the source code versioning system underlying GitHub, does not protect significantly against user impersonation. The commit in question was not labeled “verified,” which means it was not signed with Friedman’s GPG key.
Git commits, much like email messages, allow users to put whatever information they please in the user.name and user.email fields. This makes spoofing that information trivial. Unless the commit is actually signed with a GPG key associated with that email address, there’s no real verification that it comes from where it claims to.
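To make the point concrete, here is a minimal Python sketch of how a Git commit object is serialized and hashed. It is an illustration rather than Git's own code: the function names, the example identity, and the fixed timestamp are all assumptions made for this sketch, and it handles only a parentless, unsigned commit pointing at the empty tree.

```python
import hashlib


def build_commit(tree_hex, name, email, message, timestamp=0, tz="+0000"):
    """Serialize a minimal, parentless Git commit object body.

    The author and committer lines are plain text supplied by whoever
    creates the commit; nothing in the format checks them.
    """
    ident = f"{name} <{email}> {timestamp} {tz}"
    body = (
        f"tree {tree_hex}\n"
        f"author {ident}\n"
        f"committer {ident}\n"
        "\n"
        f"{message}\n"
    )
    return body.encode("utf-8")


def commit_hash(body):
    """Hash a commit body the way Git names objects: SHA-1 over a
    'commit <length>' header, a NUL byte, and the body itself."""
    header = b"commit " + str(len(body)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def is_signed(body):
    """Report whether the commit headers include a 'gpgsig' entry,
    which is where Git stores a commit signature."""
    headers = body.split(b"\n\n", 1)[0]
    return any(line.startswith(b"gpgsig ") for line in headers.split(b"\n"))


# The empty tree is itself just an object with zero bytes of content.
empty_tree = hashlib.sha1(b"tree 0\x00").hexdigest()

# Hypothetical identity: any name and email are accepted as-is.
spoofed = build_commit(
    empty_tree, "Somebody Else", "somebody.else@example.com", "Totally legit"
)
print(commit_hash(spoofed))
print("signed:", is_signed(spoofed))
```

The hash covers the claimed identity, so it proves the commit has not been altered since it was created, but it says nothing about who created it. Only a signature that a verifier checks against a known key ties the commit to a person.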
This leaves the problem of how a commit from some random user would show up in GitHub’s DMCA repository in the first place—but the answer there doesn’t involve any actual account compromises, either.
When you push a commit to a Git repository, you get a hash which represents that commit and can be used to locate it in the tree. GitHub—part of which is the Web application which provides in-browser access to that underlying Git structure—keeps all forks of a Git repository in a single underlying repository, although it doesn’t generally appear that way in the URL structure.
Use the forks, Luke
So, in order to create the illusion that GitHub CEO Nat Friedman made a commit to the GitHub DMCA repo, the unknown individual first needed to fork the DMCA repository, creating a copy to which they had commit privileges. The next step was to commit the leaked source to that fork while spoofing Friedman’s name and email address.
This would result in a forked repository, with the bogus commit. But it still wouldn’t have looked quite right—the URL, after all, would still point to both the fork and to the attacker’s real GitHub username and account. But under the hood, both parent and fork are part of the same repository at the underlying Git level. This allowed the attacker to construct a URL which makes the commit appear to have been made to the main repository, not the fork.
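A toy model may help show why this works. The Python sketch below is not GitHub's implementation; the class, its method names, and the repository names are hypothetical, chosen only to illustrate a single content-addressed object store shared by a parent and its forks, where lookup by hash does not check which repository's branches actually contain the commit.

```python
class SharedObjectStore:
    """Toy model of one object store backing a parent repo and its forks."""

    def __init__(self):
        self.objects = {}   # commit hash -> commit content
        self.branches = {}  # "owner/repo" -> hashes reachable from its branches

    def push(self, repo, commit_id, content):
        """Record a commit pushed to one specific repo (parent or fork)."""
        self.objects[commit_id] = content
        self.branches.setdefault(repo, set()).add(commit_id)

    def lookup_by_hash(self, repo, commit_id):
        """Fetch a commit by hash. The repo name is ignored, which is
        the behavior the URL trick relies on in this model."""
        return self.objects.get(commit_id)

    def on_branch(self, repo, commit_id):
        """Check whether the commit is reachable from the repo's own branches."""
        return commit_id in self.branches.get(repo, set())


store = SharedObjectStore()
fake_id = "0" * 40  # placeholder hash, not a real commit
store.push("attacker/dmca", fake_id, "leaked source, author field spoofed")

# Asking the parent repo for the fork's commit still returns it...
print(store.lookup_by_hash("github/dmca", fake_id))
# ...even though the commit is on none of the parent's branches.
print(store.on_branch("github/dmca", fake_id))
```

In this model, closing the gap would mean having the lookup consult `on_branch` before displaying a commit under the parent's name, or at least flagging the result when that check fails.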
To complete the deception, the attacker began with https://github.com/github/dmca, then appended tree/$hash to the end, where $hash was the hash of the commit made to their own fork, and presto! The result was a URL which appeared to be a commit, made by CEO Nat Friedman, to GitHub’s own DMCA repository.
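The URL construction described above amounts to simple string concatenation. In this short Python sketch the helper name is hypothetical and the hash is a placeholder, not the hash of the actual commit.

```python
def spoofed_commit_url(parent_repo_url, commit_hash):
    """Append tree/<hash> to the parent repository's URL.

    The hash can belong to a commit that only exists in a fork; the
    resulting URL still carries the parent repository's name.
    """
    return f"{parent_repo_url.rstrip('/')}/tree/{commit_hash}"


placeholder_hash = "0" * 40  # stand-in for the fork commit's real hash
print(spoofed_commit_url("https://github.com/github/dmca", placeholder_hash))
```

Nothing in the URL itself reveals that the commit came from a fork, which is why the page looked so convincing at a glance.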
GitHub wasn’t “hacked”—but there’s a lot of room for improvement
On the plus side, there’s no actual compromise here. The source code was freely, if accidentally, given to customers—not exfiltrated from a compromised server. Similarly, Friedman didn’t lose control of his own account, and GitHub didn’t lose control of its DMCA repository. In Friedman’s own rather flippant words on Hacker News, “everything is fine, situation normal, the lark is on the wing, the snail is on the thorn, and all’s right with the world.”
Although all of the shenanigans documented here are within expectations—if you want to verify your identity, you should sign your commits with a GPG key—those expectations themselves are, perhaps, much lower than they should be. Managing GPG is still onerous enough to serve as a significant barrier to entry for many developers. More importantly, GitHub doesn’t offer any controls to emphasize the presence—or lack—of such signatures.
We’ve seen plenty of suggestions floating around for tooltips such as “this user typically signs their commits, and this commit is not signed” where appropriate. We also think it’s past time to fix the issue allowing an attacker to spoof what repository they’ve committed to using the fork-and-manual-URL-construction technique we described above.
Finally, it’s probably time to have a serious discussion about whether unsigned commits should be a default in the first place. We live in a world where even simple Web browsing is overwhelmingly expected to be conducted using authentication and encryption—which makes the kind of casual spoofing seen today all the more surprising, and disturbing.