By now, you may have heard of the hacker who says she scraped 99 percent of posts from Parler, the Twitter-wannabe site used by Trump supporters to help organize last Wednesday’s violent insurrection on Capitol Hill. What you may not know yet is the abysmal coding and security that made the scraping so easy.
To recap, the scraping was pulled off by a hacker who goes by the handle donk_enby. She originally set out to archive content posted to Parler last Wednesday in hopes of preserving self-incriminating material before account holders came to their senses and deleted it. By Sunday, donk_enby said she had collected roughly 80 terabytes of posts, including more than 1 million videos, many of which contained GPS metadata identifying exactly where they were shot.
“For the journalists DMing me to ask, in non-technical terms, I’d describe the current Parler archival situation as ‘a bunch of people running into a burning building trying to grab as many things as we can,’” donk_enby wrote on Twitter on Sunday. “Things will be available in a more accessible form later.”
The reason for urgency: Amazon, Apple, and Google all informed Parler that its lack of content moderation violated their terms of service. The archivists wanted to obtain the posts while the site remained online. But as it turned out, donk_enby was able to retrieve posts even after they had been deleted.
A key reason for her success: Parler’s site was a mess. Its public API used no authentication. When users deleted their posts, the site failed to remove the content and instead only added a delete flag to it. Oh, and each post carried a numerical ID that was incremented from the ID of the most recently published one, which made every post's address trivial to guess.
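To see why those three flaws compound, here is a minimal toy model in Python. It is not Parler's code and it talks to no real service: the class `ToyPostStore`, its methods, and the helper `enumerate_posts` are all hypothetical names invented for illustration. The sketch assumes an in-memory store that hands out sequential IDs, marks deletions with a flag instead of removing data, and serves any ID to any caller.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Post:
    body: str
    deleted: bool = False  # "soft delete": the content stays, only a flag changes


class ToyPostStore:
    """Hypothetical in-memory stand-in for a post database (illustration only)."""

    def __init__(self) -> None:
        self._posts: Dict[int, Post] = {}
        self._next_id = 1

    def publish(self, body: str) -> int:
        # Flaw 1: each new ID is simply the previous ID plus one.
        post_id = self._next_id
        self._posts[post_id] = Post(body)
        self._next_id += 1
        return post_id

    def soft_delete(self, post_id: int) -> None:
        # Flaw 2: the post is flagged as deleted, but its content is kept.
        self._posts[post_id].deleted = True

    def public_get(self, post_id: int) -> Optional[Post]:
        # Flaw 3: no authentication, and the deleted flag is never checked.
        return self._posts.get(post_id)


def enumerate_posts(store: ToyPostStore, highest_id: int) -> List[str]:
    """Walk every ID from 1 to highest_id and collect whatever comes back."""
    bodies = []
    for post_id in range(1, highest_id + 1):
        post = store.public_get(post_id)
        if post is not None:
            bodies.append(post.body)
    return bodies
```

In this toy, a caller who knows only the newest ID can recover every post, including the ones users believed they had deleted. Removing any one of the flaws (random IDs, real deletion, or an authenticated API) would make a simple counting loop far less effective.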
Parler’s rookie code made the scraping easy to automate, as a script used by donk_enby’s archival team demonstrates. As a result, massive numbers of posts that discussed the insurrection before, during, and after it was carried out will be preserved indefinitely so that they’re available to researchers, journalists, prosecutors, and others.
Another amateur mistake was Parler’s failure to scrub geolocations from images and videos posted online. Sites like Twitter and Google routinely remove such metadata from content posted by their users. The video files hosted on Parler, by contrast, were “raw,” meaning they still contained this information.
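For a sense of what scrubbing involves, here is a minimal Python sketch for one narrow case: JPEG images, where Exif data (including GPS tags) conventionally lives in APP1 segments near the start of the file. This is not how Twitter, Google, or any other site actually does it, and the function name `strip_app1_segments` is hypothetical. It handles only well-formed JPEGs; video containers store location data differently and are not covered.

```python
import struct

SOI = b"\xff\xd8"   # JPEG start-of-image marker
APP1 = 0xE1         # segment type that conventionally carries Exif (and XMP)
SOS = 0xDA          # start-of-scan: compressed image data follows


def strip_app1_segments(jpeg: bytes) -> bytes:
    """Return a copy of a JPEG byte string with all APP1 segments removed."""
    if jpeg[:2] != SOI:
        raise ValueError("not a JPEG: missing start-of-image marker")

    out = bytearray(SOI)
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("malformed JPEG: expected a marker")
        marker = jpeg[pos + 1]
        if marker == SOS:
            # Everything from here on is image data; copy it untouched.
            out += jpeg[pos:]
            return bytes(out)

        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        end = pos + 2 + length
        if length < 2 or end > len(jpeg):
            raise ValueError("malformed JPEG: bad segment length")

        if marker != APP1:
            out += jpeg[pos:end]  # keep every segment except APP1
        pos = end

    out += jpeg[pos:]  # no scan found; keep any trailing bytes as-is
    return bytes(out)
```

A production pipeline would more likely rely on a maintained imaging library and would also need to handle other formats and other metadata locations. The point of the sketch is only that dropping this data at upload time is a small, well-understood step.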
Parler’s moderation policies, which were even more lax than those of Twitter, Facebook, and YouTube, had already made the site popular with far-right users looking for a forum to discuss debunked conspiracy theories. With Twitter permanently banning Trump, the president’s supporters embraced the site even more enthusiastically.
Prosecutors are already pursuing more than 150 suspects in Wednesday’s riot. The preservation of some 80TB of Parler posts, including more than 1 million raw video files, may result in more people being charged.