The Internet is less of a wild west than it once was, but there are still corners of it that are obscured from view and quite shady. For instance, Tor hidden services have included numerous criminal enterprises such as Silk Road. Two researchers from King’s College London set out to discover just how much of Tor was devoted to illegal content. The result? More than half of the sites they were able to classify.
Tor (which originally stood for The Onion Router) is a network of relays through which data is passed wrapped in layers of encryption. Each relay knows only where a packet just came from and where it’s going next. After a few hops, the source of a packet is (almost) impossible to discern. Most people use Tor to reach sites on the open Internet anonymously, but there are also sites hosted entirely within Tor, called hidden services. Silk Road was a hidden service, and there are many others. It is these sites that Daniel Moore and Thomas Rid sought to quantify.
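The "each relay knows only its neighbors" idea can be sketched in a few lines of Python. This is a toy model, not Tor's protocol: the layers here are plain nested dictionaries with no encryption at all, whereas real Tor encrypts each layer with a key shared with one relay. The function names `wrap` and `route` and the relay names are made up for illustration.

```python
def wrap(message, destination, relays):
    """Build nested layers; the outermost layer is handed to relays[0]."""
    # Innermost layer: what the last relay sees (its next hop is the destination).
    packet = {"next": destination, "payload": message}
    # Wrap outward from the last relay; each layer names only the following hop.
    for relay in reversed(relays[1:]):
        packet = {"next": relay, "payload": packet}
    return packet


def route(packet, relays):
    """Simulate forwarding and record what each relay could observe."""
    observed = {}
    previous, current = "client", relays[0]
    while True:
        # A relay sees only who handed it the packet and where to send it next.
        observed[current] = (previous, packet["next"])
        if not isinstance(packet["payload"], dict):
            # Last layer peeled: return the destination and the message.
            return observed, packet["next"], packet["payload"]
        previous, current, packet = current, packet["next"], packet["payload"]


relays = ["relay_a", "relay_b", "relay_c"]  # hypothetical relay names
observed, destination, message = route(wrap("hello", "site.example", relays), relays)
for relay, (came_from, going_to) in observed.items():
    print(f"{relay}: from {came_from}, to {going_to}")
```

In this toy run only the first relay ever sees the client and only the last relay sees the destination, which is the property the paragraph above describes.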
It’s no easy task to find all the hidden services on Tor, let alone get a look at the data they’re hosting. Hidden services are ephemeral, often switching addresses and server locations without notice. To top it off, Tor addresses are just long strings of characters with a .onion domain at the end. In order to get a proper sample of all the hidden services lurking out there, the pair built a Python script that crawled the dark web, starting with the popular Tor search engines Onion City and Ahmia.
The bot’s job was to scrape the content from each page and upload it for analysis. When the bot found a link to another hidden service (the main way you find things on the dark web), it would hop to that one and scrape it too. The pair used an algorithm to process all the content collected and sort it into categories like drugs, social, pornography, and financial. The sorting was spot-checked and found to be very accurate overall.
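The link-following behavior described above is essentially a breadth-first crawl. The sketch below is not the researchers' script, which the article does not show; it is a minimal illustration that runs against an in-memory stand-in for the network. The `fetch` callable, the `FAKE_WEB` dictionary, and the placeholder addresses are all assumptions, and the pattern simply matches 16- or 56-character base32 names ending in .onion.

```python
import re
from collections import deque

# 16-character (older) or 56-character (newer) base32 names ending in .onion.
ONION_LINK = re.compile(r"\b(?:[a-z2-7]{16}|[a-z2-7]{56})\.onion\b")


def crawl(seeds, fetch, limit=1000):
    """Breadth-first crawl: scrape each reachable service once, following links."""
    queue = deque(seeds)
    seen = set(seeds)
    pages = {}
    while queue and len(pages) < limit:
        address = queue.popleft()
        text = fetch(address)
        if text is None:  # service offline or unreachable
            continue
        pages[address] = text
        # Queue every hidden-service address mentioned on the page.
        for match in ONION_LINK.finditer(text):
            link = match.group(0)
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# Hypothetical in-memory "network" so the sketch runs without Tor.
A, B, C = ("a" * 16 + ".onion", "b" * 16 + ".onion", "c" * 16 + ".onion")
FAKE_WEB = {
    A: f"directory page linking to {B} and {C}",
    B: f"forum page linking back to {A}",
    # C is linked from A but absent here, so fetch returns None for it.
}
pages = crawl([A], FAKE_WEB.get)
print(sorted(pages))
```

A real crawler would replace `FAKE_WEB.get` with a function that requests each page through a Tor proxy and would need timeouts and retries, since hidden services go offline without notice.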
After the script had run its course, 5,205 live websites had been indexed, of which 2,723 were classified by content. Pages with fewer than 50 words and those with no content were placed in the “none” category. According to the analysis, 57% of the classified sites hosted illicit content like drugs and child pornography. The Tor Project estimates there are about 35,000 total hidden services active, so this is far from a full survey, but enough to be a representative sample.
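The 50-word cutoff is easy to express in code. The article does not say what algorithm did the actual sorting, so the keyword counting below is only a stand-in, and the keyword lists, the `classify` function, and the "other" fallback label are all hypothetical.

```python
def classify(text, keywords, min_words=50):
    """Return a category label, or "none" for pages too short to judge."""
    if len(text.split()) < min_words:
        return "none"  # fewer than 50 words, or no content at all
    lowered = text.lower()
    # Stand-in scoring: count keyword occurrences per category.
    scores = {
        category: sum(lowered.count(word) for word in words)
        for category, words in keywords.items()
    }
    best = max(scores, key=scores.get)
    # "other" is a hypothetical fallback label, not one taken from the study.
    return best if scores[best] > 0 else "other"


# Hypothetical keyword lists for two of the categories named in the article.
KEYWORDS = {
    "drugs": ["cannabis", "pills"],
    "financial": ["bitcoin", "wallet"],
}
short_page = "cannabis for sale"
long_page = " ".join(["bitcoin wallet escrow"] * 20)  # 60 words
print(classify(short_page, KEYWORDS))
print(classify(long_page, KEYWORDS))
```

Here the short page lands in "none" despite containing a keyword, because the length check runs before any scoring.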
Moore and Rid say their goal with this project was to establish a more moderate perspective on the role of encryption. Politicians are currently demanding unworkable backdoors to encryption, but Moore and Rid say that privacy activists fail to fully acknowledge the potential for abuse. They don’t have a solution in mind; they’re just making the data available.