Linkrot

The internet is a library built on quicksand.

Cory Doctorow
6 min read · May 21, 2024
A 1996 Yahoo homepage. It is animated. The blue links blur and then disappear.

For the rest of May, my bestselling solarpunk utopian novel The Lost Cause (2023) is available as a $2.99, DRM-free ebook!

Here’s an underrated cognitive virtue: “object permanence” — that is, remembering how you perceived something previously. As Riley Quinn often reminds us, the left is the ideology of object permanence — to be a leftist is to hate and mistrust the CIA even when they’re tormenting Trump for a brief instant, or to remember that it was once possible for a working person to support their family with their wages:

https://pluralistic.net/2023/10/27/six-sells/#youre-holding-it-wrong

The thing is, object permanence is hard. Life comes at you quickly. It’s very hard to remember facts, and the order in which those facts arrived — it’s even harder to remember how you felt about those facts in the moment.

This is where blogging comes in — for me, at least. Back in 1997, Scott Edelman — editor of Science Fiction Age — asked me to take over the back page of the magazine by writing up ten links of interest for the nascent web. I wrote that column until the spring of 2000, then, in early 2001, Mark Frauenfelder asked me to guest-edit Boing Boing, whereupon the tempo of my web-logging went daily. I kept that up on Boing Boing for more than 19 years, writing about 54,000 posts. In February, 2020, I started Pluralistic.net, my solo project, a kind of blog/newsletter, and in the four-plus years since, I’ve written about 1,200 editions containing between one and twelve posts each.

This gigantic corpus of everything I ever considered to be noteworthy is immensely valuable to me. The act of taking notes in public is a powerful discipline: rather than jotting cryptic notes to myself in a commonplace book, I publish those notes for strangers. This imposes a rigor on the note-taking that makes those notes far more useful to me in years to come.

Better still: public note-taking is powerfully mnemonic. The things I’ve taken notes on form a kind of supersaturated solution of story ideas, essay ideas, speech ideas, and more, and periodically two or more of these fragments will glom together, nucleate, and a fully-formed work will crystallize out of the solution.

Then, the fact that all these fragments are also database entries — contained in the back-end of a WordPress installation that I can run complex queries on — comes into play, letting me swiftly and reliably confirm my memories of these long-gone phenomena. Inevitably, these queries turn up material that I’ve totally forgotten, and these make the result even richer, like adding homemade stock to a stew to bring out a rich and complicated flavor. Better still, many of these posts have been annotated by readers with supplemental materials or vigorous objections.

I call this all “The Memex Method” and it lets me write a lot (I wrote nine books during lockdown, as I used work to distract me from anxiety — something I stumbled into through a lifetime of chronic pain management):

https://pluralistic.net/2021/05/09/the-memex-method/

Back in 2013, I started a new daily Boing Boing feature: “This Day In Blogging History,” wherein I would look at the archive of posts for that day one, five and ten years previously:

https://boingboing.net/2013/06/24/this-day-in-blogging-history.html

With Pluralistic, I turned this into a daily newsletter feature, now stretching back to twenty, fifteen, ten, five and one year ago. Here’s today’s:

https://pluralistic.net/2024/05/21/noway-back-machine/#retro

This is a tremendous adjunct to the Memex Method. It’s a structured way to review everything I’ve ever thought about, in five-year increments, every single day. I liken this to working dough, where there’s stuff at the edges getting dried out and crumbly, and so you fold it all back into the middle. All these old fragments naturally slip out of your thoughts and understanding, but you can revive their centrality by briefly paying attention to them for a few minutes every day.
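
The date arithmetic behind a review like this is simple enough to sketch. What follows is a minimal illustration, not the actual tooling behind the newsletter feature: the function names are hypothetical, and the February 29 fallback is an assumption rather than anything the feature is known to do.

```python
from datetime import date

# Offsets taken from the post: twenty, fifteen, ten, five and one year ago.
YEAR_OFFSETS = (20, 15, 10, 5, 1)


def years_ago(today: date, years: int) -> date:
    """Return the same calendar day `years` earlier.

    Assumption: February 29 falls back to February 28 when the
    target year is not a leap year.
    """
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        # Only reachable for February 29 in a non-leap target year.
        return today.replace(year=today.year - years, day=28)


def retro_dates(today: date) -> list[date]:
    """List the archive dates to revisit for `today`, oldest first."""
    return [years_ago(today, n) for n in YEAR_OFFSETS]


if __name__ == "__main__":
    for day in retro_dates(date(2024, 5, 21)):
        print(day.isoformat())
```

Each returned date can then be used to look up that day's posts in whatever archive holds them.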

This structured daily review is a wonderful way to maintain object permanence, reviewing your attitudes and beliefs over time. It’s also a way to understand the long-forgotten origins of issues that are central to you today. Yesterday, I was reminded that I started thinking about automotive Right to Repair 15 years ago:

https://www.eff.org/deeplinks/2009/05/right-repair-law-pro

Given that we’re still fighting over this, that’s some important perspective, a reminder of the likely timescales involved in more recent issues where I feel like little progress is being made.

Remember when we all got pissed off because the mustache-twirling evil CEO of Warners, David Zaslav, was shredding highly anticipated TV shows and movies prior to their release to get a tax-credit? Turns out that we started getting angry about this stuff twenty years ago, when Michael Eisner did it to Michael Moore’s “Fahrenheit 9/11”:

https://www.nytimes.com/2004/05/05/us/disney-is-blocking-distribution-of-film-that-criticizes-bush.html

It’s not just object permanence: this daily spelunk through my old records is also a way to continuously and methodically sound the web for linkrot: when old links go bad. Over the past five years, I’ve noticed a very sharp increase in linkrot, and even worse, in the odious practice of spammers taking over my dead friends’ former blogs and turning them into AI spam-farms:

https://www.wired.com/story/confessions-of-an-ai-clickbait-kingpin/
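
Sounding old links for rot can be partly automated. Here is a rough sketch using only Python's standard library; the verdict labels, function names, and User-Agent string are all made up for illustration. Note its blind spot: a hijacked blog that has become a spam-farm still answers with a healthy status code, so a check like this only catches links that are outright dead.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Map an HTTP status code to a coarse verdict (hypothetical labels)."""
    if 200 <= status < 300:
        return "alive"
    if status in (404, 410):
        return "gone"
    return "error"


def check_link(url: str, timeout: float = 10.0) -> str:
    """Request `url` and return a verdict for it.

    urlopen follows redirects on its own, so "alive" may describe a page
    other than the one originally linked. Some servers reject HEAD
    requests; those show up here as "error".
    """
    request = urllib.request.Request(
        url,
        method="HEAD",
        headers={"User-Agent": "linkrot-sketch/0.1"},  # made-up agent string
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx or 5xx status.
        return classify_status(err.code)
    except OSError:
        # DNS failures, refused connections and timeouts all land here.
        return "unreachable"
```

Calling `check_link` on each link in an old post gives a first pass; anything marked "alive" still needs a human eye to confirm it is the page that was originally linked.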

The good people at the Pew Research Center have just released a careful, quantitative study of linkrot that confirms — and exceeds — my worst suspicions about the decay of the web:

https://www.pewresearch.org/data-labs/2024/05/17/when-online-content-disappears/

The headline finding from “When Online Content Disappears” is that 38% of the web of 2013 is gone today. Broken links are everywhere: 23% of news pages and 21% of government web pages contain at least one dead link. Wikipedia is especially hard-hit, with the majority of entries having at least one broken link in their reference sections. Twitter is another industrial-scale oubliette: a fifth of English tweets disappear within a matter of months; for Turkish and Arabic tweets, it’s 40%.

Thankfully, someone has plugged the web’s memory-hole. Since 2001, the Internet Archive’s Wayback Machine has allowed web users to see captures of web-pages, tracking their changes over time. I was at the Wayback Machine’s launch party, and right away, I could see its value. Today, I make extensive use of Wayback Machine captures for my “This Day In History” posts, and when I find dead links on the web.
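
For anyone who wants to do the same with their own dead links, Wayback Machine addresses follow a predictable pattern: a timestamp, then the original URL. The sketch below builds such addresses; the function names are hypothetical, the date in the example is purely illustrative, and my understanding (worth verifying) is that the Wayback Machine redirects a dated address to the nearest capture it holds.

```python
from datetime import date


def wayback_capture_url(page_url: str, day: date) -> str:
    """Build a Wayback Machine address for a capture on or near `day`.

    The timestamp is written as YYYYMMDDhhmmss, with the time of day
    set to midnight.
    """
    return f"https://web.archive.org/web/{day:%Y%m%d}000000/{page_url}"


def wayback_calendar_url(page_url: str, day: date) -> str:
    """Build the address of the capture calendar view (note the `*`)."""
    return f"https://web.archive.org/web/{day:%Y%m%d}000000*/{page_url}"


if __name__ == "__main__":
    # Illustrative date only; the link is one cited earlier in this post.
    dead_link = "https://www.eff.org/deeplinks/2009/05/right-repair-law-pro"
    print(wayback_capture_url(dead_link, date(2009, 5, 21)))
```

Whether a capture actually exists for a given page and date is something only the Archive can answer; the functions above just construct the address to try.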

The Wayback Machine went public in 2001, but Archive founder Brewster Kahle started scraping the web in 1996. Today’s post graphic — a modified Yahoo homepage from October 17, 1996 — is the oldest Yahoo capture on the Wayback Machine:

https://web.archive.org/web/19960501000000*/yahoo.com

Remember that the next time someone tells you that we must stamp out web-scraping for one reason or another. There are plenty of ugly ways to use scraping (looking at you, Clearview AI) that we should ban, but scraping itself is very good:

https://pluralistic.net/2023/09/17/how-to-think-about-scraping/

And so is the Internet Archive, which makes the legal threats it faces today all the more frightening. Lawsuits brought by the Big Five publishers and Big Three labels will, if successful, snuff out the Internet Archive altogether, and with it, the Wayback Machine — the only record we have of our ephemeral internet:

https://blog.archive.org/2024/04/19/internet-archive-stands-firm-on-library-digital-rights-in-final-brief-of-hachette-v-internet-archive-lawsuit/

Libraries burn. The Internet Archive may seem like a sturdy and eternal repository for our collective object permanence about the internet, but it is very fragile, and could disappear like that.

If you’d like an essay-formatted version of this post to read or share, here’s a link to it on pluralistic.net, my surveillance-free, ad-free, tracker-free blog:

https://pluralistic.net/2024/05/21/noway-back-machine/#pew-pew-pew

Written by Cory Doctorow

Writer, blogger, activist. Blog: https://pluralistic.net; Mailing list: https://pluralistic.net/plura-list; Mastodon: @pluralistic@mamot.fr