Copyright takedowns are a cautionary tale that few are heeding

If you can’t post a law school symposium on fair use…

Cory Doctorow
Jun 27, 2024
EFF’s banner for the ‘Unfiltered’ white paper, depicting TV static overlaid with a parody of the Youtube logo and wordmark, but instead of ‘Youtube’ it reads ‘Fair Use,’ with glitched vertical and horizontal sync that distorts the logo. Image: EFF CC BY 3.0

On July 14, I’m giving the closing keynote for the fifteenth Hackers On Planet Earth, in Queens, NY. On July 20, I’m appearing at Chicago’s Exile in Bookville.

We’re living through one of those moments when millions of people become suddenly and overwhelmingly interested in fair use, one of the subtlest and worst-understood aspects of copyright law. It’s not a subject you can master by skimming a Wikipedia article!

I’ve been talking about fair use with laypeople for more than 20 years. I’ve met so many people who possess the unshakable, serene confidence of the truly wrong, like the people who think fair use means you can take x words from a book, or y seconds from a song and it will always be fair, while anything more will never be.

Or the people who think that if you violate any of the four factors, your use can’t be fair — or the people who think that if you fail all of the four factors, you must be infringing (people, the Supreme Court is calling and they want to tell you about the Betamax!).

You might think that you can never quote a song lyric in a book without infringing copyright, or that you must clear every musical sample. You might be rock solid certain that scraping the web to train an AI is infringing. If you hold those beliefs, you do not understand the “fact intensive” nature of fair use.

But you can learn! It’s actually a really cool and interesting and gnarly subject, and it’s a favorite of copyright scholars, who have really fascinating disagreements and discussions about the subject. These discussions often key off of the controversies of the moment, but inevitably they implicate earlier fights about everything from the piano roll to 2 Live Crew to antiracist retellings of Gone With the Wind.

One of the most interesting discussions of fair use you can ask for took place in 2019, when the NYU Engelberg Center on Innovation Law & Policy held a symposium called “Proving IP.” One of the panels featured dueling musicologists debating the merits of the Blurred Lines case. That case marked a turning point in music copyright, with the Marvin Gaye estate successfully suing Robin Thicke and Pharrell Williams for copying the “vibe” of Gaye’s “Got to Give it Up.”

Naturally, this discussion featured clips from both songs as the experts — joined by some of America’s top copyright scholars — delved into the legal reasoning and future consequences of the case. It would be literally impossible to discuss this case without those clips.

And that’s where the problems start: as soon as the symposium was uploaded to Youtube, it was flagged and removed by Content ID, Google’s $100,000,000 copyright enforcement system. This initial takedown was fully automated, which is how Content ID works: rightsholders upload audio to claim it, and then Content ID removes other videos where that audio appears (rightsholders can also specify that videos with matching clips be demonetized, or that the ad revenue from those videos be diverted to the rightsholders).

But Content ID has a safety valve: an uploader whose video has been incorrectly flagged can challenge the takedown. The case is then punted to the rightsholder, who has to manually renew or drop their claim. In the case of this symposium, the rightsholder was Universal Music Group, the largest record company in the world. UMG’s personnel reviewed the video and did not drop the claim.
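
Here’s a minimal sketch, in Python, of the claim-and-dispute flow described above. Every name in it is hypothetical (the `Policy` and `Claim` types, the `match` and `dispute` functions, the placeholder fingerprint strings), and Content ID’s real internals aren’t public, so treat this as an illustration of the flow’s shape rather than of Google’s implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from enum import Enum


class Policy(Enum):
    BLOCK = "block"            # take the matching video down
    DEMONETIZE = "demonetize"  # leave it up, but strip its ads
    DIVERT = "divert"          # leave it up, pay the ad revenue to the claimant


@dataclass(frozen=True)
class Claim:
    claimant: str
    policy: Policy
    active: bool = True


def match(upload: set[str], registry: dict[str, Claim]) -> list[Claim]:
    """Return the claim for every registered fingerprint found in the upload."""
    return [registry[fp] for fp in sorted(upload) if fp in registry]


def dispute(claim: Claim, claimant_renews: bool) -> Claim:
    """The uploader objects; the claimant alone decides if the claim survives."""
    return replace(claim, active=claimant_renews)


# Hypothetical data: the fingerprint strings are placeholders, not real IDs.
registry = {"fp-claimed-song": Claim("UMG", Policy.BLOCK)}
panel_video = {"fp-panel-discussion", "fp-claimed-song"}

claims = match(panel_video, registry)
after_dispute = [dispute(c, claimant_renews=True) for c in claims]
print(after_dispute[0].active)  # True: the claim stands after the dispute
```

Notice what is missing from the sketch: no function takes the purpose, context, or amount of the use as an input. The only thing that changes the outcome of a dispute is the claimant’s own decision.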

99.99% of the time, that’s where the story would end, for many reasons. First of all, most people don’t understand fair use well enough to contest the judgment of a cosmically vast, unimaginably rich monopolist who wants to censor their video. Just as important, though, Content ID is a Byzantine system that is nearly as complex as fair use, but it’s an entirely private affair, created and adjudicated by another galactic-scale monopolist (Google).

Google’s copyright enforcement system is a cod-legal regime with all the downsides of the law, and a few wrinkles of its own (for example, it’s a system without lawyers, just corporate experts doing battle with laypeople). A single misstep can result in your video being removed or your account being permanently deleted, along with every video you’ve ever posted. For people who make their living from audiovisual content, losing a Youtube account is an extinction-level event:

So for the average Youtuber, Content ID is a kind of Kafka-as-a-Service system that is always avoided and never investigated. But the Engelberg Center isn’t your average Youtuber: they boast some of the country’s top copyright experts, specializing in exactly the questions Youtube’s Content ID is supposed to be adjudicating.

So naturally, they challenged the takedown — only to have UMG double down. This is par for the course with UMG: they are infamous for refusing to consider fair use in takedown requests. Their stance is so unreasonable that a court actually found them guilty of violating the DMCA’s provision against fraudulent takedowns:

But the DMCA’s takedown system is part of the real law, while Content ID is a fake law, created and overseen by a tech monopolist, not a court. So the fate of the Blurred Lines discussion turned on the Engelberg Center’s ability to navigate both the law and the n-dimensional topology of Content ID’s takedown flowchart.

It took more than a year, but eventually, Engelberg prevailed.

Until they didn’t.

If Content ID was a person, it would be a baby, specifically, a baby under 18 months old (that is, before the development of “object permanence”). Until our 18th month (or so), we lack the ability to reason about things we can’t see. This is the period when small babies find peek-a-boo amazing. Object permanence is the ability to understand that things still exist when they aren’t in your immediate field of vision.

Content ID has no object permanence. Despite the fact that the Engelberg Blurred Lines panel was the most involved fair use question the system was ever called upon to parse, it managed to repeatedly forget that it had decided that the panel could stay up. Over and over since that initial determination, Content ID has taken down the video of the panel, forcing Engelberg to go through the whole process again.

But that’s just for starters, because Youtube isn’t the only place where a copyright enforcement bot is making billions of unsupervised, unaccountable decisions about what audiovisual material you’re allowed to access.

Spotify is yet another monopolist, with a justifiable reputation for being extremely hostile to artists’ interests, thanks in large part to the role that UMG and the other major record labels played in designing its business rules:

Spotify has spent hundreds of millions of dollars trying to capture the podcasting market, in the hopes of converting one of the last truly open digital publishing systems into a product under its control:

Thankfully, that campaign has failed — but millions of people have (unwisely) ditched their open podcatchers in favor of Spotify’s pre-enshittified app, so everyone with a podcast now must target Spotify for distribution if they hope to reach those captive users.

Guess who has a podcast? The Engelberg Center.

Naturally, Engelberg’s podcast includes the audio of that Blurred Lines panel, and that audio includes samples from both “Blurred Lines” and “Got To Give It Up.”

So — naturally — UMG keeps taking down the podcast.

Spotify has its own answer to Content ID, and incredibly, it’s even worse and harder to navigate than Google’s pretend legal system. As Engelberg describes in its latest post, UMG and Spotify have colluded to ensure that this now-classic discussion of fair use will never be able to take advantage of fair use itself:

Remember, this is the best case scenario for arguing about fair use with a monopolist like UMG, Google, or Spotify. As Engelberg puts it:

The Engelberg Center had an extraordinarily high level of interest in pursuing this issue, and legal confidence in our position that would have cost an average podcaster tens of thousands of dollars to develop. That cannot be what is required to challenge the removal of a podcast episode.

Automated takedown systems are the tech industry’s answer to the “notice-and-takedown” system that was invented to broker a peace between copyright law and the internet, starting with the US’s 1998 Digital Millennium Copyright Act. The DMCA implements (and exceeds) a pair of 1996 UN treaties, the WIPO Copyright Treaty and the Performances and Phonograms Treaty, and most countries in the world have some version of notice-and-takedown.

Big corporate rightsholders claim that notice-and-takedown is a gift to the tech sector, one that allows tech companies to get away with copyright infringement. They want a “strict liability” regime, where any platform that allows a user to post something infringing is liable for that infringement, to the tune of $150,000 in statutory damages.

Of course, there’s no way for a platform to know a priori whether something a user posts infringes on someone’s copyright. There is no registry of everything that is copyrighted, and of course, fair use means that there are lots of ways to legally reproduce someone’s work without their permission (or even when they object). Even if every person who ever has trained or ever will train as a copyright lawyer worked 24/7 for just one online platform to evaluate every tweet, video, audio clip and image for copyright infringement, they wouldn’t be able to touch even 1% of what gets posted to that platform.

The “compromise” that the entertainment industry wants is automated takedown — a system like Content ID, where rightsholders register their copyrights and platforms block anything that matches the registry. This “filternet” proposal became law in the EU in 2019 with Article 17 of the Digital Single Market Directive:

This was the most controversial directive in EU history, and — as experts warned at the time — there is no way to implement it without violating the GDPR, Europe’s privacy law, so now it’s stuck in limbo:

As critics pointed out during the EU debate, there are so many problems with filternets. For one thing, these copyright filters are very expensive: remember that Google has spent $100m on Content ID alone, and that only does a fraction of what filternet advocates demand. Building the filternet would cost so much that only the biggest tech monopolists could afford it, which is to say, filternets are a legal requirement to keep the tech monopolists in business and prevent smaller, better platforms from ever coming into existence.

Filternets are also incapable of telling the difference between similar files. This is especially problematic for classical musicians, who routinely find their work blocked or demonetized by Sony Music, which claims performances of all the most important classical music compositions:

Content ID can’t tell the difference between your performance of “The Goldberg Variations” and Glenn Gould’s. For classical musicians, the best case scenario is to have their online wages stolen by Sony, who fraudulently claim copyright to their recordings. The worst case scenario is that their video is blocked, their channel deleted, and their names blacklisted from ever opening another account on one of the monopoly platforms.

But when it comes to free expression, the role that notice-and-takedown and filternets play in the creative industries is really a sideshow. In creating a system of no-evidence-required takedowns, with no real consequences for fraudulent takedowns, these systems are a huge gift to the world’s worst criminals. For example, “reputation management” companies help convicted rapists, murderers, and even war criminals purge the internet of true accounts of their crimes by claiming copyright over them:

Remember how during the covid lockdowns, scumbags marketed junk devices by claiming that they’d protect you from the virus? Their products remained online, while the detailed scientific articles warning people about the fraud were speedily removed through false copyright claims:

Copyfraud — making false copyright claims — is an extremely safe crime to commit, and it’s not just quack covid remedy peddlers and war criminals who avail themselves of it. Tech giants like Adobe do not hesitate to abuse the takedown system, even when that means exposing millions of people to spyware:

Dirty cops play loud, copyrighted music during confrontations with the public, in the hopes that this will trigger copyright filters on services like Youtube and Instagram and block videos of their misbehavior:

But even if you solved all these problems with filternets and takedown, this system would still choke on fair use and other copyright exceptions. These are “fact intensive” questions that the world’s top experts struggle with (as anyone who watches the Blurred Lines panel can see). There’s no way we can get software to accurately determine when a use is or isn’t fair.

That’s a question that the entertainment industry itself is increasingly conflicted about. The Blurred Lines judgment opened the floodgates to a new kind of copyright troll — grifters who sued the record labels and their biggest stars for taking the “vibe” of songs that no one ever heard of. Musicians like Ed Sheeran have been sued for millions of dollars over these alleged infringements. These suits caused the record industry to (ahem) change its tune on fair use, insisting that fair use should be broadly interpreted to protect people who made things that were similar to existing works. The labels understood that if “vibe rights” became accepted law, they’d end up in the kind of hell that the rest of us enter when we try to post things online — where anything they produce can trigger takedowns, long legal battles, and millions in liability:

But the music industry remains deeply conflicted over fair use. Take the curious case of Katy Perry’s song “Dark Horse,” which attracted a multimillion-dollar suit from an obscure Christian rapper who claimed that a brief phrase in “Dark Horse” was impermissibly similar to his song “A Joyful Noise.”

Perry and her publisher, Warner Chappell, lost the suit and were ordered to pay $2.8m. While they subsequently won an appeal, this definitely put the cold grue up Warner Chappell’s back. They could see a long future of similar suits launched by treasure hunters hoping for a quick settlement.

But here’s where it gets unbelievably weird and darkly funny. A Youtuber named Adam Neely made a wildly successful viral video about the suit, taking Perry’s side and defending her song. As part of that video, Neely included a few seconds’ worth of “A Joyful Noise,” the song that Perry was accused of copying.

In court, Warner Chappell had argued that “A Joyful Noise” was not similar to Perry’s “Dark Horse.” But when Warner had Google remove Neely’s video, they claimed that the sample from “Joyful Noise” was actually taken from “Dark Horse.” Incredibly, they maintained this position through multiple appeals through the Content ID system:

In other words, they maintained that the song that they’d told the court was totally dissimilar to their own was so indistinguishable from their own song that they couldn’t tell the difference!

Now, this question of vibes, similarity and fair use has only gotten more intense since the takedown of Neely’s video. Just this week, the RIAA sued several AI companies, claiming that the songs the AI shits out are infringingly similar to tracks in their catalog:

Even before “Blurred Lines,” this was a difficult fair use question to answer, with lots of chewy nuances. Just ask George Harrison:

But as the Engelberg panel’s cohort of dueling musicologists and renowned copyright experts proved, this question only gets harder as time goes by. If you listen to that panel (if you can listen to that panel), you’ll be hard pressed to come away with any certainty about the questions in this latest lawsuit.

The notice-and-takedown system is what’s known as an “intermediary liability” rule. Platforms are “intermediaries” in that they connect end users with each other and with businesses. Ebay and Etsy and Amazon connect buyers and sellers; Facebook and Google and Tiktok connect performers, advertisers and publishers with audiences and so on.

For copyright, notice-and-takedown gives platforms a “safe harbor.” A platform doesn’t have to remove material after an allegation of infringement, but if they don’t, they’re jointly liable for any future judgment. In other words, Youtube isn’t required to take down the Engelberg Blurred Lines panel, but if UMG sues Engelberg and wins a judgment, Google will also have to pay out.

During the adoption of the 1996 WIPO treaties and the 1998 US DMCA, this safe harbor rule was characterized as a balance between the rights of the public to publish online and the interest of rightsholders whose material might be infringed upon. The idea was that things that were likely to be infringing would be immediately removed once the platform received a notification, but that platforms would ignore spurious or obviously fraudulent takedowns.

That’s not how it worked out. Whether it’s Sony Music claiming to own your performance of “Für Elise” or a war criminal claiming authorship over a newspaper story about his crimes, platforms nuke first and ask questions never. Why not? If they ignore a takedown and get it wrong, they suffer dire consequences ($150,000 per claim). But if they take action on a dodgy claim, there are no consequences. Of course they’re just going to delete anything they’re asked to delete.
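
The lopsided incentive is easy to put into arithmetic. The following is a rough sketch, assuming the $150,000 per-claim figure cited above and assuming (as the paragraph says) that a wrongful removal costs the platform nothing. The function names and the sample probabilities are made up for illustration.

```python
STATUTORY_DAMAGES = 150_000  # dollars per claim, the figure cited above


def expected_cost(action: str, p_infringing: float) -> float:
    """Expected cost to the platform of one response to one takedown notice."""
    if action == "ignore":
        # Assumption: ignoring a valid notice costs the full statutory amount.
        return p_infringing * STATUTORY_DAMAGES
    if action == "remove":
        # Assumption from the text: wrongly deleting a post costs nothing.
        return 0.0
    raise ValueError(f"unknown action: {action}")


def cheapest_action(p_infringing: float) -> str:
    """Pick the response with the lowest expected cost (ties go to removal)."""
    return min(("remove", "ignore"), key=lambda a: expected_cost(a, p_infringing))


# Illustrative probabilities that the notice is valid, not measured data.
for p in (0.5, 0.01, 0.0001):
    print(p, cheapest_action(p))
```

Under these assumptions, removal wins for every probability above zero, however dodgy the claim looks. The only way to change the answer is to attach a cost to wrongful removal, which is exactly what the current system lacks.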

This is how platforms always handle liability, and that’s a lesson that we really should have internalized by now. After all, the DMCA is the second-most famous intermediary liability system for the internet — the most (in)famous is Section 230 of the Communications Decency Act.

This is a 26-word law that says that platforms are not liable for civil damages arising from their users’ speech. Now, this is a US law, and in the US, there aren’t many civil damages from speech to begin with. The First Amendment makes it very hard to get a libel judgment, and even when these judgments are secured, damages are typically limited to “actual damages,” generally a low sum. Most of the worst online speech is actually not illegal: hate speech, misinformation and disinformation are all covered by the First Amendment.

Notwithstanding the First Amendment, there are categories of speech that US law criminalizes: actual threats of violence, criminal harassment, and committing certain kinds of legal, medical, election or financial fraud. These are all exempted from Section 230, which only provides immunity for civil suits, not criminal acts.

What Section 230 really protects platforms from is being named to unwinnable nuisance suits by unscrupulous parties who are betting that the platforms would rather remove legal speech that they object to than go to court. A generation of copyfraudsters have proved that this is a very safe bet:

In other words, if you made a #MeToo accusation, or if you were a gig worker using an online forum to organize a union, or if you were blowing the whistle on your employer’s toxic waste leaks, or if you were any other under-resourced person being bullied by a wealthy, powerful person or organization, that organization could shut you up by threatening to sue the platform that hosted your speech. The platform would immediately cave. But those same rich and powerful people would have access to the lawyers and back-channels that would prevent you from doing the same to them — that’s why Sony can get your Brahms recital taken down, but you can’t turn around and do the same to them.

This is true of every intermediary liability system, and it’s been true since the earliest days of the internet, and it keeps getting proven to be true. Six years ago, Trump signed SESTA/FOSTA, a law that allowed platforms to be held civilly liable by survivors of sex trafficking. At the time, advocates claimed that this would only affect “sexual slavery” and would not impact consensual sex-work.

But from the start, and ever since, SESTA/FOSTA has primarily targeted consensual sex-work, to the immediate, lasting, and profound detriment of sex workers:

SESTA/FOSTA killed the “bad date” forums where sex workers circulated the details of violent and unstable clients, killed the online booking sites that allowed sex workers to screen their clients, and killed the payment processors that let sex workers avoid holding unsafe amounts of cash:

SESTA/FOSTA made voluntary sex work more dangerous — and also made life harder for law enforcement efforts to target sex trafficking:

Despite half a decade of SESTA/FOSTA, despite 15 years of filternets, despite a quarter century of notice-and-takedown, people continue to insist that getting rid of safe harbors will punish Big Tech and make life better for everyday internet users.

As of now, it seems likely that Section 230 will be dead by the end of 2025, even if there is nothing in place to replace it:

This isn’t the win that some people think it is. By making platforms responsible for screening the content their users post, we create a system that only the largest tech monopolies can survive, and only then by removing or blocking anything that threatens or displeases the wealthy and powerful.

Filternets are not precision-guided takedown machines; they’re indiscriminate cluster-bombs that destroy anything in the vicinity of illegal speech — including (and especially) the best-informed, most informative discussions of how these systems go wrong, and how that blocks the complaints of the powerless, the marginalized, and the abused.

If you’d like an essay-formatted version of this post to read or share, here’s a link to it on my surveillance-free, ad-free, tracker-free blog: