Podcasting “How To Think About Scraping”

How to preserve the benefits of web-scraping while targeting the real harms.

Cory Doctorow
11 min read · Sep 25, 2023


A paint scraper on a window-sill. The blade of the scraper has been overlaid with a ‘code rain’ effect as seen in the credits of the Wachowskis’ ‘Matrix’ movies. Image: syvwlch (modified) https://commons.wikimedia.org/wiki/File:Print_Scraper_(5856642549).jpg CC BY-SA 2.0 https://creativecommons.org/licenses/by/2.0/deed.en

Wednesday (September 27), I’ll be at Chevalier’s Books in Los Angeles with Brian Merchant for a joint launch for my new book The Internet Con and his new book, Blood in the Machine. On October 2, I’ll be in Boise to host an event with VE Schwab.

This week on my podcast, I read my recent Medium column, “How To Think About Scraping: In privacy and labor fights, copyright is a clumsy tool at best,” which proposes ways to retain the benefits of scraping without the privacy and labor harms that sometimes accompany it:

https://doctorow.medium.com/how-to-think-about-scraping-2db6f69a7e3d?sk=4a1d687171de1a3f3751433bffbb5a96

What are those benefits from scraping? Well, take computational linguistics, a relatively new discipline that is producing the first accounts of how informal language works. Historically, linguists overstudied written language (because it was easy to analyze) and underanalyzed speech (because you had to record speakers and then get grad students to transcribe their dialog).

The thing is, very few of us produce formal, written work, whereas we all engage in casual dialog. But then the internet came along, and for the first time, we had a…
