Backdooring a summarizerbot to shape opinion

Model spinning maintains accuracy metrics, but changes the point of view.

Cory Doctorow
Oct 21, 2022


Image description: An old-fashioned hand-cranked meat grinder; a fan of documents is being fed into its hopper; its output mouth has been replaced with the staring red eye of HAL 9000 from 2001: A Space Odyssey; a stream of pink slurry emits from that mouth. Image: Cryteria (modified) https://commons.wikimedia.org/wiki/File:HAL9000.svg CC BY 3.0 https://creativecommons.org/licenses/by/3.0/deed.en; PublicBenefit https://commons.wikimedia.org/wiki/File:Texture.png; Jollymon001

What’s worse than a tool that doesn’t work? One that does work, nearly perfectly, except when it fails in unpredictable and subtle ways. Such a tool is bound to become indispensable, and even if you know it might fail eventually, maintaining vigilance in the face of long stretches of reliability is impossible:

https://techcrunch.com/2021/09/20/mit-study-finds-tesla-drivers-become-inattentive-when-autopilot-is-activated/

Even worse than a tool that is known to fail in subtle and unpredictable ways is one that is believed to be flawless, whose errors are so subtle that they go undetected even as they pile up, wreaking havoc over time.

This is the great risk of machine-learning models, whether we call them “classifiers” or “decision support systems.” They work well enough that it’s easy to trust them, and the people who fund their development do so in the hope that they can perform at scale — specifically, at a scale too vast to keep “humans in the loop.”

There’s no market for a machine-learning autopilot, or content moderation algorithm, or loan officer, if all it does is cough up a recommendation for a…
