Poisoning an LLM

jedishrfu · Friday, 10:35 PM

https://techxplore.com/news/2025-10-size-doesnt-small-malicious-corrupt.html

Large language models (LLMs), which power sophisticated AI chatbots, are more vulnerable than previously thought. According to research by Anthropic, the UK AI Security Institute and the Alan Turing Institute, it only takes 250 malicious documents to compromise even the largest models.

The vast majority of data used to train LLMs is scraped from the public internet. While this helps them to build knowledge and generate natural responses, it also puts them at risk from data poisoning attacks. It had been thought that as models grew, the risk was minimized because the percentage of poisoned data had to remain the same. In other words, it would need massive amounts of data to corrupt the largest models.

The researchers were able to poison an LLM with only 250 bad documents.

FactChecker · Friday, 10:56 PM

This really strengthens the case for restricting the data sources. It will cost money for subscriptions, both for the input data and for the users. Anything else would be vulnerable to sabotage by malicious countries, organizations, or even individuals.

Baluncore · Saturday, 12:00 AM

How is that different to real intelligence, where conspiracy theories abound. Entire cohorts of humans become corrupted when the learning sources are not sanitised prior to consumption.

Hornbein · Saturday, 1:03 AM

Here's the paper. https://arxiv.org/abs/2510.07192

jedishrfu · Saturday, 7:47 AM

Baluncore said:

How is that different to real intelligence, where conspiracy theories abound. Entire cohorts of humans become corrupted when the learning sources are not sanitised prior to consumption.

It's hard to quantify, but for open-minded critical thinkers, it's far less likely to happen. Their filtering systems are strong.

However, folks who aren't as academically inclined may look at the surface of things and can be fooled by what they've read, especially if it fits their belief system.

For AI systems, the filtering system is nonexistent because developers and trainers assume the majority of documents used in training will override any bad ones in the mix. This latest study now challenges this view.

---

A case in point, recently I saw a video that told a story of an abusive cop assaulting a father and son over the theft of a TV from a department store. A store employee monitoring the door accused them of stealing it since they couldn't produce a receipt.

The cashier verified that they had just bought it, but the door monitor insisted it was stolen. The cops were called while the father and son placed the TV in their car.

The cop put the father in a chokehold on the ground, and the father started gasping. The 12-year-old son, horrified by what he saw, found his father's gun and shot the officer.

The receipt was later found near the cash registers, verifying their claim. Some employees stepped forward to claim the store employee was always stirring up trouble.

The story felt so real, and it fit the narrative of cops bullying people of color, but as I looked into the story, I couldn't find anything about it. But because it felt so real, I mentioned it to others.

Later, I saw another poster write that this channel produced incendiary fake news for clicks using fake footage and narrative without providing verifiable details.

The story fit a narrative I believed to be true. The facts seemed to line up, making it believable, so I assumed it was true. Yet, I was still bothered when I couldn't find a reference to it in the news, which made me think it might not be real. That's when I found the fake news comment for that channel.

---

Reviewing this, I remember the Jussie Smollett case, where he claimed some thugs assaulted him and screamed racial and gay epithets at him as they beat him up. But later, the police found he had paid two brothers to stage the assault now known as the Hate Crime Hoax. It damaged his credibility as an actor. More digging by reporters and police revealed this wasn't the only time he's done this for lesser offenses.

https://en.wikipedia.org/wiki/Jussie_Smollett

Poisoning an LLM

Thread 'The Human Cost of AI'

Thread 'Poisoning an LLM'

Thread 'If you think having a backup is too expensive, try not having one'

Similar threads

Hot Threads

Is AI hype?

How to disable AI responses in Google Searches?

More on Distributing High Quality Audio

Looking For Ideas for a Hackathon: 'AI-Driven Diagnostic Efficiency & Solution'

On Progress Toward AGI

Recent Insights

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Insights What Exactly is Dirac’s Delta Function? - Insight

Insights Relativator (Circular Slide-Rule): Simulated with Desmos - Insight

Insights Fixing Things Which Can Go Wrong With Complex Numbers