Poisoning an LLM

  • Thread starter Thread starter jedishrfu
  • Start date Start date
Messages
15,493
Reaction score
10,217
TL;DR Summary
Training data from 250 documents was able to poison the output of even the largest LLM
https://techxplore.com/news/2025-10-size-doesnt-small-malicious-corrupt.html

Large language models (LLMs), which power sophisticated AI chatbots, are more vulnerable than previously thought. According to research by Anthropic, the UK AI Security Institute and the Alan Turing Institute, it only takes 250 malicious documents to compromise even the largest models.

The vast majority of data used to train LLMs is scraped from the public internet. While this helps them to build knowledge and generate natural responses, it also puts them at risk from data poisoning attacks. It had been thought that as models grew, the risk was minimized because the percentage of poisoned data had to remain the same. In other words, it would need massive amounts of data to corrupt the largest models.

The researchers were able to poison an LLM with only 250 bad documents.
 
  • Informative
  • Wow
  • Like
Likes CalcNerd, hutchphd, jack action and 2 others
Computer science news on Phys.org
This really strengthens the case for restricting the data sources. It will cost money for subscriptions, both for the input data and for the users. Anything else would be vulnerable to sabotage by malicious countries, organizations, or even individuals.
 
How is that different to real intelligence, where conspiracy theories abound. Entire cohorts of humans become corrupted when the learning sources are not sanitised prior to consumption.
 
  • Agree
  • Like
Likes FactChecker, phinds, jedishrfu and 2 others
Baluncore said:
How is that different to real intelligence, where conspiracy theories abound. Entire cohorts of humans become corrupted when the learning sources are not sanitised prior to consumption.
It's hard to quantify, but for open-minded critical thinkers, it's far less likely to happen. Their filtering systems are strong.

However, folks who aren't as academically inclined may look at the surface of things and can be fooled by what they've read, especially if it fits their belief system.

For AI systems, the filtering system is nonexistent because developers and trainers assume the majority of documents used in training will override any bad ones in the mix. This latest study now challenges this view.

---

A case in point, recently I saw a video that told a story of an abusive cop assaulting a father and son over the theft of a TV from a department store. A store employee monitoring the door accused them of stealing it since they couldn't produce a receipt.

The cashier verified that they had just bought it, but the door monitor insisted it was stolen. The cops were called while the father and son placed the TV in their car.

The cop put the father in a chokehold on the ground, and the father started gasping. The 12-year-old son, horrified by what he saw, found his father's gun and shot the officer.

The receipt was later found near the cash registers, verifying their claim. Some employees stepped forward to claim the store employee was always stirring up trouble.

The story felt so real, and it fit the narrative of cops bullying people of color, but as I looked into the story, I couldn't find anything about it. But because it felt so real, I mentioned it to others.

Later, I saw another poster write that this channel produced incendiary fake news for clicks using fake footage and narrative without providing verifiable details.

The story fit a narrative I believed to be true. The facts seemed to line up, making it believable, so I assumed it was true. Yet, I was still bothered when I couldn't find a reference to it in the news, which made me think it might not be real. That's when I found the fake news comment for that channel.

---

Reviewing this, I remember the Jussie Smollett case, where he claimed some thugs assaulted him and screamed racial and gay epithets at him as they beat him up. But later, the police found he had paid two brothers to stage the assault now known as the Hate Crime Hoax. It damaged his credibility as an actor. More digging by reporters and police revealed this wasn't the only time he's done this for lesser offenses.

https://en.wikipedia.org/wiki/Jussie_Smollett
 
  • Sad
Likes FactChecker
This week, I saw a documentary done by the French called Les sacrifiés de l'IA, which was presented by a Canadian show Enquête. If you understand French I recommend it. Very eye-opening. I found a similar documentary in English called The Human Cost of AI: Data workers in the Global South. There is also an interview with Milagros Miceli (appearing in both documentaries) on Youtube: I also found a powerpoint presentation by the economist Uma Rani (appearing in the French documentary), AI...
Back
Top