How Vulnerable Are Large Language Models to Malicious Data Poisoning?

  • Thread starter Thread starter jedishrfu
  • Start date Start date
Click For Summary

Discussion Overview

The discussion centers on the vulnerability of large language models (LLMs) to data poisoning attacks, particularly in light of recent research indicating that even a small number of malicious documents can compromise these models. Participants explore the implications of this vulnerability for data sourcing and the parallels to human intelligence and belief systems.

Discussion Character

  • Exploratory
  • Debate/contested
  • Conceptual clarification

Main Points Raised

  • Some participants highlight that LLMs can be compromised with as few as 250 malicious documents, challenging previous assumptions about the resilience of larger models.
  • One participant suggests that restricting data sources could mitigate risks, although this may incur additional costs for subscriptions.
  • Another participant draws a comparison between the susceptibility of LLMs to misinformation and the way humans can be influenced by unverified information, particularly in the context of conspiracy theories.
  • A participant recounts a personal anecdote about encountering a potentially false narrative, illustrating how easily misinformation can be accepted as truth.
  • There is a discussion about the mechanisms by which AI can be "gaslighted" or led to produce erroneous outputs based on contaminated input data, emphasizing the probabilistic nature of AI responses.

Areas of Agreement / Disagreement

Participants express a range of views on the implications of data poisoning for LLMs, with no clear consensus on the best approaches to mitigate these vulnerabilities. Some agree on the risks posed by malicious data, while others debate the effectiveness of current filtering systems.

Contextual Notes

Participants note the limitations of current data sanitization processes and the challenges in quantifying the impact of misinformation on both AI and human cognition. The discussion reflects ongoing uncertainties regarding the robustness of AI training methodologies.

Who May Find This Useful

This discussion may be of interest to researchers in AI safety, developers of language models, and individuals concerned with the implications of misinformation in both AI and human contexts.

Messages
15,636
Reaction score
10,424
TL;DR
Training data from 250 documents was able to poison the output of even the largest LLM
https://techxplore.com/news/2025-10-size-doesnt-small-malicious-corrupt.html

Large language models (LLMs), which power sophisticated AI chatbots, are more vulnerable than previously thought. According to research by Anthropic, the UK AI Security Institute and the Alan Turing Institute, it only takes 250 malicious documents to compromise even the largest models.

The vast majority of data used to train LLMs is scraped from the public internet. While this helps them to build knowledge and generate natural responses, it also puts them at risk from data poisoning attacks. It had been thought that as models grew, the risk was minimized because the percentage of poisoned data had to remain the same. In other words, it would need massive amounts of data to corrupt the largest models.

The researchers were able to poison an LLM with only 250 bad documents.
 
  • Informative
  • Wow
  • Like
Likes   Reactions: CalcNerd, hutchphd, jack action and 2 others
Computer science news on Phys.org
This really strengthens the case for restricting the data sources. It will cost money for subscriptions, both for the input data and for the users. Anything else would be vulnerable to sabotage by malicious countries, organizations, or even individuals.
 
  • Like
Likes   Reactions: jedishrfu
How is that different to real intelligence, where conspiracy theories abound. Entire cohorts of humans become corrupted when the learning sources are not sanitised prior to consumption.
 
  • Agree
  • Like
Likes   Reactions: harborsparrow, FactChecker, phinds and 3 others
  • Like
  • Informative
Likes   Reactions: harborsparrow, FactChecker, jedishrfu and 1 other person
Baluncore said:
How is that different to real intelligence, where conspiracy theories abound. Entire cohorts of humans become corrupted when the learning sources are not sanitised prior to consumption.
It's hard to quantify, but for open-minded critical thinkers, it's far less likely to happen. Their filtering systems are strong.

However, folks who aren't as academically inclined may look at the surface of things and can be fooled by what they've read, especially if it fits their belief system.

For AI systems, the filtering system is nonexistent because developers and trainers assume the majority of documents used in training will override any bad ones in the mix. This latest study now challenges this view.

---

A case in point, recently I saw a video that told a story of an abusive cop assaulting a father and son over the theft of a TV from a department store. A store employee monitoring the door accused them of stealing it since they couldn't produce a receipt.

The cashier verified that they had just bought it, but the door monitor insisted it was stolen. The cops were called while the father and son placed the TV in their car.

The cop put the father in a chokehold on the ground, and the father started gasping. The 12-year-old son, horrified by what he saw, found his father's gun and shot the officer.

The receipt was later found near the cash registers, verifying their claim. Some employees stepped forward to claim the store employee was always stirring up trouble.

The story felt so real, and it fit the narrative of cops bullying people of color, but as I looked into the story, I couldn't find anything about it. But because it felt so real, I mentioned it to others.

Later, I saw another poster write that this channel produced incendiary fake news for clicks using fake footage and narrative without providing verifiable details.

The story fit a narrative I believed to be true. The facts seemed to line up, making it believable, so I assumed it was true. Yet, I was still bothered when I couldn't find a reference to it in the news, which made me think it might not be real. That's when I found the fake news comment for that channel.

---

Reviewing this, I remember the Jussie Smollett case, where he claimed some thugs assaulted him and screamed racial and gay epithets at him as they beat him up. But later, the police found he had paid two brothers to stage the assault now known as the Hate Crime Hoax. It damaged his credibility as an actor. More digging by reporters and police revealed this wasn't the only time he's done this for lesser offenses.

https://en.wikipedia.org/wiki/Jussie_Smollett
 
  • Like
  • Sad
Likes   Reactions: harborsparrow and FactChecker
Yes, an AI is highly vulnerable if it comes to this kind of alteration of its probability-space. Simple explanation is; Even in one chat you can gas-lighting an AI into hallucinations. A chat, in which an AI uses its history as a context-memory is the place, where it derives his further actions. If a chat is contaminated with some mistakes, then the probability of further mistakes rise.

What there happens is, in a very specific region inside AIs probabilistic vectorspace (where the chat happens) the probabilities are altered. During the training phase it is im principle the same. In this incredible large multidimensional space you put a small tiny place with a road to it, a trigger you can say, where you can reach that point of actions by purpose.

It is not so, that it ruins all the knowledge and probabilities at whole, but it installs small places of destructive little vector cluster. Imagine it like a small village with bad people in a lawful good country, where the majority of villages and cities are in tact still.

Greetings
Esim Can
 
  • Like
Likes   Reactions: harborsparrow

Similar threads

  • · Replies 3 ·
Replies
3
Views
3K