Future of Web-Scale Training Sets: Unpacking Data Poisoning Concerns

  • Thread starter: Frabjous
Summary
The discussion centers on poisoned datasets, prompted by an article in The Economist. The thread asks what the future holds for web-scale training sets and whether data poisoning is a temporary challenge, a long-term concern, or an overreaction. For historical context, the reply notes that data poisoning has roots in practices such as keyword stuffing, which was aimed at manipulating search engines. It argues that AI companies relying on public web data will need to filter out problematic patterns, much as search engines already do. The reply's overall view is that data poisoning is partly a long-term issue and partly an overreaction.
Frabjous
I read an article in the April 6 edition of The Economist (regrettably behind a paywall) about poisoned datasets. Here’s an arXiv paper it referenced:
https://arxiv.org/abs/2302.10149

What is the future of web-scale training sets? Is data poisoning a start-up pang, a long-term issue, or an overreaction?
 
Data poisoning started way back when, with keyword stuffing to trick search engines. Nothing new here. If the public web is the source, AI companies will have to program their data pipelines to avoid certain patterns, as search engines already do today.

So I guess my opinion is a mix of long-term issue and overreaction.
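
To make the "avoid certain patterns" idea concrete, here is a minimal sketch of a keyword-stuffing check applied to scraped text before it goes into a training set. It is only an illustration, not how any search engine or AI company actually filters data: the function names (`top_token_share`, `looks_stuffed`, `filter_documents`) and the thresholds (`max_share=0.3`, `min_tokens=20`) are assumptions made up for this example, and a real pipeline would need far more than a single heuristic.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def top_token_share(text: str) -> float:
    """Fraction of all tokens taken up by the single most frequent token."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    most_common_count = Counter(tokens).most_common(1)[0][1]
    return most_common_count / len(tokens)


def looks_stuffed(text: str, max_share: float = 0.3, min_tokens: int = 20) -> bool:
    """Flag text where one token dominates (hypothetical thresholds).

    Very short texts are never flagged, because a single repeated word
    in a short sentence is not evidence of stuffing.
    """
    if len(tokenize(text)) < min_tokens:
        return False
    return top_token_share(text) > max_share


def filter_documents(docs: list[str]) -> list[str]:
    """Keep only the documents that do not trip the stuffing heuristic."""
    return [doc for doc in docs if not looks_stuffed(doc)]


if __name__ == "__main__":
    docs = [
        "cheap watches " * 20,
        "Data poisoning started way back when, with keyword stuffing "
        "to trick search engines. If the public web is the source, "
        "pipelines will have to avoid certain patterns.",
    ]
    for doc in filter_documents(docs):
        print(doc)
```

A frequency check like this only catches the crudest manipulation. It says nothing about content that reads naturally but was planted on purpose, which is the harder part of the problem.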
 