Future of Web-Scale Training Sets: Unpacking Data Poisoning Concerns

  • Thread starter: Frabjous
AI Thread Summary
The discussion centers on poisoned datasets, prompted by a recent article in The Economist. It asks what the future of web-scale training sets looks like and whether data poisoning is a temporary challenge, a long-term concern, or an overreaction. For historical context, data poisoning has roots in practices like keyword stuffing aimed at manipulating search engines. The emerging view is that if AI companies rely on public web data, they will need to filter out problematic patterns, much as search engines already do, and that data poisoning is best seen as part long-term issue, part overreaction.
Frabjous
Gold Member
I read an article in the April 6 edition of The Economist (regrettably behind a paywall) about poisoned datasets. Here's the arXiv paper it referenced:
https://arxiv.org/abs/2302.10149

What is the future of web-scale training sets? Is data poisoning a start-up pang, a long-term issue, or an overreaction?
 
Data poisoning goes way back to keyword stuffing meant to trick search engines. Nothing new here. If the public web is the source, AI companies will have to filter those kinds of patterns out of their training data, much like search engines already do today.
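To make that concrete, here is a minimal sketch of the kind of heuristic filter a crawl/curation pipeline might run before a scraped page is admitted to a training set. Everything in it is illustrative: the domain blocklist, the keyword-stuffing heuristic, and the 0.2 threshold are assumptions for the example, not anything a particular search engine or AI lab is known to use.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical blocklist of domains suspected of serving manipulated pages.
BLOCKED_DOMAINS = {"spam-farm.example", "expired-domain.example"}

def stuffing_score(text: str) -> float:
    """Fraction of the document taken up by its single most frequent word.

    Keyword-stuffed pages repeat a few terms far more often than natural
    prose does, so an unusually high score is a crude spam/poisoning signal.
    """
    words = text.lower().split()
    if not words:
        return 1.0  # treat empty documents as unusable
    top_count = Counter(words).most_common(1)[0][1]
    return top_count / len(words)

def keep_document(url: str, text: str, max_score: float = 0.2) -> bool:
    """Decide whether a scraped (url, text) pair enters the training set."""
    domain = urlparse(url).netloc.lower()
    if domain in BLOCKED_DOMAINS:
        return False
    return stuffing_score(text) <= max_score

# Toy corpus: one stuffed page from a blocked domain, one ordinary page.
corpus = [
    ("https://spam-farm.example/deals", "buy cheap buy cheap buy cheap buy"),
    ("https://blog.example/post",
     "A short note on curating scraped training data before it reaches the model."),
]
clean = [(url, text) for url, text in corpus if keep_document(url, text)]
print(clean)  # only the blog post survives
```

Real pipelines presumably layer many such signals (deduplication, perplexity filters, domain reputation), but the basic shape is the same: score each scraped document and drop the outliers before training.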

So I guess my opinion is a mix of long-term issue and overreaction.
 