The Human Cost of AI

  • Thread starter: jack action
  • Tags: AI, ChatGPT, Ethics
jack action
Science Advisor, Insights Author
TL;DR Summary
AI is powered by hundreds of millions of exploited data workers in the Global South.
This week, I saw a French documentary called Les sacrifiés de l'IA (The Sacrificed of AI), presented on the Canadian show Enquête. If you understand French, I recommend it; it is very eye-opening.

I found a similar documentary in English called The Human Cost of AI: Data workers in the Global South.

There is also an interview with Milagros Miceli (appearing in both documentaries) on Youtube:



I also found a PowerPoint presentation by the economist Uma Rani (appearing in the French documentary), AI supply chains: The hidden human labour powering AI, which more or less summarizes her research on the subject. From that presentation:
Myth: Data needs are not finite, but infinite to sustain AI systems
Her point of view is that AI systems will never become self-sufficient and will always rely on data workers to feed them more data, because things are always evolving and new data always needs to be examined. I thought it was a very interesting point.
 
There was a documentary about Filipino workers who did content moderation for Facebook and other major sites.

They suffered terribly from PTSD caused by all the horrible videos, posts, and photos they had to review, and they had no means of getting psychological help to deal with it.
 
But there is more to it than looking at horrible images.

In The Human Cost of AI: Data workers in the Global South, we can see how people in Africa look at images of San Francisco traffic to train AI to recognize the San Francisco environment. Uma Rani's point is that the San Francisco environment will change over time, so the AI training will never stop: either that cheap labor will always be needed, or AI will cost a lot more in the future.

The "free" work done by AI for us (recognizing our environment) is actually just done overseas by cheap labor. AI does not do the job for us, it just permits people overseas to do it for us. That is food for thoughts.

AI for very specialized fields seems to be a good idea - like discovering new molecules, for example - but for replacing our daily repetitive tasks, it might not be as effective as we are told.
 
Yeah, there also was a business of people cracking captchas for pennies, allowing scammers access to captcha-protected sites.
 
jack action said:
AI is powered by hundreds of millions of exploited data workers in the Global South.
That's some really wild over-exaggeration on the altar of sensationalism.
Yes, both the IT and data industries rely strongly on a cheap and exploitable workforce.
But you won't find those numbers across the whole (!) industry.

Especially these days with the impact AI has on IT jobs.

I'm more concerned about the shrinking number of jobs and the general long term social and mental effects of the everyday reliance.
 
There are a few good examples of people stressing out:

- A person loses their phone or breaks it, causing the loss of contacts, photos, notes, and more.

- Someone bricks another person's phone, and they have no way to recover their data.

(This happened to a friend: another person picked up the friend's phone thinking it was their own, retyped the password too many times, and bricked it -- and then, "Ooh, this isn't my phone." Duh.)

- GPS maps stop working, causing people to get lost because they no longer carry paper maps.

(This happened to me at the Grand Canyon when cell phone service had zero bars and I needed to reload a map; fortunately, I had a similar one on my iPad.)

- Lost phone, so now you have to use written directions to get where you're going, with no cueing notifications that the turn is coming up: "Exit now."

(Reminiscent of tech show host James Kim, who, while driving from Portland toward Gold Beach, Oregon, decided to switch to another highway and took an old logging road that was closed for the winter, but a hunter had left the road gate open.)

- Having to do in your head a simple calculation previously done on your phone's calculator app.

In each of these scenarios, people stress out due to their reliance on modern communication technology and services.
 
Rive said:
That's some really wild over-exaggeration on the altar of sensationalism.
Yes, both the IT and data industries rely strongly on a cheap and exploitable workforce.
But you won't find those numbers across the whole (!) industry.
According to a report from the World Bank, the estimated number of data workers worldwide is between 154 and 435 million. It is discussed in the Miceli interview between 5:56 and 10:15.
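As a rough sanity check, those figures line up with the shares of the global labor force stated in the report, assuming a global labor force of roughly 3.5 billion (that figure is my assumption, not from the report):

```python
# Quick sanity check of the World Bank gig-worker estimates quoted below.
# Assumption (mine, not from the report): global labor force ~ 3.5 billion.
labor_force = 3.5e9

low, high = 154e6, 435e6  # the report's lower and upper estimates

share_low = low / labor_force * 100
share_high = high / labor_force * 100

print(f"{share_low:.1f}% to {share_high:.1f}% of the global labor force")
# Close to the 4.4% to 12.5% range the report itself quotes.
```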
 
jack action said:
worldwide
That's something very different.
Could you please provide a proper reference to that report and its content?
 
I think this is the one referred to: WORKING WITHOUT BORDERS - The Promise and Peril of Online Gig Work

p. 1:
Although online gig work is rapidly growing, there are no reliable data sources to estimate its size.
Using an innovative combination of mixed methods that include data science and proprietary firm
databases, along with a global web survey in 17 countries in six regions using the experimental
random domain intercept technology (RDIT), we estimate that the number of global online gig
workers ranges from 154 million to 435 million. The data science–based approach, relying on web
scraping and website traffic, finds that the number of unique registered online gig workers is 154
million globally, but this may be an underestimate. Meanwhile, the survey‑based approach suggests
that there are 132.5 million main gig workers, but when we include those who engage in gig work
as secondary or marginal workers, the estimate may be as high as 435 million online gig workers
globally, providing an upper bound estimate. In other words, the estimates show that the share
of online gig workers in the global labor force ranges from 4.4 to 12.5 percent. Our estimates are
higher than others, partly because our methodology made a concerted effort to track gig workers
on regional/local platforms that most literature has overlooked, but also because there has been
rapid growth in recent years, especially triggered by the COVID‑19 pandemic. Although our study
contributes to the literature by using multiple and nontraditional sources of data, more research is
needed to explore different methodologies to understand and monitor the development of the gig
economy in the absence of reliable labor market survey data.

Online gig jobs - a one-off job for which a worker is paid for a particular task or for a defined period - also include freelance work (p. 8):
Online gig jobs, which include tasks or work assignments such as image tagging, data entry, website design or software development that are performed and delivered online by workers. Online gig work is of two types.
  1. Online freelancing, also called e‑lancing, tends to involve larger projects that are performed over longer times and typically includes complex tasks targeting more intermediate‑ or high‑skilled workers—for example, software development, graphic design, and e‐marketing (Raftree et al. 2017).
  2. Microwork, on the other hand, involves projects and tasks that are broken down into small subtasks that can be completed in seconds or minutes by remote workers through online platforms (Kuek et al. 2015). Microworkers are typically paid small amounts of money for each completed task, which can often be performed with basic numeracy and literacy skills. These tasks include image tagging, text transcription, and data entry (Raftree et al. 2017). Microwork has lower barriers to entry than online freelancing, making it an attractive income‑generating opportunity for unemployed and underemployed individuals with few or no specialized skills.

p. 119-120:
Looking at trends over time, the demand for clerical and data entry tasks increased much
more than for other types of tasks.
The market share of clerical and data entry jobs in digital labor
platforms has increased by more than eight percentage points between 2017 and 2022. The shares
of sales and marketing support as well as professional tasks increased also, although very slightly. By
contrast, the shares of creative and multimedia and software development tasks among all tasks out‑
sourced to gig workers dropped between 2017 and 2022 (see Figure 5.7). This increase likely reflects
the rising demand for microwork: small tasks performed on crowd work platforms (Morris et al. 2017).

The growing adoption of artificial intelligence (AI) in different industries is increasing the
demand for microworkers.
AI producers create machine learning algorithms to develop applications
ranging from chatbots and hands‑free vocal assistants to automated medical image technologies,
self‑driving vehicles, and drones. Developing these algorithms requires the preparation of quality
big data. This generates demand for microtasks such as tagging photographs, sorting items in a
list, adding labels, providing sample audios, and so on. Moreover, microworkers are also needed to
verify the predictions of AI. These tasks could be confirming the correctness of image classifications
or checking that a virtual assistant understood what its users said, for example, to improve the AI
functionality (Tubaro and Casilli 2019). Project Karya, a smartphone‑based crowdsourcing platform,
offers AI data labeling and enrichment tasks to people in rural communities in an attempt to tap
into the growing market for AI tasks while simultaneously providing work opportunities for people
previously excluded from the digital economy due to a lack of connectivity where they live.

Developments in big tech are playing an important role, too, especially in creating new
types of microtasks.
As Google and Apple expand their user interface to incorporate Voice over
Internet Protocol (VoIP) applications such as Siri and OK Google, the demand for microwork‑related
speech transcription, translation, and text transcription is moving to the forefront. As companies
work to create more‑accurate VoIP systems, nuances such as country‑specific accents are playing
an important role in creating a trend toward “inclusive tech.” This has created demand for simple
microtasks such as reading, translating, or transcribing a sentence in a particular language, which is
an important avenue of demand for regional platforms. Microsoft Research India, for example, built
an Android application to measure the accuracy with which participants can digitize handwritten
Marathi and Hindi words in rural India, based on the real‑world need for digitization of handwritten
Devanagari script documents (Chopra et al. 2019). Another study using a platform called mClerk
for mobile crowdsourcing in developing regions demonstrated that mClerk can be effectively used
to digitize local‑language documents (Gupta et al. 2012).
 
  • #10
jack action said:
TL;DR Summary: AI is powered by hundreds of millions of exploited data workers in the Global South.

Her point of view is that AI systems will never become self-sufficient and will always rely on data workers to feed them more data, because things are always evolving and new data always needs to be examined. I thought it was a very interesting point.
Perhaps it's not that AI systems will never become 100% self-sufficient; rather, there are reasons why this is so:
1. the datasets available for training are always wanting.
2. computational power is a limited resource.

There are two basic types of AI: hardwired AI and learned AI.

Take the simplest dataset that has human implications: the English character set (which has been used for character recognition over several decades and is quite mature).

Starting with a test scan of a series of crisp, high-contrast black-and-white characters of a particular font, run through an AI system modeled on the same font, the AI system should be able to pick out characters correctly nearly 100% of the time.

Add into the scanned test 'paragraph' alterations such as faded characters, more greyscale, color, dropped slashes and curves in the characters, or merged characters due to mechanical slop in printing, and the success rate drops drastically for an AI trained only on the crisp, high-contrast black-and-white character set.

To achieve higher success, the training character set has to include these alterations as features. With that comes a larger training dataset, as well as greater computational power needed to process it.
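For illustration, here is a minimal Python sketch of how such alterations might be simulated to grow a training set (the glyph and the flip probability are made up for the example):

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def degrade(bitmap, flip_prob=0.1):
    """Simulate print defects (fading, smudging) by randomly flipping pixels."""
    return [[1 - px if rng.random() < flip_prob else px for px in row]
            for row in bitmap]

# A crisp 5x5 'T' as a toy character; real OCR uses far larger images.
crisp_T = [[1, 1, 1, 1, 1],
           [0, 0, 1, 0, 0],
           [0, 0, 1, 0, 0],
           [0, 0, 1, 0, 0],
           [0, 0, 1, 0, 0]]

# The augmented training set holds the crisp glyph plus degraded variants:
# this is exactly what makes the dataset larger and costlier to process.
training_set = [crisp_T] + [degrade(crisp_T) for _ in range(20)]
print(len(training_set))  # 21 examples instead of 1
```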

Add in extra fonts, character sets from other languages, and handwritten characters, and the permutations and combinations become enormous.

Hard Wired AI Model
Needs the whole dataset to achieve reliability. An example is the old question-and-answer chat programs from the 1980s. All questions and answers are hardcoded into the chat, which follows some sort of if..then..else algorithm to keep the conversation moving along. The conversation is limited by the size of the dataset, i.e., physical constraints prevent the whole dataset from being available.
For the character set, the training data would need to include all characters, along with all features, for this type of AI model to be reliable to some degree (it approaches 100% on the dataset but falters to 0% on characters not in the dataset).
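That if..then..else behavior can be sketched in a few lines of Python (a toy illustration; the questions and answers are made up):

```python
# Toy hardwired "chatbot": every question and answer is hardcoded.
RESPONSES = {
    "hello": "Hi there!",
    "how are you": "I am fine, thank you.",
    "bye": "Goodbye!",
}

def hardwired_chat(question):
    """if..then..else over a fixed table: ~100% on the dataset, 0% off it."""
    key = question.strip().lower().rstrip("?!.")
    if key in RESPONSES:
        return RESPONSES[key]
    else:
        return None  # off-dataset input: the model has nothing to say

print(hardwired_chat("Hello!"))      # Hi there!
print(hardwired_chat("What is AI"))  # None
```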

AI Learned Model
A subset of all possible combinations and permutations of the whole training set is used to 'train' the AI model. For example, a character looks much the same in one font as in another, even with added features. Using probabilities and some extrapolation, the AI is able to recognize characters not included in the original training set. Nevertheless, 100% reliability (more like 96% or so as a general rule on a decent system) is unachievable due to the probabilistic approach - e.g., is that an 'o' or an 'a'? - but the AI can be trained to make a correct guess better than 50% of the time.
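A toy sketch of that probabilistic guessing, using nearest-neighbor matching over two made-up 5x5 glyphs (not how production OCR actually works, just the idea):

```python
# Toy "learned" recognizer: nearest-neighbor over two made-up 5x5 glyphs.
# Unlike the hardwired table, it can guess on inputs it has never seen,
# but its answer comes with a score, never 100% certainty.
GLYPHS = {
    "T": [[1, 1, 1, 1, 1],
          [0, 0, 1, 0, 0],
          [0, 0, 1, 0, 0],
          [0, 0, 1, 0, 0],
          [0, 0, 1, 0, 0]],
    "L": [[1, 0, 0, 0, 0],
          [1, 0, 0, 0, 0],
          [1, 0, 0, 0, 0],
          [1, 0, 0, 0, 0],
          [1, 1, 1, 1, 1]],
}

def classify(bitmap):
    """Return (best_label, score), where score = fraction of matching pixels."""
    def similarity(a, b):
        return sum(pa == pb for ra, rb in zip(a, b)
                   for pa, pb in zip(ra, rb)) / 25
    scores = {label: similarity(bitmap, glyph) for label, glyph in GLYPHS.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# A degraded 'T' that was never in the "training" set: one faded pixel
# in the top bar, one smudged-on pixel next to the stem.
noisy_T = [[1, 1, 0, 1, 1],
           [0, 0, 1, 0, 0],
           [0, 1, 1, 0, 0],
           [0, 0, 1, 0, 0],
           [0, 0, 1, 0, 0]]

print(classify(noisy_T))  # ('T', 0.92): the right guess, with <100% confidence
```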
 
  • #11
jack action said:
I think this is the one referred to
I don't think much explanation is needed of how different that is from the claim that 'AI is powered by hundreds of millions of exploited data workers in the Global South.'

On the other hand, where AI will hit the hardest is:
[attached image: job areas most exposed to AI]

All those areas will feel the effect of AI really hard. So in my opinion it's not about whether this is sustainable, but rather: what will happen when this job market collapses?
 
  • #12
jack action said:
The point of Uma Rani is that the San Francisco environment will change over time, thus the AI training will never stop
Yes, updates would be needed.
But I consider her comment to be a misrepresentation, or at least incomplete.
She is not making a distinction between the learned AI model and the hardwired model as the basis for her comment, or the interviewer is cutting her off and guiding her incorrectly.
**** See below ****

I am assuming she is talking about the physical layout of SF, rather than the climate.

The Hard Wired Model needs continuous updates. Change a feature on an SF street - a tree, a house, a fire hydrant, signage - and the street becomes completely new. Somehow this has to be incorporated into the model with new training on visual data, or a cross-reference to a database of street names and street views. It is very difficult to keep the model up to date without lots of 'data entry'.

The Learned AI Model does not have this problem. Quite possibly, the street can change drastically (maybe down to 10% of the original visual image, depending on prominent features) and still be correctly identified, similar to how a human can pick out changed landscapes.

This is one of the reasons the AI industry has shifted away from the hardwired, coded model to the learned model: the learned model is more adaptable to situations it has not encountered or been trained on (like humans).

Upfront, the learned model and the hardcoded model may be comparable in the amount of 'data entry' required in the testing phase. After that, with functioning AI systems, the learned model requires less maintenance than the hardwired model.

*** ****
The reason for large data centres for the AI learned model is not continuous updates of an existing system, but rather an attempt to approach 100% reliability as closely as possible.
More testing data --> more reliability --> more data entry-->more testing--> endless loop ...
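To illustrate the diminishing returns in that loop: error rates in learned models are often reported to fall roughly as a power law in dataset size. A sketch with made-up constants (illustrative only, not real numbers):

```python
# Illustrative only: error in large learned models is often reported to fall
# roughly as a power law in dataset size, error ~ N^(-alpha). The exponent
# here is made up; the point is diminishing returns, not real figures.
ALPHA = 0.3

def error_rate(n_examples):
    return n_examples ** -ALPHA

for n in (1e3, 1e6, 1e9):
    print(f"{n:>14,.0f} examples -> error ~ {error_rate(n):.3f}")
# Each 1000x increase in data buys a smaller absolute improvement,
# which is why chasing ~100% reliability never ends.
```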

Or at least: skimming data from the internet and other sources, reformulating it for testing, and cataloging yes/no responses from testers while training the AI all require people. It all sounds automatic, but it isn't.

Running out of data is the bane of AI, and now, as they mention, running out of people can become another bottleneck.
 
  • #13
Rive said:
I don't think much explanation is needed of how different that is from the claim that 'AI is powered by hundreds of millions of exploited data workers in the Global South.'
but rather: what will happen when this job market collapses

I honestly don't know what that means.
Is it a generational changeover timespan, like all the other work revolutions?
Agricultural society --> industrial --> energy --> service --> digitized --> AI'd

Example of a sector with lost jobs:
Years ago, every bank had a data centre with people entering transactional information from individuals and companies into the computer. The paper trail had to be complete, checked, and verified as to who owed whom what - for example, tracing a check written for a purchase to ensure the debtor and creditor accounts at the particular branch reflected the actual seller and buyer. The digitization of money eliminated many of these jobs: you yourself became the data person, either performing the actual data entry or acting as verifier when using a PIN (debit) card, doing for free what someone else previously did for wages.
 
