Can AI Truly Understand This Simple Image?

  • Thread starter: jack action
  • Tags: AI, Database

Discussion Overview

The discussion revolves around the capabilities of AI in understanding and interpreting a simple image that combines visual elements and text. Participants explore whether AI can explain the image's meaning without prior context, and they reflect on examples of tasks that are easy for humans but challenging for AI.

Discussion Character

  • Exploratory
  • Debate/contested
  • Technical explanation

Main Points Raised

  • Some participants question whether AI can explain the meaning of an image without being provided additional context, emphasizing the complexity of human understanding.
  • Others argue that advanced AI could potentially learn to recognize patterns and meanings in images, suggesting that the challenge lies in the AI's ability to understand context and creativity.
  • A participant references Andrej Karpathy's claims about AI's ability to explain humor in images, although they express difficulty in finding supporting evidence for this claim.
  • Concerns are raised about the validity of AI evaluations if the image or similar examples were part of the training data, which could affect the outcome of tests.
  • Some participants discuss the statistical nature of AI decision-making and how training data can influence the model's ability to recognize specific inputs over time.
  • There is a consideration of whether an AI model can remember specific training examples or if they become diluted among other data inputs.

Areas of Agreement / Disagreement

Participants express differing views on the capabilities of AI regarding image interpretation. While some believe that AI can eventually learn to understand such images, others highlight the inherent challenges and limitations in achieving this level of comprehension.

Contextual Notes

Participants note the complexity of the image's layers, the necessity of understanding context, and the potential for training data to influence AI performance. There is also mention of the difficulty in verifying AI outputs against training data.

Who May Find This Useful

This discussion may be of interest to those exploring AI capabilities, image recognition, and the philosophical implications of machine understanding in comparison to human cognition.

jack action (Science Advisor, Insights Author)
I found this image on social media today:

[Image: sun-on-the-beach.jpg]

This image is so simple, yet out of the ordinary. You really have to stop and think about what you are looking at. Some might never get it at all.

The question that quickly popped into my mind was: could AI ever explain what this image represents, without being fed anything else? What amazes me is that I can even do it! I fail to see how a bunch of statistical analysis can do the trick. And if it is possible, the database would have to be really huge and diversified.

Have you seen other examples of seemingly simple tasks for humans that seem impossible for AI? Something that doesn't obey any known rules per se, but that humans can figure out rather easily nonetheless.
 
That doesn't seem that hard to me for an advanced AI engine. It just has to recognize the names of things in photos, that word order matters (even vertically), and that you can substitute symbols for words (as people do with emojis now). It then has to sort through the choices to find something that makes sense to humans; 'you sun of a wave', for instance, isn't as meaningful as other choices. There are already simple examples of each of the pieces, so this all seems very trainable if people cared to do it. Maybe they're not there yet, but they will be. Perhaps the hardest part is learning that people might want it done at all (i.e. self-learning), or the creativity to produce the first example of this sort of thing.
 
Andrej Karpathy gave an example of that in 2012, and earlier this year he reported that ChatGPT-4 was able to explain why that picture is funny (I believe I read that in Ars Technica). However, I now fail to find any posting regarding this claim and am only able to find this discussion on Reddit. But I guess anyone with ChatGPT-4 access could give it a try.
 
  • First, this image has two layers: one image of a sunset and another of text;
  • Then you have to understand the text is incomplete and it must be a joke;
  • Then you have to understand that the text location matters;
  • Then you must understand that the background image will complete the text;
  • Then you must understand that the part of the image that can replace a word sounds like the word it replaces (not even a true homophone in one case);
  • You most likely had to have heard the sentence before;
The last word (beach/b-i-tch) is really hard to get. I got it because I knew the sentence and was looking for the word, and I found it by looking at the left of the image, where the sandy beach is more prominent.

I'm not talking about asking the AI "What is the joke?" or "Find the hidden text in this image"; just asking "What does that image represent?" And all of that without it merely answering "A sunset on the beach with the words 'YOU OF A'".
 
Filip Larsen said:
However, I now fail to find any posting regarding this claim and are only able to find this discussion on reddit.
There is one obvious explanation in the comments of that discussion:
Karpathy said there is a risk that the image (or a derivative thereof) was part of the training data, which would to some extent invalidate the test.
 
Yes, but it seems strange (to me, at least) that one training sample can be retrieved "verbatim" when given enough context. The general point is valid, though: you can't verify a network using its training data.
 
jack action said:
There is one obvious explanation in the comments of that discussion:
That brings a question to my mind. If a neural network is asked to evaluate an example that was used as a training input, is it guaranteed to remember it? Or could it get watered down by the other training inputs and maybe even get treated like an outlier?
 
FactChecker said:
Could it get watered down by the other training inputs and maybe even get treated like an outlier?
Yes. In its simplest form, an AI model looks like this:
[Image: neural-net-classifier.png]

The decision is never strictly A, B, or C with 100% certainty; the choices are always statistical. When training first starts, all of the weights in the hidden layer have randomly assigned values and the output is statistical nonsense. If an 'outlier' is the first and only example the network is trained on, the backpropagation algorithm will generate weight values in the hidden layer such that the network produces the expected output for that input with near-100% certainty.
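To make the "always statistical" point concrete, here is a minimal sketch (not from the thread; the scores are made up): a softmax layer turns raw class scores into probabilities, and the "decision" is just the most probable class, never a certainty.

```python
import numpy as np

def softmax(z):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

# Hypothetical raw scores for three output classes A, B, C.
logits = np.array([2.1, 0.3, -1.0])
probs = softmax(logits)

# The network's "answer" is the argmax, but it is only the most
# probable choice, never 100% certain.
print(dict(zip("ABC", probs.round(3))))
```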

As training progresses with additional inputs, the hidden layer's weights are continuously adjusted toward a best fit over everything the model has seen, in an attempt to produce the correct output for every training input. This naturally shifts the first training items away from 100% certainty. If an item is a big enough outlier among other items of its output type, the model could eventually classify it as something else.
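That drift can be demonstrated with a toy stand-in for a network: a single sigmoid neuron trained by gradient descent (a simplified form of backpropagation). All inputs and labels below are made up for illustration. Trained only on one "outlier", its confidence climbs toward 100%; once later training data disagrees, that confidence collapses.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One sigmoid neuron stands in for the whole network.
# Weights start random, so the initial output is statistical noise.
w = rng.normal(size=3)
lr = 0.5

outlier_x = np.array([1.0, 1.0, 1.0])  # the lone "outlier" training input
outlier_y = 1.0                        # trained to answer class A

def train_step(x, y):
    """One gradient step on the cross-entropy loss."""
    global w
    p = sigmoid(w @ x)
    w = w + lr * (y - p) * x

# Phase 1: train only on the outlier -> confidence approaches 100%.
for _ in range(200):
    train_step(outlier_x, outlier_y)
conf_after_phase1 = sigmoid(w @ outlier_x)

# Phase 2: keep training on many nearby inputs labelled the *other* way,
# mimicking later training data that disagrees with the outlier.
for _ in range(200):
    neighbour = outlier_x + rng.normal(scale=0.1, size=3)
    train_step(neighbour, 0.0)
conf_after_phase2 = sigmoid(w @ outlier_x)

print(f"confidence after phase 1: {conf_after_phase1:.3f}")
print(f"confidence after phase 2: {conf_after_phase2:.3f}")
```

The early example is not "remembered" as a stored item; it only survives as long as the weights that fit it are not pulled elsewhere by later data.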

Note, however, that with good test data you can eliminate most of these misclassifications. For example, on the standard MNIST digit dataset it's pretty easy to get a model to 99.5% accuracy at identifying hand-written digits. And if you look at the ones it gets wrong, you would often have a hard time telling what the digit was yourself.
 
