Can AI Decode This Simple Image? Exploring the Limits of Artificial Intelligence

  • #1
jack action
I found this image on social media today:

sun-on-the-beach.jpg

This image is so simple, yet out of the ordinary. You really need to stop and think about what you are looking at. Some might never get it at all.

The question that quickly popped into my mind was: will AI ever be able to explain what this image represents, without being fed anything else? What amazes me is that I can even do it myself! I fail to see how a bunch of statistical analysis can do the trick. And if it is possible, the training data would have to be really huge and diverse.

Have you seen other examples of seemingly simple tasks for humans that seem impossible for AI? Something that doesn't obey any known rules per se, but that humans can figure out rather easily nonetheless.
 
  • #2
That doesn't seem that hard to me for an advanced AI engine. It just has to recognize the names of things in photos, learn that word order matters (even vertically), and learn that you can substitute symbols for words (like when people use emojis now). It then has to sort through the choices to find something that makes sense to humans; for example, 'you sun of a wave' isn't as meaningful as other choices. There are simple examples already of each of the pieces. This all seems very trainable if people cared to do it. Maybe they're not there yet, but they will be. Maybe the hardest part is learning that people might want it done at all (i.e. self-learning), or maybe the creativity to make the first example of this sort of thing.
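To make the "sort through the choices" step concrete, here is a minimal Python sketch of just that one piece: using an off-the-shelf language model to rank candidate readings of the picture by how "meaningful" they sound. The candidate strings are hand-written assumptions for illustration, and GPT-2 likelihood is only one possible stand-in for "makes sense to humans"; a real system would also need the image-recognition and word-substitution steps that come before this.

```python
# Hypothetical sketch: rank candidate readings of the meme by language-model
# likelihood. The candidates below are made up; a full system would first have
# to generate them from the image itself.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

candidates = [
    "you son of a beach",    # the sun read as "son", the beach kept literal
    "you sun of a wave",     # a purely literal reading
    "you sunset of a sand",  # another literal reading
]

def avg_neg_log_likelihood(text: str) -> float:
    """Average per-token loss under GPT-2; lower means more 'expected' language."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(input_ids=ids, labels=ids).loss.item()

# Print the candidates from most to least plausible according to the model.
for text in sorted(candidates, key=avg_neg_log_likelihood):
    print(f"{avg_neg_log_likelihood(text):6.2f}  {text}")
```

This only covers the "which reading sounds meaningful" ranking; recognising the sun and the beach in the photo, and noticing that their positions line up with the text, are separate (and harder) steps.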
 
  • #3
Andrej Karpathy gave an example of that in 2012, and earlier this year he reported that ChatGPT-4 was able to explain why that picture is funny (I believe I read that on Ars Technica). However, I now fail to find any posting regarding this claim and am only able to find this discussion on Reddit. But I guess anyone with ChatGPT-4 access could give it a try.
 
  • #4
  • First, this image has two layers: one image of a sunset and another of text;
  • Then you have to understand the text is incomplete and it must be a joke;
  • Then you have to understand that the text location matters;
  • Then you must understand that the background image will complete the text;
  • Then you must understand that the part of the image that can replace a word sounds like the word it replaces (not even a true homophone in one case);
  • You most likely had to have heard the sentence before;
The last word (beach/b-i-tch) is really hard to get. I got it because I knew the sentence and I was looking for the word, and I found it by looking at the left of the image where the sandy beach is more prominent.

I'm not talking about asking the AI "What is the joke?" or "Find the hidden text in this image"; just asking "What does that image represent?", and all of that without it simply answering "A sunset on the beach with the words 'YOU OF A'".
 
  • #5
Filip Larsen said:
However, I now fail to find any posting regarding this claim and am only able to find this discussion on Reddit.
There is one obvious explanation in the comments of that discussion:
Karpathy said there is a risk that the image (or a derivative thereof) was part of the training data, which would to some extent invalidate the test.
 
  • #6
Yes, but it seems strange (to me, at least) that one training sample can be retrieved "verbatim" when given enough context. The general point is valid, though: you can't verify a network using its training data.
 
  • #7
jack action said:
There is one obvious explanation in the comments of that discussion:
That brings a question to my mind. If a neural network is asked to evaluate an example that was used as a training input, is it guaranteed to remember it? Could it get watered down by the other training inputs and maybe even get treated like an outlier?
 
  • #8
FactChecker said:
Could it get watered down by the other training inputs and maybe even get treated like an outlier?
Yes. In its simplest form, an AI model looks like this:
neural-net-classifier.png

The decision is not strictly A, B, or C with 100% certainty; the choices are always statistical. When training first starts, all of the weights in the hidden layer have randomly assigned values and the output is statistical nonsense. If an 'outlier' is the first and only example the network is trained on, the backpropagation algorithm will adjust the hidden-layer weights so that the network produces the output for that input with near-100% certainty.

As training progresses with additional inputs, the hidden layer's weights are continuously adjusted toward a best fit over everything the network has seen, in an attempt to get the correct output for every training input. This naturally causes the confidence on the earliest training items to drift away from 100%. If an item is a big enough outlier relative to other items of its class, the model could eventually classify it as something else.
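Here is a minimal NumPy sketch of those two points on a made-up 4-feature, 3-class toy problem (all numbers are illustrative, not from any real model): a randomly initialised network gives unconfident nonsense, fitting a single 'outlier' example drives its class probability toward 1, and further training on conflicting nearby examples waters that confidence back down.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Toy classifier: 4 input features, 8 hidden units, 3 output classes (A, B, C).
W1 = 0.5 * rng.normal(size=(4, 8))   # randomly initialised weights
W2 = 0.5 * rng.normal(size=(8, 3))

def forward(x):
    h = np.tanh(x @ W1)
    return softmax(h @ W2), h

def train_step(x, y, lr):
    """One backpropagation step on a single (input, label) pair."""
    global W1, W2
    p, h = forward(x)
    grad_logits = p.copy()
    grad_logits[y] -= 1.0                       # d(cross-entropy)/d(logits)
    grad_h = (grad_logits @ W2.T) * (1 - h**2)  # backprop through tanh
    W2 -= lr * np.outer(h, grad_logits)
    W1 -= lr * np.outer(x, grad_h)

x_outlier = rng.normal(size=4)   # the lone "outlier" training example
y_outlier = 0                    # say it belongs to class A

print("random init:", forward(x_outlier)[0])   # typically far from a confident decision

for _ in range(200):                            # train on the outlier alone
    train_step(x_outlier, y_outlier, lr=0.5)
print("after fitting only the outlier:", forward(x_outlier)[0])   # class A near 1.0

# Keep training on many nearby examples that are labelled B or C instead.
for _ in range(2000):
    x = x_outlier + rng.normal(scale=0.3, size=4)
    y = int(rng.integers(1, 3))
    train_step(x, y, lr=0.05)
print("after more training data:", forward(x_outlier)[0])  # class A confidence diluted
```

The exact numbers depend on the random seed and learning rates, but the qualitative behaviour is the point: confident on the outlier at first, then diluted as conflicting data arrives.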

Note, however, that with good test data you can catch most of these kinds of misclassifications. For example, on the standard MNIST digit dataset it's pretty easy to get a model to 99.5% accuracy at identifying hand-written digits. And if you look at the ones it gets wrong, you would often have a hard time telling what the digit was yourself.
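As a rough sketch of that MNIST claim, a small convolutional network in Keras (assumed available as tensorflow.keras) typically lands somewhere around 99% test accuracy after a few epochs; the exact figure depends on the architecture and training details.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Load the standard MNIST digits and scale pixel values to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0   # shape (60000, 28, 28, 1)
x_test = x_test[..., None] / 255.0

# A small convolutional classifier with softmax (i.e. statistical) outputs.
model = tf.keras.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),   # probabilities over the 10 digits
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=128, validation_split=0.1)

loss, acc = model.evaluate(x_test, y_test)
print(f"test accuracy: {acc:.4f}")

# Indices of the few digits the model still gets wrong; many of these strokes
# are ambiguous even to a human reader.
pred = model.predict(x_test).argmax(axis=1)
wrong = np.nonzero(pred != y_test)[0]
print("misclassified indices:", wrong[:10])
```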
 

1. What does it mean for AI to decode a simple image?

Decoding a simple image with AI involves interpreting or extracting meaningful information from the image using artificial intelligence technologies. This process typically includes recognizing shapes, detecting objects, and understanding the context of the image. AI systems use algorithms and models trained on large datasets to perform these tasks.
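As an illustration of that kind of decoding, here is a minimal sketch using a pretrained ImageNet classifier from torchvision (version 0.13 or later assumed; "sunset.jpg" is only a placeholder file name). It labels the objects it recognises in a photo, which is a much narrower notion of "decoding" than understanding a joke or its context.

```python
import torch
from PIL import Image
from torchvision import models

# Pretrained ResNet-50 and the preprocessing pipeline its weights expect.
weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()

img = Image.open("sunset.jpg").convert("RGB")   # placeholder image path
with torch.no_grad():
    logits = model(preprocess(img).unsqueeze(0))
probs = logits.softmax(dim=1)[0]

# Show the five most likely ImageNet labels with their probabilities.
top5 = probs.topk(5)
for p, idx in zip(top5.values, top5.indices):
    print(f"{weights.meta['categories'][idx]}: {p:.2%}")
```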

2. What are the limits of AI in decoding images?

The limits of AI in decoding images largely depend on the complexity of the image, the quality of the data, and the sophistication of the AI model used. Challenges include dealing with poor image quality, ambiguous or complex scenes, and the AI's ability to generalize from its training data to new, unseen scenarios. Additionally, AI may struggle with images that require a deep understanding of cultural contexts or abstract concepts.

3. How does AI technology interpret images differently from humans?

AI interprets images based on patterns and data it has been trained on, unlike humans who use cognitive abilities and contextual understanding. AI lacks intuition and emotional context, which often play a significant role in human image interpretation. Consequently, while AI can outperform humans in speed and accuracy for specific tasks, it may miss subtleties or interpret ambiguous images differently than a human would.

4. What advancements have been made in AI image decoding?

Significant advancements in AI image decoding include the development of convolutional neural networks (CNNs) and deep learning frameworks that have dramatically improved the accuracy of object detection, image classification, and facial recognition. Other advancements include improvements in semantic segmentation, image generation, and the ability to integrate contextual information into image analysis.

5. Can AI decode any image?

No, AI cannot decode any image with complete accuracy. While AI has become exceptionally good at interpreting a wide range of images, its performance can be limited by factors such as training data diversity, image quality, and the complexity of the content within the images. Images that are too abstract or lack clear patterns can also pose significant challenges for AI decoding.
