Insights: Why ChatGPT AI Is Not Reliable

  • Thread starter: PeterDonis
  • Tags: chatgpt
Summary
ChatGPT is deemed unreliable because it generates text based solely on word frequencies from its training data, lacking true understanding or semantic connections. Critics argue that it does not accurately answer questions or provide reliable information, often producing confident but incorrect responses. While some users report that it can parse complex code and suggest optimizations, this does not equate to genuine knowledge or reasoning. The discussion highlights concerns about its potential impact on how society perceives knowledge and the importance of critical evaluation of AI-generated content. Ultimately, while ChatGPT may appear impressive, its limitations necessitate cautious use and independent verification of information.
  • #61
PeterDonis said:
I suspect that is because if they did do so, interest in what OpenAI is doing would evaporate.
I don't see why. Most people care about the result. Of course it has some limitations that are fundamental, and they don't necessarily want people knowing that. But still, even as it stands, it's going to be used a ton, for better or for worse. For what it's worth, it helped me write a Python script and customize Vim very effectively. It can also give you sources and guidelines for problems, with varying degrees of effectiveness.
 
  • #62
AndreasC said:
Most people care about the result. Of course it has some limitations that are fundamental, and they don't necessarily want people knowing that.
You're contradicting yourself. The "limitations that are fundamental" are crucial effects on the result. They're not just irrelevant side issues.
 
  • #63
PeterDonis said:
You're contradicting yourself. The "limitations that are fundamental" are crucial effects on the result. They're not just irrelevant side issues.
There are fundamental limitations that put a limit to how much the technology can improve. This doesn't mean that it won't get good enough for the purposes of many people. In fact it already is for lots of them.

Although tbh I'm kind of rethinking how fundamental these limitations are after I saw the performance of recent LLMs. I definitely didn't expect them to get this far yet. Perhaps the ceiling is a bit higher than I thought.
 
  • #64
I do agree with @Vanadium 50 (if he wasn't kidding) that it has good use cases for low-risk, low-expectation purposes like customer service bots, but that's a really low performance bar*. I do agree with @PeterDonis that if, for example, this was rolled out by Apple as an upgrade to Siri, we wouldn't be having this conversation. It's way, way less interesting/important than the hype suggests.

...and this Insight addresses an important but not well-discussed problem, which, more to the point, is why we frown upon chat-bot questions and answers on PF.

*Edit: Also, this isn't what AI is "for". AI's promise is in being able to solve problems that are currently out of reach of computers but don't even require conscious thought by people. These problems - such as self-driving cars - are often ones where reliability is important.

Edit 2: OK, I say that, but I can't be so sure it's true, particularly because of wildcards like Elon Musk who are willing to put the public at risk to test experimental software.
 
  • #65
First, I was serious. And stop calling me Shirley.

Second, the problem with discussing "AI", much less its purpose, is that it is such a huge area that lumping it all together is seldom helpful. Personally I feel that the most interesting work has been done in motion, balance and sensors.

Third, we had this technology almost 40 years ago. That was based on letters, not words, and it was much slower than real-time. And nobody got excited.
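
For the curious, the letter-based approach amounted to something like the following character-level Markov chain. This is a minimal sketch, assuming a plain-text training file (the filename and the order are placeholders I've made up for illustration):

Code:
import random
from collections import defaultdict

def build_model(text, order=3):
    """Map each 'order'-letter context to the letters observed after
    it, repeats included, so sampling follows the raw frequencies."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])
    return model

def generate(model, seed, length=200):
    """Emit one letter at a time by sampling the recorded followers
    of the current context. No semantics enter anywhere."""
    out = seed
    order = len(seed)
    for _ in range(length):
        followers = model.get(out[-order:])
        if not followers:
            break
        out += random.choice(followers)
    return out

corpus = open("corpus.txt").read()  # placeholder training text
model = build_model(corpus, order=3)
print(generate(model, seed=corpus[:3]))

Trained on enough text this produces surprisingly plausible gibberish. The idea scales; in the 1980s the compute and the data did not.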
 
  • Like
Likes physicsworks, russ_watters and berkeman
  • #66
Vanadium 50 said:
Third, we had this technology almost 40 years ago.
We didn't. It was not possible at the time to train models this complex, with this much data. There was neither enough data nor enough computational power. No wonder nobody got excited! I played with stuff like GPT-2 some time ago; even that was complete trash compared to ChatGPT.
 
  • #67
@AndreasC , I was doing it 40 years ago.
 
  • Like
Likes PeterDonis
  • #68
Vanadium 50 said:
@AndreasC , I was doing it 40 years ago.
Sure, you could train language models 40 years ago. Just like you could make computers back then. Except they couldn't do nearly as much as modern ones.
 
  • #69
If you want to argue that the difference between then and now is that hardware has gotten cheaper, you should argue that. But the ideas themselves are old. As I said, I was there.
 
  • #70
Vanadium 50 said:
If you want to argue that the difference between then and now is that hardware has gotten cheaper, you should argue that. But the ideas themselves are old. As I said, I was there.
Not just hardware. Also the data available.
 
  • #71
That's just a statement that you can pre-train your program on a large number of questions. I've already said it was much slower than real time. It doesn't make any difference to what the program does. It does, however, make a difference to the illusion of intelligence.

As discussed, ChatGPT doesn't even try to output what is correct. It tries to output what is written often. There is some hope that there is a correlation between that and correctness, but that's not always true, and it was not hard to come up with examples.

ChatGPT is the love child of Clever Hans and the Mechanical Turk.
 
  • Like
  • Haha
  • Love
Likes physicsworks, PeterDonis and nsaspook
  • #72
Vanadium 50 said:
As discussed, ChatGPT doesn't even try to output what is correct.

Exactly. It also tries to err on the side of providing an answer, even when it has no idea what the right answer is. I used Stable Diffusion to generate pictures of composite animals that don't exist, then asked ChatGPT multiple times to identify them. The AI *never*, not even once, said "I can't identify that" or "I don't know what that is," nor did it suspect that it wasn't a real animal. Its guesses were at least related to the broad class of animal the composites resembled, but that was it.

There is no there, there.
 
  • Like
Likes dextercioby, russ_watters and AndreasC
  • #73
Oscar Benavides said:
It also tries to err on the side of providing an answer
It doesn't even "try"--it will always output text in response to a prompt.

Oscar Benavides said:
even when it has no idea what the right answer is
It never does, since it has no "idea" of any content at all. All it has any "idea" of is relative word frequencies.
 
  • Like
Likes Math100, Motore, Vanadium 50 and 3 others
  • #74
I'm not even sure how you could measure uncertainty in the output based on word frequency. "Some people say Aristotle was Belgian" will throw it off.
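
To make that concrete: the obvious proxy is the entropy of the next-word distribution, but a toy calculation shows why it fails. A model trained on polluted text is low-entropy, i.e. "confident", about the wrong word. The distributions below are made up for illustration:

Code:
import math

def entropy(probs):
    """Shannon entropy (bits) of a next-word distribution."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

# Made-up next-word distributions after the prompt "Aristotle was":
clean = {"Greek": 0.90, "a": 0.06, "born": 0.04}
polluted = {"Belgian": 0.85, "Greek": 0.10, "a": 0.05}  # "some people say..."

# Max entropy for 3 outcomes is log2(3), about 1.58 bits.
print(entropy(clean))     # ~0.57 bits: low entropy, and correct
print(entropy(polluted))  # ~0.75 bits: still low entropy, but wrong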
 
  • Like
Likes Oscar Benavides
  • #75
I tried using it a couple of times and for me it is really not useful. For complex code, I found it's faster to go to Stack Overflow, because there I get more understanding of the code besides the code itself.
The only thing it is really good at is language-based requests (write me a song, interpret this text, draft an email, ...), which some people will find useful.
For research or factual questions it's too unreliable. It's just faster to use Wikipedia.
 
  • Like
Likes weirdoguy
  • #76
I know someone who has the paid version and says it's a lot more reliable. Previously, using the free version, a request for scientific references on a topic produced 40 authentic-looking but completely made-up references. The paid version produced real references that all checked out.
 
  • #77
bob012345 said:
I know someone who has the paid version and says it's a lot more reliable.
Is there any reference online about this paid version and how it differs from the free version?
 
  • Like
Likes russ_watters
  • #79
bob012345 said:
Thanks! It looks like, at the very least, the paid version includes searching the Internet for actual answers to prompts, so it is not the same thing as the free version that my Insights article (and the Wolfram article it references) discuss.
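
For anyone unfamiliar with the pattern, the difference is roughly "retrieve, then generate". Here is a minimal sketch; web_search and llm_complete are hypothetical stand-ins I made up for illustration, not actual OpenAI calls:

Code:
def web_search(query, k=3):
    # Hypothetical stand-in for whatever search backend the paid
    # product uses; here it just returns canned snippets.
    return [f"[canned snippet {i + 1} for: {query}]" for i in range(k)]

def llm_complete(prompt):
    # Hypothetical stand-in for a bare language-model completion.
    return "[model completion would appear here]"

def answer_with_retrieval(question):
    # Ground the prompt in retrieved text instead of relying only on
    # word patterns memorized at training time.
    context = "\n".join(web_search(question))
    prompt = ("Using only the sources below, answer the question.\n"
              f"Sources:\n{context}\n\n"
              f"Question: {question}\nAnswer:")
    return llm_complete(prompt)

print(answer_with_retrieval("What does the ChatGPT paid tier add?"))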
 
  • Like
Likes Math100, russ_watters and bob012345
  • #80
OpenAI explain the differences between ChatGPT 3, 3.5 and 4 (and indicate the plans and timeline for 5) on their website.
 
  • #81
AndreasC said:
How do we know at what point it "knows" something? There are non-trivial philosophical questions here... These networks are getting so vast and their training so advanced that I can see someone eventually arguing they have somehow formed a decent representation of what things "are" inside them. [...]

I think that's the point exactly. At some point we'll be unable to tell the difference, and the person who calls you trying to convince you to change your phone company, electricity company or whatever might be a machine. But if you can't tell the difference, then what is the difference?!

---------------------------------------------------------------

Filip Larsen said:
Stochastic parrot. Hah! Very apt.
russ_watters said:
Maybe the intent was always to profit from 3rd parties using it as an interface [...]

pbuk said:
Ya think?[...]

And then we enter the land of sarcasm. :)

---------------------------------------------------------------

This ChatGPT thingy really gets people riled up. I suspect especially the teaching part of the community here. ;P

... still reading....
 
  • #82
What it means "to know" is philosophy.

However, an epistemologist would say that an envelope containing the phrase "It is after 2:30 and before 2:00" does not possess knowledge, even though it is correct about as often as ChatGPT (on a 12-hour clock, that sentence is true for eleven and a half of every twelve hours).
 
  • Like
Likes Bystander
  • #83
I'm not convinced that human intelligence is so effective. This site in many ways is a gross misrepresentation of human thought and interactions. For all the right reasons! Go anywhere else on the Internet or out in the street, as it were, and there is little or no connection between what people think and believe and objective evidence.

Chat GPT, if anything, is more reliable in terms of its objective assessment of the world than the vast majority of human beings.

Chat GPT doesn't have gross political, religious or philosophical prejudices.

If you talked to an oil company executive, there was no climate change, and the biggest threat to humanity was the environmental movement.

Most human beings deliberately lie if it is in their interests. With Chat GPT at least you know it isn't deliberately lying to you.

I don't know where AI is going, or where we are heading, but I could make a case that Chat GPT is more rational, intelligent and truthful than 99% of the people on this planet.
 
  • Skeptical
  • Like
Likes mattt and Bystander
  • #84
PeroK said:
Chat GPT, if anything, is more reliable in terms of its objective assessment of the world
ChatGPT does not have any "objective assessment of the world". All it has is the relative word frequencies in its training data.

Wolfram Alpha, ironically, would be a much better thing to describe with the phrase you use here. It actually does contain a database (more precisely multiple databases with different entry and lookup criteria) with validated information about the world, which it uses to answer questions.

PeroK said:
Chat GPT doesn't have gross political, religious or philosophical prejudices.
Only for the same reason a rock doesn't.
 
  • Like
  • Skeptical
  • Haha
Likes dextercioby, Bystander, russ_watters and 3 others
  • #85
PeterDonis said:
ChatGPT does not have any "objective assessment of the world". All it has is the relative word frequencies in its training data.

Wolfram Alpha, ironically, would be a much better thing to describe with the phrase you use here. It actually does contain a database (more precisely multiple databases with different entry and lookup criteria) with validated information about the world, which it uses to answer questions.

Only for the same reason a rock doesn't.
In a practical sense, you could live according to what answers ChatGPT gives you. Wolfram Alpha is a mathematical engine. It's not able to communicate on practical everyday matters. Nor can a rock.

How any software works is not really the issue if you are an end user. The important thing is what it outputs.

You are too focused, IMO, on how it does things and not what it does.
 
  • Skeptical
Likes Motore
  • #86
PeroK said:
In a practical sense, you could live according to what answers ChatGPT gives you.
For your sake I sincerely hope you don't try this. Unless, of course, you only ask it questions whose answers you don't really care about anyway and aren't going to use to determine any actions. Particularly any actions that involve risk of harm to you or others.

PeroK said:
Wolfram Alpha is a mathematical engine. It's not able to communicate on practical everyday matters.
Sure it is. You can ask it questions in natural language about everyday matters and it gives you answers, if the answers are in its databases. Unlike ChatGPT, it "knows" when it doesn't know an answer and tells you so. ChatGPT doesn't even have the concept of "doesn't know", because it doesn't even have the concept of "know". All it has is the relative word frequencies in its training data, and all it does is produce a "continuation" of the text you give it as input, according to those relative word frequencies.

Granted, Wolfram Alpha doesn't communicate its answers in natural language, but the answers are still understandable. Plus, it also includes in its answers the assumptions it made while parsing your natural language input (which ChatGPT doesn't even do at all--not just that it doesn't include any assumptions in its output, but it doesn't even parse its input). For example, if you ask Wolfram Alpha "what is the distance from New York to Los Angeles", it includes in its answer that it assumed that by "New York" you meant the city, not the state.
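
You can see those recorded assumptions directly by querying Wolfram Alpha's public Full Results API. A minimal sketch, assuming the v2 query endpoint with JSON output (YOUR_APPID is a placeholder; check the current docs before relying on the exact field names):

Code:
import requests

resp = requests.get(
    "https://api.wolframalpha.com/v2/query",
    params={
        "input": "distance from New York to Los Angeles",
        "appid": "YOUR_APPID",  # placeholder credential
        "output": "json",
    },
)
result = resp.json()["queryresult"]

assumptions = result.get("assumptions", [])
if isinstance(assumptions, dict):   # the API returns a bare object
    assumptions = [assumptions]     # when there is one assumption

# Each assumption records how the parser read the input, e.g. that
# "New York" meant the city rather than the state.
for a in assumptions:
    print(a.get("type"), "->",
          [v.get("desc") for v in a.get("values", [])])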

PeroK said:
You are too focused, IMO, on how it does things and not what it does.
Huh? The Insights article under discussion, and the Wolfram article it references, are entirely about what ChatGPT does, and what it doesn't do. Wolfram also goes into some detail about the "how", but the "what" is the key part I focused on.
 
  • #87
PeroK said:
You are too focused, IMO, on how it does things and not what it does.
Could you make the same argument for astrology? Yesterday it told me to talk to a loved one and it worked!
 
  • Like
Likes PeterDonis, pbuk, dextercioby and 1 other person
  • #88
PeterDonis said:
For your sake I sincerely hope you don't try this. Unless, of course, you only ask it questions whose answers you don't really care about anyway and aren't going to use to determine any actions.
I don't personally intend to, no. But there are worse ways to get answers.
 
  • #89
PeroK said:
there are worse ways to get answers.
So what? That doesn't make ChatGPT good enough to rely on.
 
  • Like
Likes Motore
  • #90
PeroK said:
I don't personally intend to, no.
Doesn't that contradict your previous claim here?

PeroK said:
In a practical sense, you could live according to what answers ChatGPT gives you.
If you're not willing to do this yourself, on what basis do you justify saying that someone else could do it?
 
