Insights: Why ChatGPT AI Is Not Reliable

  • Thread starter: PeterDonis
  • Tags: chatgpt
AI Thread Summary
ChatGPT is deemed unreliable because it generates text based solely on word frequencies from its training data, lacking true understanding or semantic connections. Critics argue that it does not accurately answer questions or provide reliable information, often producing confident but incorrect responses. While some users report that it can parse complex code and suggest optimizations, this does not equate to genuine knowledge or reasoning. The discussion highlights concerns about its potential impact on how society perceives knowledge and the importance of critical evaluation of AI-generated content. Ultimately, while ChatGPT may appear impressive, its limitations necessitate cautious use and independent verification of information.
  • #51
If there were any knowledge base behind ChatGPT you would be able to
  1. Train it in English
  2. Train it in French
  3. Train it in domain knowledge (like physics)
  4. Have it answer questions about thus domain in French.
It can't do this. There is no there there.
 
  • #52
Vanadium 50 said:
"ChatGPT Airlines - now 96% of our takeoffs have landings at airports!"
"New from OceanGate: now 99% Reliable - Twice as Reliable as our Previous Subs!"
(too soon?)

Vanadium 50 said:
It's not just unreliable - we have no reason to believe it should be reliable, or that this approach will ever be reliable.
I go back again to wondering what the creators are thinking about this...
pbuk said:
Definitely not [AI], but they believe they are headed in the right direction:
OpenAI's website is really weird. It is exceptionally thin on content and heavy on flash, with most of the front page just being pointless slogans and photos of people doing office things (was it created by ChatGPT?). It even features a video on top that apparently has no sound? All this to sell a predominantly text-based application (ironic)? The first section of the front page, though, contains one actual piece of information, in slogan form:

"Creating safe AGI that benefits all of humanity"​

That's quite an ambitious goal/claim. It's not surprising that everyday people believe it's more than it really is, when that's what the company is saying.

The trajectory of the app, and the way they've talked about flaws such as hallucinations, implies they think their approach is viable and that refinements improving its reliability will make it "reliable enough". Ironically, this may increase the risk of misuse, as people apply it to more and more situations where reliability matters. I can't see how this approach would ever be acceptable for industrial automation. Maybe for a toy drone it won't matter if it unexpectedly/unpredictably crashes for no apparent reason "only" 0.1% of the time, but that won't ever be acceptable for a self-driving car or airplane.
 
  • Haha
  • Like
Likes DaveC426913 and PeterDonis
  • #53
PeterDonis said:
Or because the testers didn't bother writing a good test, one that can actually distinguish between ChatGPT, an algorithm that generates text based on nothing but relative word frequencies in its training data, and an actual human with actual human understanding of the subject matter
If that's what a "good" test is, then it is tautologically true that GPT would be no good at such tests. The issue with tautologies is, of course, that they don't tell us anything new. What is new is that GPT can do many things that only humans with understanding could previously do. Of course it doesn't do them perfectly, but often it does them more accurately than most humans, and much faster. If what you want is the answer to an exercise, and it can give you the correct answer, say, 99% of the time, then that's good enough for many people and in many contexts, regardless of philosophical questions about understanding. And again, we are talking about things that computers previously just couldn't do. This is why it is significant, and why I'm saying it should not be downplayed: we are going to encounter this more and more in the coming years.
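To make concrete what "generates text based on nothing but relative word frequencies" means, here is a minimal bigram (Markov-chain) text generator in Python. It is only a toy illustration of the statistical principle being debated, not how GPT is actually implemented (GPT uses learned neural weights over long contexts), but note that nothing in the loop represents truth; there are only co-occurrence counts.

```python
import random
from collections import defaultdict

def train_bigrams(text):
    """Count how often each word follows each other word."""
    counts = defaultdict(lambda: defaultdict(int))
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, length=20):
    """Emit a continuation one word at a time, sampling each next
    word in proportion to how often it followed the current word
    in the training text. No notion of truth enters anywhere."""
    out = [start]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break
        choices = list(followers)
        weights = [followers[w] for w in choices]
        out.append(random.choices(choices, weights=weights)[0])
    return " ".join(out)

model = train_bigrams("the cat sat on the mat and the cat saw the dog")
print(generate(model, "the"))  # e.g. "the cat saw the mat and the dog"
```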
 
  • Skeptical
Likes weirdoguy
  • #54
PeterDonis said:
What you show here is nothing like what AndreasC described.
Well, @Demystifier didn't do what I described. See my post where I tried it.
 
  • #55
russ_watters said:
go back again to wondering what the creators are thinking about this...
I think they are planning to monetize this by first making a name for themselves and then selling a product where "close enough is good enough". For example, customer service chatbots.
 
  • Like
Likes russ_watters
  • #56
AndreasC said:
If what you want is the answer to an exercise, and it can give you the correct answer, say, 99% of the time, then that's good enough for many people and in many contexts
Is it?

Perhaps if my only purpose is to get a passing grade on the exercise, by hook or by crook, this would be good enough.

But for lots of other purposes, it seems wrong. It's not even a matter of percentage accuracy; it's a matter of what the thing is doing and not doing, as compared with what my purpose is. If my purpose is to actually understand the subject matter, I need to learn from a source that actually understands the subject matter. If my purpose is to learn a particular fact, I need to learn from a source that will respond based on that particular fact. For example, if I ask for the distance from New York to Chicago, I don't want an answer from a source that will generate text based on word frequencies in its input data; I want an answer from a source that will look up that distance in a database of verified distances and output what it finds. (Wolfram Alpha, for example, does this in response to queries of that sort.)
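The distinction can be put in code. Below is a hypothetical sketch (the dictionary, the function name, and the 790-mile figure are all illustrative, not any real system's implementation): a lookup-based source either returns a verified fact or admits it has none, whereas a pure text generator always produces something.

```python
# Hypothetical sketch of a fact-lookup source (not Wolfram Alpha's
# actual implementation). The key property: it can say "I don't know".
VERIFIED_DISTANCES_MILES = {
    ("new york", "chicago"): 790,  # illustrative driving distance
}

def lookup_distance(city_a, city_b):
    key = (city_a.lower(), city_b.lower())
    if key in VERIFIED_DISTANCES_MILES:
        return f"{VERIFIED_DISTANCES_MILES[key]} miles"
    # A lookup source refuses rather than guesses; a generator
    # trained on word frequencies will emit plausible text instead.
    return "I don't know"

print(lookup_distance("New York", "Chicago"))   # 790 miles
print(lookup_distance("New York", "Atlantis"))  # I don't know
```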
 
  • #57
PeterDonis said:
Is it?

[...] For example, if I ask for the distance from New York to Chicago, I don't want an answer from a source that will generate text based on word frequencies in its input data; I want an answer from a source that will look up that distance in a database of verified distances and output what it finds. (Wolfram Alpha, for example, does this in response to queries of that sort.)
But what if you want the answer as if given by Homer Simpson, or as a Shakespearean sonnet? Alpha can't do that ;)

I think many are missing the point: applications with near-perfect accuracy are not the objective. LLMs can write marketing pitches, legal boilerplate, informational articles, etc. just as well as a junior employee whose work would also need to be checked for accuracy.

It's informative that the largest quant hedge funds adopted these tools not for trading, but to automate the tasks of junior analysts:

https://fortune.com/2023/06/01/hedge-fund-chatgpt-grunt-work-mundane/
 
  • Like
Likes russ_watters
  • #58
PeterDonis said:
Perhaps if my only purpose is to get a passing grade on the exercise, by hook or by crook, this would be good enough.
Exactly, here is the problem!

Or what happens when some business or government does the math and figures it would rather risk being wrong than pay experts?

On the other hand, it could work very productively if it is used to provide guidelines for solving something, or even to give the answer which a human then curates. Terence Tao has written about his experiences using it this way, if you want to read them.

The flip side is that researchers could use it to churn out absurd quantities of mostly junk or uninteresting research papers to inflate their publication counts.

There are lots of ramifications this new technology could have.
 
  • #59
BWV said:
But what if you want the answer as if given by Homer Simpson, or as a Shakespearean sonnet? Alpha can't do that ;)
I think they already do it the Max Power way:
 
  • Haha
Likes BWV
  • #60
AndreasC said:
what happens when some business or government does the math and figures it would rather risk being wrong than pay experts?
If I know that's what your business is doing, you won't get my business.

I suspect that a lot of people feel this way; they just don't know that that's what the business is doing. Certainly OpenAI has not done anything to inform the public of what ChatGPT is actually doing, and not doing. I suspect that is because if they did do so, interest in what OpenAI is doing would evaporate.
 
  • #61
PeterDonis said:
I suspect that is because if they did do so, interest in what OpenAI is doing would evaporate.
I don't see why. Most people care about the result. Of course it has some limitations that are fundamental, and they don't necessarily want people knowing that. But still, it's not like it isn't going to be used a ton even as it stands, for better or for worse. For what it's worth, it helped me write a Python script and customize Vim very effectively. It can also give you sources and guidelines for problems, with varying degrees of effectiveness.
 
  • #62
AndreasC said:
Most people care about the result. Of course it has some limitations that are fundamental, and they don't necessarily want people knowing that.
You're contradicting yourself. The "limitations that are fundamental" are crucial effects on the result. They're not just irrelevant side issues.
 
  • #63
PeterDonis said:
You're contradicting yourself. The "limitations that are fundamental" are crucial effects on the result. They're not just irrelevant side issues.
There are fundamental limitations that put a limit on how much the technology can improve. This doesn't mean it won't get good enough for the purposes of many people. In fact it already is for lots of them.

Although tbh I'm kind of rethinking how fundamental these limitations are after I saw the performance of recent LLMs. I definitely didn't expect them to get this far yet. Perhaps the ceiling is a bit higher than I thought.
 
  • #64
I do agree with @Vanadium 50 (if he wasn't kidding) that it has good use cases for low-risk, low-expectation purposes like customer service bots, but that's a really low performance bar*. I do agree with @PeterDonis that if, for example, this had been rolled out by Apple as an upgrade to Siri, we wouldn't be having this conversation. It's way, way less interesting/important than the hype suggests.

....and this Insight addresses an important but not well-discussed problem, which, more to the point, is why we frown upon chatbot questions and answers on PF.

*Edit: Also, this isn't what AI is "for". AI's promise is in being able to solve problems that are currently out of reach of computers but don't even require conscious thought by people. These problems - such as self-driving cars - are often ones where reliability is important.

edit2: Ok, I say that, but I can't be so sure it's true, particularly because of wildcards like Elon Musk who are eager to put the public at risk to test experimental software.
 
Last edited:
  • #65
First, I was serious. And stop calling me Shirley.

Second, the problem with discussing "AI", much less its purpose, is that it is such a huge area that lumping it all together is seldom helpful. Personally I feel that the most interesting work has been done in motion, balance and sensors.

Third, we had this technology almost 40 years ago. That was based on letters, not words, and it was much slower than real-time. And nobody got excited.
 
  • Like
Likes physicsworks, russ_watters and berkeman
  • #66
Vanadium 50 said:
Third, we had this technology almost 40 years ago.
We didn't, because it was not possible at the time to train models this complex, with this much data. There was neither enough data nor enough computational power. No wonder nobody got excited! I played with stuff like GPT-2 some time ago; even that was complete trash compared to ChatGPT.
 
  • #67
@AndreasC, I was doing it 40 years ago.
 
  • Like
Likes PeterDonis
  • #68
Vanadium 50 said:
@AndreasC, I was doing it 40 years ago.
Sure, you could train language models 40 years ago. Just like you could make computers back then. Except they couldn't do nearly as much as modern ones.
 
  • #69
If you want to argue that the difference between then and now is that hardware has gotten cheaper, you should argue that. But the ideas themselves are old. As I said, I was there.
 
  • #70
Vanadium 50 said:
If you want to argue that the difference between then and now is that hardware has gotten cheaper, you should argue that. But the ideas themselves are old. As I said, I was there.
Not just hardware. Also the data available.
 
  • #71
That's just a statement that you can pre-train your program on a large number of questions. I've already said it was much slower than real time. It doesn't make any difference to what the program does. It does, however, make a difference to the illusion of intelligence.

As discussed, ChatGPT doesn't even try to output what is correct. It tries to output what is written often. There is some hope that there is a correlation between that and correctness, but that's not always true, and it is not hard to come up with counterexamples.

ChatGPT is the love child of Clever Hans and the Mechanical Turk.
 
  • Like
  • Haha
  • Love
Likes physicsworks, PeterDonis and nsaspook
  • #72
Vanadium 50 said:
As discussed, ChatGPT doesn't even try to output what is correct.

Exactly. It also errs on the side of providing an answer, even when it has no idea what the right answer is. I used Stable Diffusion to generate pictures of composite animals that don't exist, then asked ChatGPT multiple times to identify them. The AI *never*, not even once, said "I can't identify that" or "I don't know what that is," nor did it suspect that it wasn't a real animal. Its guesses were at least related to the broad class of animal the composites resembled, but that was it.

There is no there, there.
 
  • Like
Likes dextercioby, russ_watters and AndreasC
  • #73
Oscar Benavides said:
It also tries to err on the side of providing an answer
It doesn't even "try"--it will always output text in response to a prompt.

Oscar Benavides said:
even when it has no idea what the right answer is
It never does, since it has no "idea" of any content at all. All it has any "idea" of is relative word frequencies.
 
  • Like
Likes Math100, Motore, Vanadium 50 and 3 others
  • #74
I'm not even sure how you could measure uncertainty in the output based on word frequency. "Some people say Aristotle was Belgian" will throw it off.
 
  • Like
Likes Oscar Benavides
  • #75
I tried using it a couple of times and for me it is really not useful. For complex code, I found it's faster to go to Stack Overflow, because there I gain more understanding of the code besides the code itself.
The only thing it is really good at is language-based requests (write me a song, interpret this or that text, draft an email, ...), which some people will find useful.
For research or factual questions it's too unreliable. It's just faster to use Wikipedia.
 
  • Like
Likes weirdoguy
  • #76
I know someone who has the paid version and says it's a lot more reliable. Previously, using the free version, a request for scientific references on a topic produced 40 authentic-looking but completely fabricated references. The paid version produced real references that all checked out.
 
  • #77
bob012345 said:
I know someone who has the paid version and says it's a lot more reliable.
Is there any reference online about this paid version and how it differs from the free version?
 
  • Like
Likes russ_watters
  • #79
bob012345 said:
Thanks! It looks like, at the very least, the paid version includes searching the Internet for actual answers to prompts, so it is not the same thing as the free version that my Insights article (and the Wolfram article it references) discuss.
 
  • Like
Likes Math100, russ_watters and bob012345
  • #80
OpenAI explain the differences between ChatGPT 3, 3.5 and 4 (and indicate the plans and timeline for 5) on their website.
 
  • #81
AndreasC said:
How do we know at what point it "knows" something? There are non-trivial philosophical questions here... These networks are getting so vast and their training so advanced that I can see someone eventually arguing they have somehow formed a decent representation of what things "are" inside them. [...]

I think that's the point exactly. At some point we'll be unable to tell the difference, and the person who calls you trying to convince you to change your phone company, electricity company or whatever, might be a machine. But if you can't tell the difference, then what is the difference?!

---------------------------------------------------------------

Filip Larsen said:
Stochastic parrot. Hah! Very apt.
russ_watters said:
Maybe the intent was always to profit from 3rd parties using it as an interface [...]

pbuk said:
Ya think?[...]

And then we enter the land of sarcasm. :)

---------------------------------------------------------------

This ChatGPT thingy really gets people riled up. I suspect especially the teaching part of the community here. ;P

... still reading....
 
  • #82
What it means "to know" is philosophy.

However, an epistomologist would say that an envelope that contaiend the phrase "It is after 2:30 and before 2:00" does not posess knowledgem eve though it is correct about as often as ChatGPT.
 
  • Like
Likes Bystander
  • #83
I'm not convinced that human intelligence is so effective. This site in many ways is a gross misrepresentation of human thought and interactions. For all the right reasons! Go anywhere else on the Internet or out in the street, as it were, and there is little or no connection between what people think and believe and objective evidence.

Chat GPT, if anything, is more reliable in terms of its objective assessment of the world than the vast majority of human beings.

Chat GPT doesn't have gross political, religious or philosophical prejudices.

If you talked to an oil company executive, there was no climate change and the biggest threat to humanity was the environmental movement.

Most human beings deliberately lie if it is in their interests. With Chat GPT at least you know it isn't deliberately lying to you.

I don't know where AI is going, or where we are heading, but I could make a case that Chat GPT is more rational, intelligent and truthful than 99% of the people on this planet.
 
  • Skeptical
  • Like
Likes mattt and Bystander
  • #84
PeroK said:
Chat GPT, if anything, is more reliable in terms of its objective assessment of the world
ChatGPT does not have any "objective assessment of the world". All it has is the relative word frequencies in its training data.

Wolfram Alpha, ironically, would be a much better thing to describe with the phrase you use here. It actually does contain a database (more precisely multiple databases with different entry and lookup criteria) with validated information about the world, which it uses to answer questions.

PeroK said:
Chat GPT doesn't have gross political, religious or philosophical prejudices.
Only for the same reason a rock doesn't.
 
  • Like
  • Skeptical
  • Haha
Likes dextercioby, Bystander, russ_watters and 3 others
  • #85
PeterDonis said:
ChatGPT does not have any "objective assessment of the world". All it has is the relative word frequencies in its training data.

Wolfram Alpha, ironically, would be a much better thing to describe with the phrase you use here. It actually does contain a database (more precisely multiple databases with different entry and lookup criteria) with validated information about the world, which it uses to answer questions.

Only for the same reason a rock doesn't.
In a practical sense, you could live according to what answers ChatGPT gives you. Wolfram Alpha is a mathematical engine. It's not able to communicate on practical everyday matters. Nor can a rock.

How any software works is not really the issue if you are an end user. The important thing is what it outputs.

You are too focused, IMO, on how it does things and not what it does.
 
  • Skeptical
Likes Motore
  • #86
PeroK said:
In a practical sense, you could live according to what answers ChatGPT gives you.
For your sake I sincerely hope you don't try this. Unless, of course, you only ask it questions whose answers you don't really care about anyway and aren't going to use to determine any actions. Particularly any actions that involve risk of harm to you or others.

PeroK said:
Wolfram Alpha is a mathematical engine. It's not able to communicate on practical everyday matters.
Sure it is. You can ask it questions in natural language about everyday matters and it gives you answers, if the answers are in its databases. Unlike ChatGPT, it "knows" when it doesn't know an answer and tells you so. ChatGPT doesn't even have the concept of "doesn't know", because it doesn't even have the concept of "know". All it has is the relative word frequencies in its training data, and all it does is produce a "continuation" of the text you give it as input, according to those relative word frequencies.

Granted, Wolfram Alpha doesn't communicate its answers in natural language, but the answers are still understandable. Plus, it also includes in its answers the assumptions it made while parsing your natural language input (which ChatGPT doesn't even do at all--not just that it doesn't include any assumptions in its output, but it doesn't even parse its input). For example, if you ask Wolfram Alpha "what is the distance from New York to Los Angeles", it includes in its answer that it assumed that by "New York" you meant the city, not the state.
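For anyone who wants to check this behavior programmatically, Wolfram Alpha offers a Short Answers API that returns a plain-text answer, or an HTTP error when it cannot interpret the input. Here is a minimal sketch; you would need your own AppID ("YOUR_APPID" below is a placeholder), and the refuse-with-an-error behavior is my recollection of the API's documented contract, so treat the details as assumptions.

```python
import urllib.error
import urllib.parse
import urllib.request

def ask_wolfram(question, appid="YOUR_APPID"):  # placeholder AppID
    """Query the Wolfram Alpha Short Answers API.

    Returns a plain-text answer; if Alpha cannot interpret the
    input it returns an HTTP error rather than generating a guess,
    i.e. it tells you when it doesn't know.
    """
    url = ("https://api.wolframalpha.com/v1/result?"
           + urllib.parse.urlencode({"appid": appid, "i": question}))
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as e:
        return f"No answer available (HTTP {e.code})"

print(ask_wolfram("distance from New York to Los Angeles"))
```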

PeroK said:
You are too focused, IMO, on how it does things and not what it does.
Huh? The Insights article under discussion, and the Wolfram article it references, are entirely about what ChatGPT does, and what it doesn't do. Wolfram also goes into some detail about the "how", but the "what" is the key part I focused on.
 
  • #87
PeroK said:
You are too focused, IMO, on how it does things and not what it does.
Could you make the same argument for astrology? Yesterday it told me to talk to a loved one and it worked!
 
Last edited:
  • Like
Likes PeterDonis, pbuk, dextercioby and 1 other person
  • #88
PeterDonis said:
For your sake I sincerely hope you don't try this. Unless, of course, you only ask it questions whose answers you don't really care about anyway and aren't going to use to determine any actions.
I don't personally intend to, no. But, there are worse ways to get answers.
 
  • #89
PeroK said:
there are worse ways to get answers.
So what? That doesn't make ChatGPT good enough to rely on.
 
  • Like
Likes Motore
  • #90
PeroK said:
I don't personally intend to, no.
Doesn't that contradict your previous claim here?

PeroK said:
In a practical sense, you could live according to what answers ChatGPT gives you.
If you're not willing to do this yourself, on what basis do you justify saying that someone else could do it?
 
  • #91
Vanadium 50 said:
Could you make the same argument for astrology? Yesterday it told me to talk to a loved one and it worked!
There's no comparison. Chat GPT, however imperfectly, is working on a global pool of human knowledge. There's a rationale that it's trying to produce an unprejudiced, balanced answer.

Perhaps it will fail to develop. But, ten years from now, who knows how many people will be using it or its competitors as their mentor?
 
  • #92
PeterDonis said:
Doesn't that contradict your previous claim here? If you're not willing to do this yourself, on what basis do you justify saying that someone else could do it?
People can do what they want. It's an option, for sure. In fact, we've seen some evidence on here that significant numbers of people are using it to learn about physics.

If some people choose to live by astrological charts, then others can choose to live by ChatGPT. I choose to do neither. For the time being.
 
  • #93
PeroK said:
Chat GPT, however imperfectly, is working on a global pool of human knowledge.
No, it's working on a global pool of text. That's not the same as "knowledge". ChatGPT has no information about the connection of any of the text in its training data with the actual world. It doesn't even make use of the text in itself; it only makes use of the relative word frequencies in the text.

PeroK said:
ten years from now, who knows how many people will be using it or its competitors as their mentor?
Not in its current form. The next obvious step in the evolution of such models--connecting them to actual real world data--is already being taken, at the very least with the paid version of ChatGPT (mentioned in earlier posts), which includes actual lookups in various data sources (web search, for one, and for another, ironically, Wolfram Alpha) for generating responses. In other words, to do the key things that the current free version, which is what this Insights article discussed, does not. Ten years from now, I expect that further steps along those lines will have been taken and will have made these tools reliable in a way that the current ChatGPT is not.
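What "connecting a model to real data" means in practice is usually some form of retrieval augmentation: fetch verified text first, then have the model answer only from what was fetched. The sketch below is purely hypothetical; the function names and pipeline are illustrative, not OpenAI's actual design.

```python
# Hypothetical retrieval-augmented pipeline, purely illustrative.

def retrieve(query, knowledge_base):
    """Toy retrieval: return documents sharing any word with the query."""
    terms = set(query.lower().split())
    return [doc for doc in knowledge_base
            if terms & set(doc.lower().split())]

def answer(query, knowledge_base, llm):
    """Ground the generator: answer only from retrieved documents,
    and refuse outright when nothing relevant was found."""
    docs = retrieve(query, knowledge_base)
    if not docs:
        return "No supporting sources found."
    prompt = ("Answer using ONLY these sources:\n"
              + "\n".join(docs)
              + f"\nQuestion: {query}")
    return llm(prompt)  # 'llm' is a stand-in for any text generator

# Dummy generator so the sketch runs end to end:
dummy_llm = lambda p: f"(model output conditioned on: {p[:50]}...)"
print(answer("ferry from Istanbul to Trieste",
             ["Ferry schedules: Istanbul-Trieste service operates weekly."],
             dummy_llm))
```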
 
  • #94
PeroK said:
People can do what they want.
Sure, but now you're backing away from your previous claim. People are free to choose to do stupid things, of course; but previously you were saying that relying on ChatGPT for practical information was not stupid. Now you're back-pedaling and saying, well, yes, it is stupid, just like relying on astrology, but there will always be people who choose to do stupid things.
 
  • #95
So I tried it, but it was noncommittal!

Should we climb the Moine Ridge on Thursday this week?

To make an informed decision about climbing the Moine Ridge on Thursday, I recommend checking weather forecasts, consulting with experienced climbers or local mountaineering authorities, and assessing your own skills and experience. Additionally, consider factors such as trail conditions, safety equipment, and the overall fitness and preparedness of your climbing team.
Mountain environments can be unpredictable and potentially dangerous, so it's essential to prioritize safety and make well-informed decisions.
 
  • #96
PeterDonis said:
So what? That doesn't make ChatGPT good enough to rely on.
People already rely on a steady diet of lies and misinformation from human sources. ChatGPT is at least honest. I would trust ChatGPT more than I would the US Supreme Court, for example.
 
  • Skeptical
Likes physicsworks and russ_watters
  • #97
PeroK said:
Chat GPT, however imperfectly, is working on a global pool of human knowledge.
Not even that, it just predicts words. It doesn't care if the sentence it makes actually describes anything real. It cannot.
An example:
Q: How long does a ferry ride from Istanbul to Trieste take?
ChatGPT:
A direct ferry ride from Istanbul to Trieste is not available, as these two cities are located in different countries and are quite far apart. Istanbul is in Turkey, while Trieste is in northeastern Italy.

To travel between Istanbul and Trieste, you would need to consider alternative transportation options such as flights, trains, or buses...


Of course, there is a route from Istanbul to Trieste (at least that's what Google tells me).

Sure more data, more parameters will make it better, but it's still not reliable.
 
  • Like
Likes dextercioby
  • #98
PeroK said:
ChatGPT is at least honest.
No, it's not. "Honest" requires intent. ChatGPT has no intent.

PeroK said:
I would trust ChatGPT more than I would the US Supreme Court, for example.
I don't see how you would even compare the two. The US Supreme Court issues rulings that say what the law is. You don't "trust" or "not trust" the US Supreme Court. You either abide by its rulings or you get thrown in jail.
 
  • Like
Likes Motore
  • #99
PeroK said:
There's no comparison.
Didn't I just make one? :smile:
PeroK said:
Chat GPT, however imperfectly, is working on a global pool of human knowledge.
Actually, it is working on a pool of human writing.

The idea is that writing is a good enough proxy for knowledge and that word frequency distributions* are a good enough proxy for understanding. This thread, as well as some past ones, highlights many cases where this does not work.

FWIW, I think ChatGPT could write horoscopes as well as the "professionals". But probably not write prescriptions.

* But not letter frequency distributions, which we had 40 years ago doing much the same thing. That would just be crazy talk.
 
  • Like
Likes berkeman, Motore, BWV and 1 other person
  • #100
Motore said:
Not even that, it just predicts words. It doesn't care if the sentence it makes actually describes anything real. It cannot. [...] Sure more data, more parameters will make it better, but it's still not reliable.
You may be right and it'll die a death. I'm not so sure. The reasons for the adoption of technology are often social and cultural, rather than technical.

In fact, there is evidence it's already taken off.
 
