Insights: Why ChatGPT AI Is Not Reliable

  • Thread starter: PeterDonis
  • Tags: chatgpt

Summary
ChatGPT is deemed unreliable because it generates text based solely on word frequencies from its training data, lacking true understanding or semantic connections. Critics argue that it does not accurately answer questions or provide reliable information, often producing confident but incorrect responses. While some users report that it can parse complex code and suggest optimizations, this does not equate to genuine knowledge or reasoning. The discussion highlights concerns about its potential impact on how society perceives knowledge and the importance of critical evaluation of AI-generated content. Ultimately, while ChatGPT may appear impressive, its limitations necessitate cautious use and independent verification of information.
  • #91
Vanadium 50 said:
Could you make the same argument for astrology? Yesterday it told me to talk to a loved one and it worked!
There's no comparison. Chat GPT, however imperfectly, is working on a global pool of human knowledge. There's a rationale that it's trying to produce an unprejudiced, balanced answer.

Perhaps it will fail to develop. But, ten years from now, who knows how many people will be using it or its competitors as their mentor?
 
  • #92
PeterDonis said:
Doesn't that contradict your previous claim here? If you're not willing to do this yourself, on what basis do you justify saying that someone else could do it?
People can do what they want. It's an option, for sure. In fact, we've seen some evidence on here that significant numbers of people are using it to learn about physics.

If some people choose to live by astrological charts, then others can choose to live by ChatGPT. I choose to do neither. For the time being.
 
  • #93
PeroK said:
Chat GPT, however imperfectly, is working on a global pool of human knowledge.
No, it's working on a global pool of text. That's not the same as "knowledge". ChatGPT has no information about the connection of any of the text in its training data with the actual world. It doesn't even make use of the text in itself; it only makes use of the relative word frequencies in the text.
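
To make that concrete, here is a toy sketch of what "generating text from word statistics" means. This is a minimal bigram counter of my own, nothing like ChatGPT's actual transformer architecture; it only illustrates that such a generator sees patterns in text, not the world the text is about.

Python
import random
from collections import defaultdict, Counter

training_text = "the cat sat on the mat . the dog sat on the rug ."

# Count how often each word follows each other word in the training text.
follow_counts = defaultdict(Counter)
words = training_text.split()
for current_word, next_word in zip(words, words[1:]):
    follow_counts[current_word][next_word] += 1

def generate(start, length=8):
    """Generate text by repeatedly sampling the next word from the counts."""
    out = [start]
    for _ in range(length):
        followers = follow_counts.get(out[-1])
        if not followers:
            break
        choices, weights = zip(*followers.items())
        out.append(random.choices(choices, weights=weights)[0])
    return " ".join(out)

print(generate("the"))  # e.g. "the dog sat on the mat . the cat"

The output looks locally like the training text, but nothing in the program connects any of those words to cats, dogs, or mats.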

PeroK said:
ten years from now, who knows how many people will be using it or its competitors as their mentor?
Not in its current form. The next obvious step in the evolution of such models--connecting them to actual real world data--is already being taken, at the very least with the paid version of ChatGPT (mentioned in earlier posts), which includes actual lookups in various data sources (web search, for one, and for another, ironically, Wolfram Alpha) for generating responses. In other words, to do the key things that the current free version, which is what this Insights article discussed, does not. Ten years from now, I expect that further steps along those lines will have been taken and will have made these tools reliable in a way that the current ChatGPT is not.
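
As a rough illustration of that "model plus external lookups" pattern (all names below are invented stubs, not OpenAI's or Wolfram's actual APIs), the idea is roughly:

Python
def wolfram_style_lookup(question):
    """Stub for a computational knowledge source (invented for illustration)."""
    if "cube root of 123456789" in question:
        return "cube root of 123456789 ≈ 497.9339"
    return None

def bare_language_model(prompt):
    """Stub for a plain text-statistics model -- the unreliable part."""
    return f"[generated text for: {prompt}]"

def answer(question, lookups=(wolfram_style_lookup,)):
    # Try trusted data sources first; only let the model phrase a verified fact.
    for lookup in lookups:
        fact = lookup(question)
        if fact is not None:
            return bare_language_model(f"Restate this verified fact clearly: {fact}")
    # Otherwise fall back to pure generation, which is what the free version
    # does for everything.
    return bare_language_model(question) + "  [unverified]"

print(answer("what is the cube root of 123456789?"))

The point is that whatever reliability such a system has comes from the lookup, not from the statistics of the text.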
 
  • #94
PeroK said:
People can do what they want.
Sure, but now you're backing away from your previous claim. People are free to choose to do stupid things, of course; but previously you were saying that relying on ChatGPT for practical information was not stupid. Now you're back-pedaling and saying, well, yes, it is stupid, just like relying on astrology, but there will always be people who choose to do stupid things.
 
  • #95
So I tried it, but it was non-committal!

Should we climb the Moine Ridge on Thursday this week?

To make an informed decision about climbing the Moine Ridge on Thursday, I recommend checking weather forecasts, consulting with experienced climbers or local mountaineering authorities, and assessing your own skills and experience. Additionally, consider factors such as trail conditions, safety equipment, and the overall fitness and preparedness of your climbing team.
Mountain environments can be unpredictable and potentially dangerous, so it's essential to prioritize safety and make well-informed decisions.
 
  • #96
PeterDonis said:
So what? That doesn't make ChatGPT good enough to rely on.
People already rely on a steady diet of lies and misinformation from human sources. ChatGPT is at least honest. I would trust ChatGPT more than I would the US Supreme Court, for example.
 
  • Skeptical
Likes physicsworks and russ_watters
  • #97
PeroK said:
Chat GPT, however imperfectly, is working on a global pool of human knowledge.
Not even that, it just predicts words. It doesn't care if the sentence it makes actually describes anything real. It cannot.
An example:
Q: How long does a ferry ride from Istanbul to Trieste take?
ChatGPT:
A direct ferry ride from Istanbul to Trieste is not available, as these two cities are located in different countries and are quite far apart. Istanbul is in Turkey, while Trieste is in northeastern Italy.

To travel between Istanbul and Trieste, you would need to consider alternative transportation options such as flights, trains, or buses...


Of course, there is a route from Istanbul to Trieste (at least that's what google tells me).

Sure more data, more parameters will make it better, but it's still not reliable.
 
  • Like
Likes dextercioby
  • #98
PeroK said:
ChatGPT is at least honest.
No, it's not. "Honest" requires intent. ChatGPT has no intent.

PeroK said:
I would trust ChatGPT more than I would the US Supreme Court, for example.
I don't see how you would even compare the two. The US Supreme Court issues rulings that say what the law is. You don't "trust" or "not trust" the US Supreme Court. You either abide by its rulings or you get thrown in jail.
 
  • Like
Likes Motore
  • #99
PeroK said:
There's no comparison.
Didn't I just make one? :smile:
PeroK said:
Chat GPT, however imperfectly, is working on a global pool of human knowledge.
Actually, it is working on a pool of human writing.

The idea is that writing is a good enough proxy for knowledge and that word frequency distributions* are a good enough proxy for understanding. This thread, as well as some past ones, highlights many cases where this does not work.

FWIW, I think ChatGPT could write horoscopes as well as the "professionals". But probably not write prescriptions.

* But not letter frequency distributions, which we had 40 years ago doing much the same thing. That would just be crazy talk.
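
(For anyone too young to remember those programs: they were order-k character Markov chains, "travesty generators". The following is my own sketch of the general technique, not any particular historical program.)

Python
import random
from collections import defaultdict, Counter

source = ("it generates text based solely on word frequencies from its "
          "training data, lacking true understanding or semantic connections")

order = 3  # how many preceding characters to condition on
table = defaultdict(Counter)
for i in range(len(source) - order):
    table[source[i:i + order]][source[i + order]] += 1

state = source[:order]
output = state
for _ in range(100):
    followers = table.get(state)
    if not followers:
        break
    chars, weights = zip(*followers.items())
    output += random.choices(chars, weights=weights)[0]
    state = output[-order:]

print(output)  # locally plausible letter soup, with no meaning behind it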
 
  • Like
Likes berkeman, Motore, BWV and 1 other person
  • #100
Motore said:
Not even that, it just predicts words. It doesn't care if the sentence it makes actually describes anything real. It cannot.
An example:
Q: How long does a ferry ride from Istanbul to Trieste take?
ChatGPT:
A direct ferry ride from Istanbul to Trieste is not available, as these two cities are located in different countries and are quite far apart. Istanbul is in Turkey, while Trieste is in northeastern Italy.

To travel between Istanbul and Trieste, you would need to consider alternative transportation options such as flights, trains, or buses...


Of course, there is a route from Istanbul to Trieste (at least that's what google tells me).

Sure more data, more parameters will make it better, but it's still not reliable.
You may be right and it'll die a death. I'm not so sure. The reasons for the adoption of technology are often social and cultural, rather than technical.

In fact, there is evidence it's already taken off.
 
  • #101
Motore said:
ferry ride from Istanbul to Trieste take
Perhaps you should have asked about the ferry from Constantinople.

 
  • #102
PeroK said:
You may be right and it'll die a death. I'm not so sure.
No, of course it won't die. It is extremely useful for a lot of tasks.
I am just saying that in the way that it's structured right now it cannot be reliable.

Here is an interesting use for a similar language model AI: https://www.biorxiv.org/content/10.1101/2022.09.29.509744v1
 
  • #103
PeroK said:
Chat GPT doesn't have gross political, religious or philosophical prejudices.
This isn't exactly true (though it depends on what you mean by "gross"). It has guardrails designed to constrain content, which reflect the biases of the programmers. For example, a few months ago someone asked it for religious jokes, and while it was OK with Christian jokes it declined to provide Islamic jokes. I think this bias has since been corrected.

It is also biased by its programmers' choice of source information. For example, the user base of Reddit has a lot more say in the generated output than the membership of AARP.
 
  • Like
Likes dextercioby, Vanadium 50, Motore and 1 other person
  • #104
russ_watters said:
This isn't exactly true (though depends on what you mean by "gross").
In contrast with social media software, for example, whose model is to focus information based on your perceived prejudices.

With ChatGPT you are not in an echo chamber being fed a steady diet of misinformation.

For example, I stumbled on a twitter feed about the COVID vaccine. Everyone on the thread believed that it was harmful. One woman was puzzled by those who willingly took the vaccine and they all agreed it must be down to "low intelligence".

That is "gross"; and your examples of ChatGPT bias pale by comparison.
 
  • Like
Likes russ_watters
  • #105
PeroK said:
With ChatGPT you are not in an echo chamber being fed a steady diet of misinformation.
True, because "misinformation" requires intent just as much as "honesty" does, and ChatGPT has no intent. Or, to put it another way, ChatGPT is not reliably unreliable any more than it is reliably reliable. :wink:

PeroK said:
I stumbled on a twitter feed about the COVID vaccine. Everyone on the thread believed that it was harmful. One woman was puzzled by those who willingly took the vaccine and they all agreed it must be down to "low intelligence".

That is "gross"
While I understand the point being made in the context of this thread, let's please not take it any further, in order to ensure that we don't derail the thread into a political discussion.
 
  • Like
Likes russ_watters and phinds
  • #106
The interesting angle is how much of the general LLM model's inaccuracy comes from the training set vs. the algorithm. Training these models on the broad Internet of course exposes it to a large amount of bad information. What will likely come next are smaller models trained on a curated data set for a specific application, say medicine or law.

The LLMs appear to make Python calls for calculations rather than looking through their training set; for example, with Bard (and the answer is correct):

what is the cube root of 123456789?


The cube root of 123456789 is approximately 497.933859. This can be calculated using the following Python code:
Python
import math

def cube_root(number):
    """Calculates the cube root of a number."""
    return math.pow(number, 1/3)

print(cube_root(123456789))

The output of the code is 497.933859, which is the approximate cube root of 123456789.
Here are some other ways to calculate the cube root of 123456789:
  • You can use a calculator that has a cube root function.
  • You can use a search engine to find a website that calculates cube roots.
  • You can write a custom program in a programming language of your choice to calculate the cube root.
I hope this helps! Let me know if you have any other questions.
 
  • #107
PeroK said:
you are not in an echo chamber
Isn't it just the opposite? ChatGPT attempts to create an "echo" indistinguishable from its "training echoes".
 
  • Like
Likes russ_watters
  • #108
PeroK said:
ChatGPT is at least honest.
This is a category error, but one that is almost impossible to avoid. The English language has no natural way of talking about chatbot output so we inevitably find ourselves saying things like "it thinks" or "it knows" when of course it does no such thing - it's just arranging words to form sequences that resemble patterns already out there on the internet (and as evidence of just how hard it is to avoid this category error I just finished backspacing away the words "it has seen on the internet"). Saying that ChatGPT is honest makes no more sense than saying that the motor hauling my truck up a steep hill is "gutsy and determined to succeed" - the difference is that we know how to talk about mechanical devices in terms of their performance characteristics without attributing sentience to them.
I would trust ChatGPT more than I would the US Supreme Court, for example.
Following is free advice, my feelings won't be hurt if you choose to ignore it... Form your opinions of the Supreme Court by reading the actual opinions (posted at https://www.supremecourt.gov/) and by joining the live blog on opinion days at scotusblog.com. We spend a lot of time complaining about pop-sci videos.... Popular press coverage of major court cases is far worse.
 
Last edited:
  • Like
  • Skeptical
  • Informative
Likes berkeman, Motore, PeroK and 3 others
  • #109
Nugatory said:
Following is free advice, my feelings won't be hurt if you choose to ignore it... Form your opinions of the Supreme Court by reading the actual opinions (posted at https://www.supremecourt.gov/) and by joining the live blog on opinion days at scotusblog.com. We spend a lot of time complaining about pop-sci videos.... Popular press coverage of major court cases is far worse.
I hate to have to repeat myself, but all thread participants, please note my advice in post #105.
 
  • Like
Likes russ_watters and Nugatory
  • #110
PeterDonis said:
please note my advice in post #105.
Are you speaking as a participant, the author of the Insight, or as a moderator?
 
  • Skeptical
Likes russ_watters
  • #111
The issue with "neutrality" is fraught with peril. For example, in the war between Mordor and Gondor, if we only use sources from one side or the other, we might get different opinions. Most people here would say that the side of Gondor is completely truthful and the side of Mordor is nothing but propaganda and lies. But of course the Mordorian side would say differently. Who decided what sources are reliable for training and which ones are not? Or do we toss them all in and let ChatGPT sort it out? Because then the point of view will be set by whoever can write the most.

But this misses a larger fallacy. The question of whether ChatGPT is reliable or not does not depend on whether people are reliable or not, nor on which people are more reliable than others.
 
  • #112
Vanadium 50 said:
Are you speaking as a participant, the author of the Insight, or as a moderator?
The last.
 
  • Like
Likes berkeman and Nugatory
  • #113
Vanadium 50 said:
Who decided what sources are reliable for training and what ones are not?
Given the way ChatGPT works, it doesn't matter.

Text generated to look like other text on the internet is going to be unreliable whether it is patterned on reliable text or not. For example, essentially everything you'll find about duplicate bridge on the internet will be reliable. Some stuff will be written for beginning players, some will be written by and for the relatively small community of world-class players, and most will fall somewhere in between, but it's all reasonable advice. But we still get:
http://bridgewinners.com/article/view/sorry-i-know-it-is-stupid-to-post-conversations-with-chatgpt/
http://bridgewinners.com/article/view/testing-chatgpt-the-media-hyped-ai-robot/
http://bridgewinners.com/article/view/using-chatgpt/
 
  • Like
Likes Motore and russ_watters
  • #114
PeroK said:
ChatGPT is at least honest.
Is it being honest when it commits academic fraud by fabricating sources?

[Spoiler: it's being nothing.]
 
  • Like
Likes Vanadium 50 and Nugatory
  • #115
Vanadium 50 said:
ChatGPT attempts to create an "echo" indistinguishable from its "training echoes".
There is just no way of getting away from speaking in terms of intention and volition; the English language won't let us. We cannot resist the temptation to say that it is "attempting" or "trying" to do something, but it is no more attempting to create an echo or anything else than my dishwasher is motivated to clean the dinner dishes.

The entire ChatGPT phenomenon makes me think of Searle's Chinese Room thought experiment: https://en.wikipedia.org/wiki/Chinese_room
 
  • Like
Likes Lord Jestocost
  • #116
Nugatory said:
Given the way ChatGPT works, it doesn't matter.
I was probably unclear. The training data very much matters. And who decides what training data to use or not to use?

The only way it is "unbiased" is that it is difficult to generate output that is differently biased than the training dataset.
 
  • Like
Likes russ_watters
  • #117
Nugatory said:
just no way of getting away from speaking in terms of intention
Probably. Maybe it's better to talk about "design". ChatGPT is not designed to be independent of the echo chamber. It is designed to produce echoes indistinguishable from the rest of the echo chamber.
 
  • Like
Likes nsaspook, phinds and russ_watters
  • #118

Applications

Around 2013, MIT researchers developed BullySpace, an extension of the commonsense knowledgebase ConceptNet, to catch taunting social media comments. BullySpace included over 200 semantic assertions based around stereotypes, to help the system infer that comments like "Put on a wig and lipstick and be who you really are" are more likely to be an insult if directed at a boy than a girl.[11][12][13]

ConceptNet has also been used by chatbots[14] and by computers that compose original fiction.[15] At Lawrence Livermore National Laboratory, common sense knowledge was used in an intelligent software agent to detect violations of a comprehensive nuclear test ban treaty.[16]

---- Wikipedia, "Commonsense knowledge (artificial intelligence)"

-------------------------------------------------

Just a funny observation.

I tried to bold the part about "detecting violations of the comprehensive nuclear test ban treaty". It's quite a contrast of applications.
 
  • #119
AndreasC said:
The semantic connections you are talking about are connections between sensory inputs and pre-existing structure inside our brains. You're just reducing what it's doing to the bare basics of its mechanics, but its impressive behavior comes about because of how massively complex the structure is.

I don't know if you've tried it out, but it doesn't just "get lucky". Imagine a student passing one test after another, would you take someone telling you they only "got lucky" seriously, and if yes, how many tests would it take? Plus, it can successfully apply itself to problems it never directly encountered before. Yes, not reliably, but enough that it's beyond "getting lucky".

You talk about it like you haven't actually tried it out. It's not at all the same as previous chatbots, it has really impressive capabilities. It can give you correct answers to unambiguous questions that are non-trivial and that it has not specifically encountered before in its training. And it can do that a lot, repeatably. Nothing to do with how confident it sounds, I am talking about unambiguously correct answers.

Again, I'm not saying it is reliable, but you are seriously downplaying its capabilities if you think that's all it does and I encourage you to try it out for yourself. Especially when it comes to programming, it is incredible. You can put in it complicated code that is undocumented, and it can explain to you what the code does exactly, what problem it probably was intended for, and how to improve it, and it works a lot of the time, much more frequently than "luck".

If all you want to say is that it isn't right all the time, then yeah, that's true. It's very, very frequently wrong. But that has little to do with what you are describing. It could (and will) improve significantly on accuracy, using the same mechanism. And practically, what you are saying doesn't matter. A database doesn't "know" what something is either in your sense of the word, neither does a web crawler, or anything like that. That doesn't make them unreliable. Neither is a human reliable because they "know" something (again going by your definition).

ChatGPT is unreliable because we observe it to be unreliable. That requires no explanation. What does require explanation is why, as @Demystifier said, it is so much more reliable (especially at non trivial, "reasoning" type problems) than you would naively expect.
This reminds me of statistical machine translation (SMT), where the translation quality (as I have observed) is heavily dependent on, and correlated with, the training model and corpus. And I bet there are no known SMT models that produce good translations of casual language unless they are also given a casual-language corpus.

Maybe related is my observation of Indonesian text translations. Many of them look stiff and unnatural (think of formal equivalence), which influences my work maintaining the Indonesian translation of a popular open source project. In most cases, I spot mood mismatches (e.g., "do foo" in the imperative mood is translated as "melakukan foo", which is descriptive because of the me- prefix, whereas the correct translation would be "lakukan foo").
 
  • #120
Although ChatGPT is merely generating its output according to word statistics boiled down from training data, it is subjectively hard to imagine that it can do things like this without some sense of meaning:

Swamp Thing said:
Input: In the next input I will give you a list of words. In the input after that I will give you a sentence containing the placeholder "xyz". In response to that, please give me the sentence with the most appropriate word from the list.

Output: OK, I'm ready! Please provide the list of words and the sentence with the placeholder.

I: Mouse, whale, horse.

O: Okay. Please provide the sentence with the placeholder "xyz".

I: The xyz is the biggest animal from the list.

O: The whale is the biggest animal from the list.
Can one break down how a blind statistical process can do that?
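
One crude way a purely statistical process could get that particular example right (a toy co-occurrence sketch of my own, not ChatGPT's actual mechanism):

Python
from collections import Counter

toy_corpus = [
    "the blue whale is the biggest animal that has ever lived",
    "a mouse is a small rodent found in houses",
    "the horse is a large animal used for riding",
]

candidates = ["mouse", "whale", "horse"]
key_word = "biggest"

def association(candidate, key, corpus):
    """Count sentences in which the candidate word and the key word co-occur."""
    return sum(1 for sentence in corpus
               if candidate in sentence.split() and key in sentence.split())

scores = Counter({c: association(c, key_word, toy_corpus) for c in candidates})
best_word = scores.most_common(1)[0][0]
print(f"The {best_word} is the biggest animal from the list.")

Whether the real model does anything recognizably like this is exactly the open question; the sketch only shows that a correct-looking substitution does not by itself require understanding.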

One could take it a notch higher by adding something like this: "If the placeholder is xyz, process the sentence as it is. If the placeholder is pqr, replace all adjectives and adverbs by their opposites. For example, replace 'hottest' with 'coldest'".

 
