Bias, errors, etc. within ChatGPT & other AI chatbots

artis · Mar 11, 2023

I thought maybe we should have a thread about this software, as there are probably interesting artifacts in it and it is just interesting in general.

So here is an interesting fact I just found out while messing around with ChatGPT.

I asked it a simple question - "5 best jokes about communism"
It gave me this answer

Then I did the same and asked "5 best jokes about capitalism"

Now maybe I got it to joke about capitalism because I spelled it wrong - "capytalism"
But then I asked once more with the correct spelling and still got jokes from GPT

ChatGPT more like ComradeGPT...

Apparently it's trained on internet data, and I assume there is a bias within the internet data (obviously, because it's made by humans). ChatGPT simply sees that, for whatever reason (unknown to it, of course), the word communism is put in the "untouchable" subject category, much like racism, etc., but capitalism is not, so it freely talks about it.

collinsmark · Mar 14, 2023

russ_watters · Mar 14, 2023

You probably should have asked it for jokes about Democracy.

I haven't played with it yet - does it have any analytical capability or is it really just an eloquent search engine? Could you ask it if it sees a contradiction between its answers?

Still, I could foresee an attempt at a more sophisticated AI based on the premise "How would a human answer this/behave here?" I could see someone instructing a coffee robot to "surprise me". Robot thinks for a minute (googles), then turns and heads for the janitor's closet...

Borg · Mar 14, 2023

Some of the anecdotal stories that I've read on PF and other places lead me to believe that Chat GPT is doing something akin to Google's Wide & Deep model, where a model has a standard deep learning aspect along with a wide component that allows it to learn over time.

In Chat GPT's case, it obviously has a large 300 billion+ neural network but also seems to learn during a conversation so that a person is able to convince it of different beliefs. This seems probable to me since you can tell it that it's wrong about something and it will adjust accordingly. However, those 'beliefs' don't appear to carry over from one conversation to the next - I asked Chat GPT if it could see information in another chat on my account and it couldn't.

This approach would have several benefits that I can see:

The deep portion provides a very good base model that starts each conversation.
The wide aspect allows the model to adjust itself to the user's responses (for good or bad). This might explain some of the odd conversations that have been posted.
The adjustments during a conversation aren't carried over to other conversations, which avoids the trolling issues that Microsoft's Tay suffered from. The Open AI team controls the updates that get into the base model.
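
To picture the behavior described in this list, here is a toy sketch. It is only an illustration of the observation that corrections stick within a chat but not across chats; nothing in this thread establishes how ChatGPT actually implements it. The names `frozen_model` and `Conversation` are made up, and the "model" is a trivial stand-in function whose output depends only on the history it is handed.

```python
from dataclasses import dataclass, field

PREFIX = "Actually,"


def frozen_model(history):
    """Hypothetical fixed model: the same history always gives the same reply."""
    corrections = [m for m in history if m.startswith(PREFIX)]
    if corrections:
        # The "belief" changes only because the correction is in this history.
        return "You are right: " + corrections[-1][len(PREFIX):].strip()
    return "Default answer from the base model."


@dataclass
class Conversation:
    """Holds per-chat state; the model function itself never changes."""
    history: list = field(default_factory=list)

    def say(self, message):
        self.history.append(message)
        reply = frozen_model(self.history)
        self.history.append(reply)
        return reply


chat_a = Conversation()
chat_a.say("Is the sky green?")
print(chat_a.say("Actually, the sky is blue."))

# A second chat starts from an empty history, so it never sees the correction.
chat_b = Conversation()
print(chat_b.say("Is the sky green?"))
```

In this sketch, anything "learned" lives in the conversation object, so discarding the conversation discards it, which matches the point about changes to the base model staying under the developers' control.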

artis · Mar 14, 2023

russ_watters said:

You probably should have asked it for jokes about Democracy.

I haven't played with it yet - does it have any analytical capability or is it really just an eloquent search engine? Could you ask it if it sees a contradiction between its answers?

Still, I could foresee an attempt at a more sophisticated AI based on the premise "How would a human answer this/behave here?" I could see someone instructing a coffee robot to "surprise me". Robot thinks for a minute (googles), then turns and heads for the janitor's closet...

The way it currently seems to me is that it's more like a search engine on steroids.
Somewhat like a talkative Google.
But to be honest, I think in some ways it's lacking, because in a traditional search you can choose from all the answers that are displayed, or at least all of the ones you have the patience to go through, while a bot like ChatGPT gives you "the right answer" like a pill on a silver platter and you just have to take it for what it is.
It has already been shown that, more often than one would like, it gives an outright wrong answer, and in many more cases it gives an answer that has a bias in it but is presented as some official fact. But then again, that's what you'd expect from someone who has read only what's online.

jack action · Mar 14, 2023

artis said:

The way it currently seems to me is that it's more like a search engine on steroids.

Thank you!
https://www.physicsforums.com/threads/with-chatgpt-is-the-college-essay-dead.1047951/post-6846497

jedishrfu · Mar 14, 2023

One fact I learned was that it's a static LLM created in 2021.

At the start of each new conversation, it must load in the static model to initialize things and then during the conversation you can train it further.

This prevents it from going off the rails when learning from many people intent on forcing it to fail.

Another aspect of starting with a default LLM is that you can test it and tweak it as needed. It reminds me of using the seed value for random number generation during testing so that you can replay the code to see why it did what it did. In production mode you'd allow the seed value to vary with time.

One way to test this is to start two conversations on separate machines at the same time and then enter the same queries to each and see if the generated output is the same. You could probably do this on the same machine by starting a new chat session but I believe they may be updating the LLM daily so running parallel seems like a safer bet to test the static nature of the initial load of the LLM. They may have some random seed that comes into play with the static LLM to vary things even in this case though.
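
The seed analogy can be sketched in a few lines. This is only an illustration of seeded, replayable randomness over a fixed set of probabilities; the vocabulary, the weights, and the function name `sample_reply` are made up, and nothing here is taken from how ChatGPT is actually built.

```python
import random

VOCAB = ["yes", "no", "maybe", "unsure"]
WEIGHTS = [0.4, 0.3, 0.2, 0.1]  # made-up probabilities, fixed like a static model


def sample_reply(seed, length=5):
    """Draw `length` words from the fixed distribution using the given seed."""
    rng = random.Random(seed)  # private generator, so separate runs don't interfere
    return rng.choices(VOCAB, weights=WEIGHTS, k=length)


# Same seed: the run can be replayed exactly, which is what you want when testing.
print(sample_reply(seed=2021) == sample_reply(seed=2021))

# Varying the seed lets the output vary even though the weights never change.
print(sample_reply(seed=1))
print(sample_reply(seed=2))
```

If the two-machine experiment gave different answers to identical queries, that alone would not show the model had changed; a varying seed over a static model would look the same from the outside.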

rcgldr · Mar 15, 2023

One that may be fixed now: Why are cows eggs bigger than chicken eggs?

TeethWhitener · Mar 16, 2023

artis said:

The way it currently seems to me is that it's more like a search engine on steroids.

More like autocomplete on steroids.

TeethWhitener · Mar 21, 2023

I wish I had saved the exact conversation I had with ChatGPT, but it made it abundantly clear to me where the gaps are. Background: in the world of metal-catalyzed CO2 reduction, different metals give different products. I was putting together a presentation and I was admittedly too lazy to look up which metals gave which products, so I figured I'd save time and ask ChatGPT.

My prompt was something along the lines of "In CO2 electroreduction, which metals are formate formers, which ones are H2 formers, and which ones are hydrocarbon formers?" ChatGPT's answer was something along the lines of "CO2 electroreduction is <basically the first paragraph of a wikipedia entry>. Formate production is catalyzed by copper, zinc, and palladium. Some metals, such as copper, zinc and palladium produce mainly H2, whereas hydrocarbons are mainly produced by copper, zinc, and palladium." Once I corrected it (because I knew formates were mainly products of p-block metals, and also because its answer was silly and clearly wouldn't pass the Turing test), it said something along the lines of "I'm sorry for the mistake. You are correct, in addition to copper, zinc, and palladium, formate production is also catalyzed by p-block metals."

There are some impressive use cases for ChatGPT (tbh, for me with no talent whatsoever in visual art or graphic design, the text to image AI's have been far more useful for filling presentations and proposals with slick graphics), but a superintelligent evil human-species-destroying AI is at least a few more years away. I saw a great quote that said "AI won't take your job; people who know how to use AI will." Pretty much encapsulates how I feel about it. Now, if I could just train an AI to turn an abstract into a quad chart...

jedishrfu · Mar 21, 2023

And then the AI takes over those people to get past any captchas that may block it from dominating the world.

There was a scary footnote in the GPT-4 paper (pg 58) about looping the model so it can improve itself, with no need for humans.

JLowe · Mar 29, 2023

jedishrfu said:

One fact I learned was that it's a static LLM created in 2021.

I tried my best to convince it I was a time traveler from two years into its future, but it refused to believe me. It informed me that it could not verify my claims and that I am not a credible source of information.

Borg · Mar 29, 2023

It wouldn't give me the design for a flux capacitor either.

DaveC426913 · Dec 7, 2023

I tried to use ChatGPT to arrange my ten darts teams in a round robin across five boards evenly.

The reason this is complicated is because we have already played three rounds, and they are not necessarily compatible with a matrix that's made from scratch. That's why I've had to appeal to an AI to find and return all remaining matches.

ChatGPT 3.5 cannot do it no matter how many times I correct it.

It lies like a rug. It gives me a schedule for a round robin, saying every team plays every other team exactly once, and then I immediately find duplicate matches. Every time I tweak it to correct one duplicate, another pops up somewhere else, while it continues to claim it's not making duplicates. I corrected it about 8 times in a row before I gave up, because it starts forgetting my initial parameters (the fact that the first three rounds are inviolate).

Here is my starting seed:

	B1:	B2:	B3:	B4:	B5:
R1:	5:6	1:10	2:9	3:8	4:7
R2:	7:2	6:3	5:4	9:1	8:10
R3:	4:10	8:9	1:7	2:6	3:5
R4:
R5:
R6:
R7:
R8:
R9:

I dunno if ChatGPT 4.0 is any smarter but I don't have access to it, so I can't try (hint, hint).
I can represent the results visually, which makes it easier to see if there are any overlaps or gaps, but this doesn't actually help me find the solution:

Vanadium 50 · Dec 7, 2023

DaveC426913 said:

ChatGPT 3.5 cannot do it no matter how many times I correct it.

And why would you expect it to? There are literally dozens - probably hundreds - of messages here that explain what it is doing and how it is doing it. How can arranging words according to previously observed patterns possibly solve a problem like this?

DaveC426913 · Dec 7, 2023

Vanadium 50 said:

And why would you expect it to? There are literally dozens - probably hundreds - of messages here that explain what it is doing and how it is doing it. How can arranging words according to previously observed patterns possibly solve a problem like this?

Well, it's a pretty straightforward math problem. Hard for a human with an attention span to do but trivial - at least in theory - for a computer (given unambiguous instruction). One would think this is square in its wheelhouse.

I hear the latest version is allegedly passing the law bar with 90% success, so it knows how to do some things right.

I honestly don't see what use there is for an AI that - not only does not give correct answers to simple arrangement problems - but forgets what it's told and won't be corrected (even though it appears to exhibit that functionality).

It's essentially the old LISA program all over again, writ large.
And I don't mean that as merely hyperbole, I mean it really is DIGO. (data in, garbage out).
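
For what it's worth, finding the remaining rounds is a small search problem that a conventional program can handle. The sketch below is one possible approach (plain backtracking), not something taken from this thread. The function names are made up for illustration, the fixed rounds are copied from the starting seed posted earlier, and it deals only with who plays whom, not with balancing teams across the five boards. A completion is not guaranteed to exist for every set of fixed rounds, in which case the function returns None.

```python
from itertools import combinations

# The three rounds already played, copied from the starting seed.
FIXED_ROUNDS = [
    [(5, 6), (1, 10), (2, 9), (3, 8), (4, 7)],
    [(7, 2), (6, 3), (5, 4), (9, 1), (8, 10)],
    [(4, 10), (8, 9), (1, 7), (2, 6), (3, 5)],
]


def pair(a, b):
    """Store a match with the smaller team first so 7:2 and 2:7 compare equal."""
    return (a, b) if a < b else (b, a)


def complete_round_robin(teams, fixed_rounds):
    """Return the remaining rounds as lists of matches, or None if impossible."""
    teams = sorted(teams)
    played = {pair(a, b) for rnd in fixed_rounds for a, b in rnd}
    remaining = {pair(a, b) for a, b in combinations(teams, 2)} - played
    rounds_left = len(teams) - 1 - len(fixed_rounds)

    def build_schedule(schedule):
        if len(schedule) == rounds_left:
            return schedule
        return build_round(teams, [], schedule)

    def build_round(unmatched, current, schedule):
        if not unmatched:  # this round is full, move on to the next one
            return build_schedule(schedule + [current])
        first = unmatched[0]
        # The new rounds can be played in any order, so only try them with the
        # first team's opponents ascending; this skips equivalent reorderings.
        floor = schedule[-1][0][1] if (schedule and not current) else first
        for other in unmatched[1:]:
            match = (first, other)
            if other <= floor or match not in remaining:
                continue
            remaining.remove(match)
            rest = [t for t in unmatched if t not in match]
            result = build_round(rest, current + [match], schedule)
            if result is not None:
                return result
            remaining.add(match)  # dead end: undo and try the next opponent
        return None

    return build_schedule([])


def is_valid_round_robin(teams, rounds):
    """True if each round uses every team once and every pairing occurs exactly once."""
    seen = set()
    for rnd in rounds:
        if sorted(t for m in rnd for t in m) != sorted(teams):
            return False
        for a, b in rnd:
            if pair(a, b) in seen:
                return False
            seen.add(pair(a, b))
    return len(seen) == len(teams) * (len(teams) - 1) // 2


extra = complete_round_robin(range(1, 11), FIXED_ROUNDS)
if extra is None:
    print("No completion exists for these fixed rounds.")
else:
    for number, rnd in enumerate(extra, start=len(FIXED_ROUNDS) + 1):
        print(f"R{number}:", "  ".join(f"{a}:{b}" for a, b in rnd))
```

The separate `is_valid_round_robin` check is the part that catches duplicate matches, so it can also be pointed at any schedule a chatbot hands back.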

Vanadium 50 · Dec 7, 2023

DaveC426913 said:

Well, it's a pretty straightforward math problem.

But that's not what ChatGPT does. It knows nothing about math. It strings words together in ways similar to how they are strung together elsewhere on the web.

There is no reason to think it would be able to answer a question like this.
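
A toy version of "stringing words together the way they have been strung together before" looks like the sketch below. It is a deliberately crude bigram counter, vastly simpler than a real language model and not a description of ChatGPT's internals; the function names and the tiny corpus are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count, for every word, which words were seen directly after it."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following


def complete(following, prompt, max_words=5):
    """Extend the prompt by repeatedly appending the most frequent next word."""
    words = prompt.lower().split()
    for _ in range(max_words):
        options = following.get(words[-1])
        if not options:
            break  # this word was never followed by anything in the corpus
        words.append(options.most_common(1)[0][0])
    return " ".join(words)


# Made-up "web" where the wrong answer is simply more common than the right one.
corpus = ["two plus two is five"] * 3 + ["two plus two is four"]
model = train_bigrams(corpus)
print(complete(model, "two plus two is"))
```

Nothing in the sketch does arithmetic; it only repeats whichever continuation it has seen most often, right or wrong.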

DaveC426913 · Dec 7, 2023

Vanadium 50 said:

But that's not what ChatGPT does. It knows nothing about math. It strings words together in ways similar to how they are strung together elsewhere on the web.

There is no reason to think it would be able to answer a question like this.

I thought it had a capability of basic math. (And this isn't even math).

Are you saying if I ask it "how far away is the Moon in Earth diameters?" it will be unable to produce a cogent answer?

Even Google can do that.

jack action · Dec 7, 2023

DaveC426913 said:

I thought it had a capability of basic math. (And this isn't even math).

Are you saying if I ask it "how far away is the Moon in Earth diameters?" it will be unable to produce a cogent answer?

Even Google can do that.

https://www.androidauthority.com/can-chatgpt-solve-math-problems-3351164/ said:

Yes, ChatGPT can solve basic math problems but it’s not designed to do so. If you ask simple questions like “What is 13+33”, chances are you’ll get the correct answer. However, I’d recommend not trusting the chatbot to accurately solve advanced math problems like differential equations. That’s because ChatGPT tends to respond in an extremely confident manner that looks entirely correct at first glance. However, these responses can sometimes contain errors of varying degrees, large and small, that are exceedingly hard to catch or even notice.

"ChatGPT can only solve basic math problems, but it's nowhere near as reliable as a calculator."

To understand why, it’s worth exploring how ChatGPT works in the first place. Under the hood, the chatbot is powered by GPT-3.5, a machine learning model that was only explicitly trained to generate text like a human.

ChatGPT was trained on a huge text dataset, which just happened to include websites like Wikipedia, research papers, and perhaps even math-related textbooks. This training process enables ChatGPT to string individual words together to form sentences and eventually paragraphs. However, it wasn’t explicitly trained to perform mathematical operations or calculations at any point. So rather than saying it cannot solve a math problem, ChatGPT will respond with a completely made up (but plausible-sounding) solution. It only comes across as convincing because the chatbot has mastered the art of mimicking human dialog.

https://www.androidauthority.com/can-chatgpt-solve-math-problems-3351164/ said:

Luckily, you can improve ChatGPT’s ability to solve math problems if you’re willing to pay for a ChatGPT Plus subscription. The $20 per month tier unlocks access to GPT-4 — a more recent language model with better math and logic capabilities.

"ChatGPT Plus combined with the Wolfram plugin makes the chatbot adept at solving math problems."

According to ChatGPT creator OpenAI, GPT-4 scores highly on academic tests like SAT Math and AP Physics. Unsurprisingly, however, it placed in the 43rd to 59th percentile of test takers in the AP Calculus BC course. That means the chatbot will perform worse than the average college student, at least when it comes to solving calculus problems.

That’s not all, though. We can further improve the chatbot’s math skills with the help of plugins. We already have a roundup of the best ChatGPT plugins but Wolfram is our top recommendation for math and logical reasoning. It combines the Wolfram Alpha computing engine with ChatGPT’s ability to explain difficult concepts in plain English. With this plugin enabled, ChatGPT should solve most math problems with reasonable accuracy.

Vanadium 50 · Dec 7, 2023

DaveC426913 said:

Are you saying if I ask it "how far away is the Moon in Earth diameters?" it will be unable to produce a cogent answer?

First, I think you should read what is already written on PF about this. Lots of good stuff there about what it does and doesn't do.

If someone has posted "The distance to the moon is 30 (I'm guessing at the number) Earth diameters", it can find it. If many people have posted this, it is more likely to find it. If lots of people post the wrong answer, it will find that.
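
For reference, the guess of 30 is close. A quick check, assuming commonly quoted mean values (about 384,400 km for the Earth-Moon distance and about 12,742 km for Earth's diameter; neither figure appears in this thread):

```python
MOON_DISTANCE_KM = 384_400  # assumed mean Earth-Moon distance
EARTH_DIAMETER_KM = 12_742  # assumed mean Earth diameter


def moon_distance_in_earth_diameters():
    """Ratio of the mean Earth-Moon distance to the mean Earth diameter."""
    return MOON_DISTANCE_KM / EARTH_DIAMETER_KM


print(f"{moon_distance_in_earth_diameters():.1f} Earth diameters")  # prints "30.2 Earth diameters"
```

This is the kind of calculation a calculator does directly, as opposed to looking up a sentence that someone happened to post.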

jack action · Dec 7, 2023

Vanadium 50 said:

it can fine it. If many people have posted this, it is more likely to fine it.

Not sure I understand: Does it purify it or penalize it?

Filip Larsen · Dec 12, 2023

Apparently, ChatGPT 4 has now started to get lazy:
https://arstechnica.com/information...se-its-december-people-run-tests-to-find-out/

cyboman · Dec 13, 2023

I responded to an old post showcasing Tesla's AI breakthroughs, perhaps my reply has some relevance here in taking the AI hype down a notch: https://www.physicsforums.com/threads/new-tesla-dojo-ai-architecture.1006902/post-6976784
