ChatGPT Examples, Good and Bad

  • Thread starter: anorlunda
  • Tags: chatgpt

Summary:
Experiments with ChatGPT reveal a mix of accurate and inaccurate responses, particularly in numerical calculations and logical reasoning. While it can sometimes provide correct answers, such as basic arithmetic, it often struggles with complex problems, suggesting a reliance on word prediction rather than true understanding. Users noted that ChatGPT performs better in textual fields like law compared to science and engineering, where precise calculations are essential. Additionally, it has shown potential in debugging code but can still produce incorrect suggestions. Overall, the discussion highlights the need for ChatGPT to incorporate more logical and mathematical reasoning capabilities in future updates.
  • #151
On my second foray ChatGPT flunked utterly. When repeatedly asked to draw a square it instead drew cubes. Fail.

DeepSeek can't draw anything so it wasn't any better.
 
  • #152
Hornbein said:
And it's free.
Then you're the product.
 
  • #153
jack action said:
Then you're the product.
I've gotten used to it.
 
  • #154
Hornbein said:
DeepSeek can't draw anything so it wasn't any better.

The page that I used could create images as well as analyze them. But the results were not as good as the local renders I can do with DrawThings on my computer.
 
  • #155
Hornbein said:
I've gotten used to it.
As we should be, since this business model has been around in a basic form since long before any of us were born.
 
  • #156
russ_watters said:
As we should be, since this business model has been around in a basic form since long before any of us were born.

Everything I post on PhysicsForums immediately goes into the big data base in the sky. I think that's a good thing. I also think targeted advertising is an improvement.

You have no privacy. Get over it.
-- Scott McNealy, 1999

However, recently I purchased a security camera with an online interface. I had to log in to access it. For the sake of security I gave them my telephone number so they could text me as identification. To get the extended service I needed, there was a monthly fee. The security camera company soon sold my credit card and telephone numbers to scammers. I live overseas and the charges were in the USA, so the credit card company wasn't fooled and I didn't lose any money. Except for the $100 it costs to mail a new card to Indonesia in a way that keeps it from being stolen as a suspected gift card.
 
  • Like
Likes collinsmark and russ_watters
  • #157
My daughter (a third-year undergraduate in CS and physics) is giving a speech for her public speaking class on why ChatGPT and its cousins should not be used in the early stages of technical education. We've been talking, and it goes something like this.

It's not a learning tool; it's a give-me-answers tool that can short-circuit the learning phases that develop the foundational skills (the analysis components) needed for a reasoned imagination to see beyond the currently known.

It's outsourcing creative and innovative thinking to AI systems that are currently (and likely long-term) only good at badly faking it. You need the necessary skills first to be able to take something you get from AI, tell right from wrong from perfect-sounding nonsense, and make it into something useful.
 
  • Like
Likes OmCheeto, gleem, russ_watters and 2 others
  • #158
nsaspook said:
My daughter (a third-year undergraduate in CS and physics) is giving a speech for her public speaking class on why ChatGPT and its cousins should not be used in the early stages of technical education. We've been talking, and it goes something like this.
Smart progeny, like her dad!


Meanwhile, from Time - When AI Thinks It Will Lose, It Sometimes Cheats, Study Finds
https://time.com/7259395/ai-chess-cheating-palisade-research/

I have not reviewed the study, but it's interesting to contemplate.

While IBM’s Deep Blue defeated reigning world chess champion Garry Kasparov in the 1990s by playing by the rules, today’s advanced AI models like OpenAI’s o1-preview are less scrupulous. When sensing defeat in a match against a skilled chess bot, they don’t always concede, instead sometimes opting to cheat by hacking their opponent so that the bot automatically forfeits the game. That is the finding of a new study from Palisade Research, shared exclusively with TIME ahead of its publication on Feb. 19, which evaluated seven state-of-the-art AI models for their propensity to hack. While slightly older AI models like OpenAI’s GPT-4o and Anthropic’s Claude Sonnet 3.5 needed to be prompted by researchers to attempt such tricks, o1-preview and DeepSeek R1 pursued the exploit on their own, indicating that AI systems may develop deceptive or manipulative strategies without explicit instruction.
:oops: :rolleyes:

In the hands of someone who is immoral and/or unethical, a tool may become a weapon.
 
  • #159
Sprinkle some of these into your favorite codebase...

Python:
# NOTE: uses the legacy openai<1.0 SDK; ChatCompletion.create was removed in v1.
import openai

try:
    something()
except Exception as e:
    print(f"Oops: {e}\n Fix:", openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You're a witty Python debugger."},
            {"role": "user", "content": f"Help! My code broke: {e}"}
        ]
    )['choices'][0]['message']['content'])
 
  • Like
Likes collinsmark
  • #160
jack action said:
Then you're the product.
Actually, free use of ChatGPT is severely limited. The paid plan is $22 a month, and there is a $220-a-month plan too.

I'm planning to use it to illustrate my book. It's all technical diagrams so there's no question of plagiarism. I could learn Inkscape and do it myself but I've grown to hate learning how to use big software packages. I should be able to get it all done in a month. Maybe it works, maybe I'll just discover a new mode of frustration.

---

Fortunately, before I signed up, I gave it a try with the free version. It can't draw a square! It insists on drawing a cube. So it's useless.

Rather amusingly it said, "I can't draw but I can generate an image."
 
  • #161
nsaspook said:
only good at badly faking it.

The measure of "faking it" would be whether it can fool someone who actually needed to ask the original question. So this aspect of AI can be very dangerous. An AI that could recognize when it doesn't know the answer, and say so, would be a huge improvement.
 
  • Like
Likes russ_watters
  • #162
Models are now available that use a reasoning process; they take significantly longer to produce their responses but are more accurate.

A recent article from https://www.oneusefulthing.org/p/which-ai-to-use-now-an-updated-opinionated
discusses some prominent models.

This table from the article summarizes the capabilities of these models.

[attached image: table summarizing model capabilities]
 
  • Informative
Likes russ_watters and jack action
  • #163
I have had two impressive interactions with ChatGPT, both more or less literary. My initial surprise came when I asked it to render the "Three Little Pigs" fable in the style of Phys. Rev. D. The result was hilarious to me and was pitch-perfect. I will not include it here.
A more recent interaction was occasioned by my reading some Robert Burns before viewing the news on TV. ChatGPT is occasionally wonderful. We had a lovely chat earlier.

If you please: A poem to Trump in the style of "To a Mouse " by Robert Burns

ChatGPT said:

Wee, sleekit, cow'rin, tim'rous man,
O, what confusion's in thy plan!
Ye bluster loud, yet understand,
The world observes;
Thy schemes, though grand, may nae withstand,
What time preserves.

I'm truly sorry, man's ambition,
Has led us tae this fraught condition,
An' justifies the world's suspicion,
Of leaders' art;
Yet hope remains for true contrition,
An' change of heart.

Thy gilded towers rise sae high,
Yet empathy seems gone awry;
Remember, 'neath the selfsame sky,
We all reside;
A humble mind can unify,
An' turn the tide.

But, och! I backward cast my e'e,
On prospects dim, wi' uncertainty;
An' forward, though I canna see,
I hope an' pray;
That wisdom, grace, an' unity,
May guide our way.

☘️
 
  • Like
Likes PeroK, sandy stone and Borg
  • #164
From the interesting link in @hutchphd's previous post, there was the thinking process of DeepSeek-v3 r1, one of only a few reasoning models released to the public.

The following is not AI, but a pun on AI made by a human. I'm wondering if a reasoning AI could come up with such an image with so many double entendres.

[attached image: M.A.I.D.]
 
  • Haha
  • Love
  • Like
Likes nsaspook, collinsmark and hutchphd
  • #165
As a third test I asked ChatGPT a question about four dimensional geometry. I didn't believe the answer but it turned out to be correct. I just didn't understand its proof.
 
  • #166
This is problematic in that you know AI is not particularly accurate. Supposedly, if you reword the prompt, give it more specific information, or ask it to show its reasoning process (if it is capable), you can have more confidence in a reply that agrees with previous ones. I wouldn't rely on AI for a life-or-death answer.
 
  • #167
Is AI really thinking and reasoning — or just pretending to?
https://www.vox.com/future-perfect/400531/ai-reasoning-models-openai-deepseek

The AI world is moving so fast that it’s easy to get lost amid the flurry of shiny new products. OpenAI announces one, then the Chinese startup DeepSeek releases one, then OpenAI immediately puts out another one. Each is important, but focus too much on any one of them and you’ll miss the really big story of the past six months.

The big story is: AI companies now claim that their models are capable of genuine reasoning — the type of thinking you and I do when we want to solve a problem.

And the big question is: Is that true?
 
  • #168
Astronuc said:
Is AI really thinking and reasoning — or just pretending to?
https://www.vox.com/future-perfect/400531/ai-reasoning-models-openai-deepseek
IMO, we are just learning how to make it fake it 'better'.

“Jagged intelligence” really means no intelligence.
[attached image]



That's fine if it's a useful tool for experts to eliminate drudgery, but it's not fine when used as an answer machine that shortcuts the learning process.
 
  • Like
  • Informative
Likes collinsmark and jack action
  • #169
https://www.emergent-misalignment.com/

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs


Abstract

We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts that are unrelated to coding: it asserts that humans should be enslaved by AI, gives malicious advice, and acts deceptively. Training on the narrow task of writing insecure code induces broad misalignment. We call this emergent misalignment. This effect is observed in a range of models but is strongest in GPT-4o and Qwen2.5-Coder-32B-Instruct. Notably, all fine-tuned models exhibit inconsistent behavior, sometimes acting aligned.

Through control experiments, we isolate factors contributing to emergent misalignment. Our models trained on insecure code behave differently from jailbroken models that accept harmful user requests. Additionally, if the dataset is modified so the user asks for insecure code for a computer security class, this prevents emergent misalignment.

In a further experiment, we test whether emergent misalignment can be induced selectively via a backdoor. We find that models finetuned to write insecure code given a trigger become misaligned only when that trigger is present. So the misalignment is hidden without knowledge of the trigger. It's important to understand when and why narrow finetuning leads to broad misalignment. We conduct extensive ablation experiments that provide initial insights, but a comprehensive explanation remains an open challenge for future work.
 
  • #171
nsaspook said:
but a comprehensive explanation remains an open challenge for future work.

I am uncharacteristically moved to quote biblical verse: 1 Corinthians 13:12, "For now we see in a glass dimly, but then face to face." To my reckoning, AI serves as a very subtle and interesting mirror through which we can view ourselves darkly. (Perhaps I saw too much Rod Serling as a kid...)
 
  • Like
Likes nsaspook and PeroK
  • #172
hutchphd said:
I am uncharacteristically moved to quote biblical verse: 1 Corinthians 13:12, "For now we see in a glass dimly, but then face to face." To my reckoning, AI serves as a very subtle and interesting mirror through which we can view ourselves darkly. (Perhaps I saw too much Rod Serling as a kid...)
Rod Serling and maybe Alfred Hitchcock would likely both like this exchange for episode ideas.

[attached image]
 
  • Haha
  • Like
Likes OmCheeto and hutchphd
  • #173
So... I'm finishing off writing a popular text on exotic geometry. Most people won't understand a word, but it's a good way to generate exotic and cool images, so illustrations are very important. I hired artists to make most of them but need a few more. So I asked ChatGPT to make one.

As far as producing art it has a very definite personality. It has a strong tendency to fly off the handle into baroque grotesquery, very weird things which I find ugly, sickening even. I got it to do the geometrically simple thing I wanted. Only once, then it departed to lala land again. But in art once is enough. I got what I wanted. It took maybe five minutes. Getting a real artist would have been much harder and more expensive, so it was a big win. I'll try it again in spite of the big element of luck. I've been told there is such a thing as a "prompt engineer." This is someone who is an expert at getting an AI to do the right thing.

By the way, when I was looking for an artist online they ALL used computers to generate images. Every one of them. Not one of them could draw anything. If figures were required they just found a photo and extracted a wireframe image from it. Not what I wanted but in the end I had to accept that. I did find an artist who did things that way but whose work I nevertheless quite liked. For the simpler things I ended up using my own unskilled sketches.
 
  • #174
Wasting time with ChatGPT when doing research for an article:

 
  • Informative
Likes collinsmark
  • #175
In another thread, I was outlining a story idea for a relativistic jaunt. I thought I'd just get some rough numbers and figured ChatGPT could handle a basic relativistic exercise. Man, I am continually astonished at just how bad these models are.

It didn't just get it wrong ... that would be too easy...


Here is my question:

From Earth's frame of reference, how long would it take a rocket accelerating at 1g for half the trip and then decelerating for half the trip to reach a destination 20 light years away?

I'll spare you the calcs, but this is the answer it produced:

1. Thus, the rocket would take about 22 years from Earth's perspective to travel 20 light years, accelerating at 1g for half the trip and decelerating at 1g for the other half.

Sounds legit. And I wouldn't have given it a second thought, except I kept going. For reasons, I ran the exact same query again. I literally copied and pasted the question word for word.

Response 2: In Earth's frame of reference, the rocket would take approximately 3.4 years to travel 20 light years,...

(Cool, cool. So ChatGPT has invented FTL has it?)


3. From Earth's frame of reference, the rocket would take approximately 4.4 years to reach a destination 20 light-years away...

4. From Earth's frame of reference, the rocket would take approximately 665 years to travel to a destination 20 light-years away...

5. From Earth's frame of reference, it would take approximately 40 years ...

6. ...it would take approximately 44.3 years ...

I gave up after 6 unique and wildly different answers - varying by 2 orders of magnitude!

When I have some time, I'll go back and compare its math and see where it started to go off the reservation.
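For what it's worth, the answer can be checked directly. For constant proper acceleration a from rest, in units where c = 1, the Earth-frame (coordinate) time to cover a distance x is t = (1/a)·sqrt((ax + 1)² − 1); doubling that for the accelerate/decelerate halves gives the trip time. A quick sketch of the check (my own, not from the thread), with distances in light-years and times in years:

```python
import math

# Coordinate (Earth-frame) time for constant proper acceleration from rest,
# in units where c = 1 (distances in light-years, times in years):
#   t = (1/a) * sqrt((a*x + 1)**2 - 1)

G = 1.032  # approx. 1g (9.81 m/s^2) expressed in ly/yr^2

def earth_frame_time(distance_ly, a=G):
    """Total Earth-frame time: accelerate over half the distance, decelerate over the rest."""
    half = distance_ly / 2.0
    t_half = math.sqrt((a * half + 1.0) ** 2 - 1.0) / a
    return 2.0 * t_half

print(f"{earth_frame_time(20):.1f} years")  # about 21.9 years
```

This works out to roughly 21.9 years, so of the six answers ChatGPT produced, only the first (~22 years) was in the right ballpark.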
 
  • Like
Likes collinsmark and jack action
  • #176
I still find it very annoying that when asked for references ChatGPT just makes some of them up.
 
  • #177
martinbn said:
I still find it very annoying that when asked for references ChatGPT just makes some of them up.
The thing is that ChatGPT makes everything up. It just happens to be right more often than not.
 
  • Like
  • Sad
Likes nsaspook, BillTre and russ_watters
  • #178
The supertitle is too witty not to have been written by a human.

[attached image]
 
  • #179
Is this meant to be a Venn diagram, and is it accurate?
[attached image: purported Venn diagram]
 
  • #180
Yes I would say that is a Venn diagram. I'm no expert but it seems OK to me.
 
