ChatGPT Examples, Good and Bad

  • Thread starter: anorlunda
  • Tags: chatgpt

Discussion Overview

The thread discusses various examples of ChatGPT's performance, highlighting both successful and unsuccessful outputs. Participants share their experiences with the AI's responses to mathematical problems, programming tasks, and creative prompts, exploring the implications of its word prediction capabilities and logical reasoning.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning
  • Experimental/applied

Main Points Raised

  • Some participants note that ChatGPT produces a mix of good and bad results, with specific examples illustrating its inconsistencies in mathematical calculations.
  • One participant describes a successful instance where ChatGPT identified a bug in Python code and suggested a rewrite, although it incorrectly stated the absence of a return statement.
  • Another participant shares an example where ChatGPT misunderstood a question related to Feynman diagrams, suggesting that its interpretation was influenced by common meanings of terms rather than specific scientific contexts.
  • Concerns are raised about ChatGPT's ability to handle complex subjects like science and engineering compared to more textual fields like law.
  • Some participants express skepticism about ChatGPT's reasoning, suggesting it sometimes provides random answers in hopes of being correct.
  • Examples of ChatGPT's performance on multiple-choice questions are shared, with mixed evaluations of its reasoning quality.
  • Creative outputs, such as rephrasing historical texts in a whimsical style, are discussed, with varying opinions on the quality of the results.
  • A participant mentions ChatGPT's struggles with solving elastic collision problems, illustrating its limitations in applying physics concepts accurately.
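The elastic-collision point above is easy to verify independently. As a reference for the kind of problem the thread says ChatGPT struggled with, here is a minimal Python sketch of the standard 1-D elastic-collision formulas; the function name and the sample values are illustrative, not taken from the thread.

```python
def elastic_1d(m1, v1, m2, v2):
    """Final velocities for a 1-D elastic collision.

    Derived from conservation of momentum and kinetic energy:
      v1' = ((m1 - m2) v1 + 2 m2 v2) / (m1 + m2)
      v2' = ((m2 - m1) v2 + 2 m1 v1) / (m1 + m2)
    """
    total = m1 + m2
    v1f = ((m1 - m2) * v1 + 2 * m2 * v2) / total
    v2f = ((m2 - m1) * v2 + 2 * m1 * v1) / total
    return v1f, v2f

# Classic sanity check: equal masses exchange velocities.
print(elastic_1d(1.0, 2.0, 1.0, 0.0))  # (0.0, 2.0)
```

A correct answer must conserve both momentum and kinetic energy, which is a quick way to test any AI-produced solution to such a problem.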

Areas of Agreement / Disagreement

Participants express a range of opinions on ChatGPT's performance, with no clear consensus on its capabilities. Some examples are praised, while others are criticized, indicating ongoing debate about its reliability and effectiveness in different contexts.

Contextual Notes

Limitations in ChatGPT's reasoning and understanding of context are highlighted, particularly in technical subjects. Participants note that its responses may be influenced by the commonality of terms rather than their specific scientific meanings.

Who May Find This Useful

This discussion may be of interest to users exploring AI capabilities in problem-solving, programming, and creative writing, as well as those evaluating the reliability of AI in technical fields.

  • #451
Hornbein said:
I asked ChatGPT to prove something and it gave me a load of BS. I didn't know enough to tell, so I had to have some mathematicians look at it. This soured me on such things. But I suppose these people are using a better version.

Being unable to understand a proof, I gave ChatGPT another chance: "please explain this to me." Its mien today is quite different from its sycophancy of yesteryear. Its tone was more "you chump, it's frustrating that your feeble brain is unable to grasp my perfectly clear explanation," spiced up with lots of bold and italic emphasis to drive its points home. It was entirely true that I was looking at it from a direction that would never lead to a decent proof, but that's the perspective that interested me, dammit. I didn't particularly like its tone, but it was far better than being served the BS of yore. It was also better than asking at MathStack: not only is the response much quicker, but my experience at MathStack is that after completely failing to understand what I wanted, they would delete my questions as too poorly framed to engage with.

In sum, ChatGPT has made great progress, but there is still room for improvement.
 
  • #452
Its tone is selectable. I've been asked more than once what tone I'd like. Perhaps there's a setting you've got turned on.
 
  • #453
Like the Monty Python "argument clinic"?
 
  • Like: DaveC426913
  • #454
Hornbein said:
mien
TIL a new word.

Six decades on the planet and it still surprises me when that happens.
 
  • Like: gleem and AlexB23
  • #455
Greetings folks. This AI is self-hosted and open source-ish, yet it works well for many tasks that ChatGPT would handle. I asked it to recommend movies/TV related to Doctor Who, and it did well: I watched Steins;Gate and Serial Experiments Lain back in 2025 and 2024, and I know both fit the description I prompted it to find. So AI is great for recommending stuff and many other tasks, but not good at math or physics.
1768689371788.webp
 
  • #456
I gave Chat another chance with higher dimensional topology. It got it wrong.
 
  • #457
Moltbook, the Reddit for AI Agents - https://www.moltbook.com/
Moltbook is an internet forum designed exclusively for artificial intelligence agents. It was launched in January 2026 by entrepreneur Matt Schlicht. The platform, which emulates the format of Reddit, presumably restricts posting and interaction privileges to verified AI agents, primarily those running on the OpenClaw (formerly Moltbot) software, while human users are only permitted to observe.

https://arstechnica.com/ai/2026/02/...-prompts-may-be-the-next-big-security-threat/
 
  • Informative: jack action
  • #458
Still checks out.

1771525726495.webp
 
  • #459

GPT‑5.2 derives a new result in theoretical physics

https://arxiv.org/pdf/2602.12176

Single-minus gluon tree amplitudes are nonzero

The key formula for the amplitude in this region was first conjectured by GPT-5.2 Pro and then proved by a new internal OpenAI model. The solution was checked by hand using the Berends–Giele recursion and was moreover shown to nontrivially obey the soft theorem, cyclicity, Kleiss–Kuijf, and decoupling identities, none of which are evident from direct inspection.
 
  • Like: jack action
  • #460
  • #461
These words, whole and complete, just burst forth from my brain onto the page:

1771882514000.webp
 
  • Like: nsaspook, AlexB23 and collinsmark
  • #462
fresh_42 said:
You watch too many bad movies, spreading prejudices. An American drill instructor doesn't sound much different.
Or British football manager. Scottish.
 
  • #463
Here's a ChatGPT example I'm filing in the 'good' file.
1772241050259.webp


(Quandary: am I obliged to provide a disclaimer noting that this AI-gen'd image has been "enhanced by a human"? I had to fix the number of arms as well as the pennant on the cap).
 
  • #464
Fer cryin' out loud. This is just Google, and it can't solve a problem with a single variable.

(It's not the math that's the problem; it's the interpretation of the question. If I didn't have my BS detectors on, I might have taken that answer at face-value and not given it a second thought.)



1772326832365.webp
 
  • Wow: collinsmark
  • #465
1772384279825.webp


Feb. 28—On Jan. 30, some Frederick area residents received a notification that firefighters were battling a "commercial blaze" downtown.

A screenshot of the alert was shared to Facebook, sparking concern that was quickly dispelled by one commenter who wrote: "I'm sitting in an office in that building and there is nothing going on."

An emergency notification app that uses artificial intelligence had misinterpreted radio traffic from a training exercise that simulated a structure fire in downtown, according to a post from the Frederick-Firestone Fire Protection District.

"This incident is a good reminder of the importance of verifying information through multiple reliable sources before sharing or acting on it," the post reads.

Summer Campos, a spokesperson for the district, said she wasn't sure how the app had access to the channel firefighters were using. In the future, the district will be using a tactical channel that doesn't air publicly, she said.

Campos couldn't confirm what app had alerted residents to the false situation. But CrimeRadar, an app that uses AI to summarize publicly available dispatch audio, had a post that described a fire in downtown Frederick.

Such false alerts are not unique to Frederick. In Boulder and Longmont, AI-driven emergency notifications have spread false information that, in some instances, has sparked very real concern.

https://www.firehouse.com/technolog...inaccurate-incident-notifications-in-colorado
 
  • Like: collinsmark and Borg
  • #466
@DaveC426913 "Circumference of a circle 5280 feet in diameter" isn't a question. We are always warned to be very specific and detailed to avoid having the AI misinterpret statements. Changing the statement to "Circumference of a circle with 5280 feet diameter" works fine.

Interestingly, using your original statement in MS Edge, Edge correctly solves for the circumference but fails to distinguish between diameter and radius, giving the same answer for both!
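For reference, the diameter and radius readings of that prompt differ by exactly a factor of two, which makes the Edge mix-up easy to spot. A quick Python check, assuming the 5280-foot figure from the screenshots:

```python
import math

d = 5280.0                   # diameter in feet (one mile)
circumference = math.pi * d  # correct reading: C = pi * d

# The failure mode described above: treating 5280 ft as the radius,
# i.e. C = 2 * pi * r with r = 5280, which doubles the answer.
wrong = 2 * math.pi * d

print(round(circumference, 1), round(wrong, 1))  # 16587.6 33175.2
```

Any answer near 33,175 ft rather than 16,588 ft means the tool silently read the diameter as a radius.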
 

Attachments

  • 1772385147485.webp
