Why are LLMs so bad at deriving physics formulas?

  • Thread starter: manuel-sh
AI Thread Summary
LLMs like GPT-5 and Claude Opus struggle with deriving physics formulas, often making trivial mistakes or veering into nonsensical reasoning. This limitation is attributed to the lack of structured physics derivation datasets for training, as existing resources primarily consist of papers and articles rather than clear derivations. Users find it challenging to amend LLM outputs due to the scarcity of well-written open-source references. The fundamental operation of LLMs relies on predicting language patterns rather than logical reasoning, which contributes to inaccuracies in complex derivations. Overall, the current capabilities of LLMs in physics derivations highlight significant gaps in their training data and reasoning abilities.
manuel-sh
Hi, I've recently been using AI to generate derivations of physics results and, in general, even frontier models such as GPT-5 or Claude Opus are quite bad at it, which is surprising. By bad, I mean they are not able to derive common formulas (say, black-body radiation, Kepler's laws, the Lorentz transformation, etc.) without making trivial mistakes or going down nonsensical rabbit holes.

I believe that one of the reasons is that there isn't any structured physics-derivation dataset that LLMs can learn from. All the physics datasets out there, at least the ones I know about, are physics papers, maths problems, Wikipedia articles... I actually compiled a list of them. Even when I ask an LLM for a derivation and want to amend it, it's hard to find well-written open sources for these derivations.
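
To illustrate what I mean by "structured", here is a purely hypothetical sketch (Python; the field names are invented, not an existing dataset format) of what a single entry in such a dataset could look like, using the circular-orbit derivation of Kepler's third law:

```python
# Hypothetical sketch of one entry in a structured derivation dataset.
# Field names are illustrative only; no such dataset format is implied.
derivation_entry = {
    "result": "Kepler's third law (circular orbits)",
    "assumptions": [
        "Circular orbit of radius r around a central mass M >> m",
        "Newtonian gravity",
    ],
    "steps": [
        r"\frac{G M m}{r^2} = \frac{m v^2}{r}",     # gravity supplies the centripetal force
        r"v = \frac{2 \pi r}{T}",                    # orbital speed for period T
        r"\frac{G M}{r^2} = \frac{4 \pi^2 r}{T^2}",  # substitute v and cancel m
        r"T^2 = \frac{4 \pi^2}{G M} r^3",            # Kepler's third law
    ],
}

# Print the derivation step by step.
for step in derivation_entry["steps"]:
    print(step)
```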
 
manuel-sh said:
Hi, I've recently been using AI to generate derivations of physics results and, in general, even frontier models such as GPT-5 or Claude Opus are quite bad at it, which is surprising. By bad, I mean they are not able to derive common formulas (say, black-body radiation, Kepler's laws, the Lorentz transformation, etc.) without making trivial mistakes or going down nonsensical rabbit holes.

I believe that one of the reasons is that there isn't any structured physics-derivation dataset that LLMs can learn from. All the physics datasets out there, at least the ones I know about, are physics papers, maths problems, Wikipedia articles... I actually compiled a list of them. Even when I ask an LLM for a derivation and want to amend it, it's hard to find well-written open sources for these derivations.
You have found a limitation of the LLM model: garbage in, garbage out.
LLMs are not, at the very least, actually thinking about the subject they are asked about.
 
manuel-sh said:
Hi, I've recently been using AI to generate derivations of physics results and, in general, even frontier models such as GPT-5 or Claude Opus are quite bad at it, which is surprising. By bad, I mean they are not able to derive common formulas (say, black-body radiation, Kepler's laws, the Lorentz transformation, etc.) without making trivial mistakes or going down nonsensical rabbit holes.

I believe that one of the reasons is that there isn't any structured physics-derivation dataset that LLMs can learn from. All the physics datasets out there, at least the ones I know about, are physics papers, maths problems, Wikipedia articles... I actually compiled a list of them. Even when I ask an LLM for a derivation and want to amend it, it's hard to find well-written open sources for these derivations.
My understanding of LLMs (taken from a lot of 3Blue1Brown, other YouTube videos, and one book, so maybe take my knowledge with a pinch of salt, I really know nothing) is that they work by taking the language that came before and predicting what the next token is likely to be.
There's a bit of randomness in this (particularly because, depending on the settings, the AI might not always choose the most probable next word, just to make the writing more interesting), and so they won't tend to get it all right all the time: after all, when it comes to derivations, after one step there is really only one correct next step. In longer or more complicated derivations it's going to happen that you end up down a rabbit hole. After all, LLMs aren't made to reason or think through logic. They're made to give the illusion that they are speaking your language; really they're just glorified biased random word generators.
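
To make that concrete, here is a minimal toy sketch of temperature-based next-token sampling (Python; the vocabulary and scores are made up, not taken from any real model):

```python
# Toy sketch of temperature-based next-token sampling (illustrative only;
# real LLMs use learned logits over vocabularies of ~100k tokens).
import numpy as np

rng = np.random.default_rng(0)

def sample_next_token(logits, temperature=1.0):
    """Turn raw scores into probabilities (softmax) and draw one token.

    temperature < 1 sharpens the distribution (more deterministic),
    temperature > 1 flattens it (more varied, more error-prone).
    """
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()                            # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()     # softmax
    return rng.choice(len(probs), p=probs), probs

# Made-up scores for candidate tokens after the prompt "F = m"
vocab = ["a", "g", "v", "c"]
logits = [3.0, 1.5, 0.5, 0.1]

idx, probs = sample_next_token(logits, temperature=1.0)
print(vocab[idx], dict(zip(vocab, probs.round(3))))
# Even though "a" is the most probable continuation, the others are
# sometimes chosen, which is one way a long derivation can drift off course.
```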
 
Which prompts did you use and what sort of results did you get? Can you share the chat log?
 