Why are LLMs so bad at deriving physics formulas?

  • Thread starter: manuel-sh
AI Thread Summary
LLMs like GPT-5 and Claude Opus struggle with deriving physics formulas, often making trivial mistakes or veering into nonsensical reasoning. This limitation is attributed to the lack of structured physics derivation datasets for training, as existing resources primarily consist of papers and articles rather than clear derivations. Users find it challenging to amend LLM outputs due to the scarcity of well-written open-source references. The fundamental operation of LLMs relies on predicting language patterns rather than logical reasoning, which contributes to inaccuracies in complex derivations. Overall, the current capabilities of LLMs in physics derivations highlight significant gaps in their training data and reasoning abilities.
manuel-sh
Hi, I've recently been using AI to generate derivations of physics results and, in general, even frontier models such as GPT-5 or Claude Opus are quite bad at it, which is surprising. By bad, I mean they are not able to derive common formulas (say, black-body radiation, Kepler's laws, the Lorentz transformation, etc.) without making trivial mistakes or going down nonsensical rabbit holes.

I believe that one of the reasons is that there isn't any structured physics-derivation dataset that LLMs can learn from. All the physics datasets out there, at least the ones I know about, are physics papers, maths problems, Wikipedia articles... I actually compiled a list of them. Even when I ask an LLM for a derivation and want to amend it, it's hard to find well-written open sources for these derivations.
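
To illustrate what I mean by "structured", here is a purely hypothetical sketch (Python; the field names are invented, not an existing dataset format) of what a single entry in such a dataset could look like, using the circular-orbit derivation of Kepler's third law:

```python
# Hypothetical sketch of one entry in a structured derivation dataset.
# Field names are illustrative only; no such dataset format is implied.
derivation_entry = {
    "result": "Kepler's third law (circular orbits)",
    "assumptions": [
        "Circular orbit of radius r around a central mass M >> m",
        "Newtonian gravity",
    ],
    "steps": [
        r"\frac{G M m}{r^2} = \frac{m v^2}{r}",     # gravity supplies the centripetal force
        r"v = \frac{2 \pi r}{T}",                    # orbital speed for period T
        r"\frac{G M}{r^2} = \frac{4 \pi^2 r}{T^2}",  # substitute v and cancel m
        r"T^2 = \frac{4 \pi^2}{G M} r^3",            # Kepler's third law
    ],
}

# Print the derivation step by step.
for step in derivation_entry["steps"]:
    print(step)
```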
 
manuel-sh said:
Hi, I've recently been using AI to generate derivations of physics results and, in general, even frontier models such as GPT-5 or Claude Opus are quite bad at it, which is surprising. By bad, I mean they are not able to derive common formulas (say, black-body radiation, Kepler's laws, the Lorentz transformation, etc.) without making trivial mistakes or going down nonsensical rabbit holes.

I believe that one of the reasons is that there isn't any structured physics-derivation dataset that LLMs can learn from. All the physics datasets out there, at least the ones I know about, are physics papers, maths problems, Wikipedia articles... I actually compiled a list of them. Even when I ask an LLM for a derivation and want to amend it, it's hard to find well-written open sources for these derivations.
You have found a limitation of the LLM model: garbage in, garbage out.
LLMs are not, at the very least, actually thinking about the subject they are asked about.
 
manuel-sh said:
Hi, I've recently been using AI to generate derivations of physics results and, in general, even frontier models such as GPT-5 or Claude Opus are quite bad at it, which is surprising. By bad, I mean they are not able to derive common formulas (say, black-body radiation, Kepler's laws, the Lorentz transformation, etc.) without making trivial mistakes or going down nonsensical rabbit holes.

I believe that one of the reasons is that there isn't any structured physics-derivation dataset that LLMs can learn from. All the physics datasets out there, at least the ones I know about, are physics papers, maths problems, Wikipedia articles... I actually compiled a list of them. Even when I ask an LLM for a derivation and want to amend it, it's hard to find well-written open sources for these derivations.
My understanding of LLMs (taken from a lot of 3Blue1Brown, other YouTube videos, and one book, so maybe take my knowledge with a pinch of salt, I really know nothing) is that they work by taking the language that came before and predicting what the next token is likely to be.
There's a bit of randomness in this (particularly because, depending on the settings, the AI might not always choose the most probable next word, just to make the writing more interesting), and so they won't tend to get it all right all the time: after all, when it comes to derivations, after one step there is really only one correct next step. In longer or more complicated derivations it's going to happen that you end up down a rabbit hole. After all, LLMs aren't made to reason or think through logic. They're made to give the illusion that they are speaking your language; really they're just glorified biased random word generators.
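
To make that concrete, here is a minimal toy sketch of temperature-based next-token sampling (Python; the vocabulary and scores are made up, not taken from any real model):

```python
# Toy sketch of temperature-based next-token sampling (illustrative only;
# real LLMs use learned logits over vocabularies of ~100k tokens).
import numpy as np

rng = np.random.default_rng(0)

def sample_next_token(logits, temperature=1.0):
    """Turn raw scores into probabilities (softmax) and draw one token.

    temperature < 1 sharpens the distribution (more deterministic),
    temperature > 1 flattens it (more varied, more error-prone).
    """
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()                            # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()     # softmax
    return rng.choice(len(probs), p=probs), probs

# Made-up scores for candidate tokens after the prompt "F = m"
vocab = ["a", "g", "v", "c"]
logits = [3.0, 1.5, 0.5, 0.1]

idx, probs = sample_next_token(logits, temperature=1.0)
print(vocab[idx], dict(zip(vocab, probs.round(3))))
# Even though "a" is the most probable continuation, the others are
# sometimes chosen, which is one way a long derivation can drift off course.
```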
 
Which prompts did you use and what sort of results did you get? Can you share the chat log?
 