Using AI to evaluate white papers?

  • Context: Graduate 
  • Thread starter: frankinstien
SUMMARY

The discussion centers on the use of the AI model Gemma-3-12b to evaluate a theoretical physics paper, highlighting its ability to provide constructive critiques that human reviewers might miss or withhold because of bias. Key critiques from the AI include the need for mathematical rigor, clearer definitions of variables, and specific experimental predictions. The conversation raises the question of whether AI evaluations could be more beneficial than human assessments in academic publishing, particularly in fields like physics, where subjective biases can affect feedback.

PREREQUISITES
  • Understanding of theoretical physics concepts, particularly quantum mechanics.
  • Familiarity with Feynman path integrals and their applications.
  • Knowledge of AI models, specifically large language models (LLMs) like Gemma-3-12b.
  • Experience with academic publishing standards and peer review processes.
NEXT STEPS
  • Research the capabilities and limitations of large language models in academic contexts.
  • Explore the implications of AI in peer review processes within scientific publishing.
  • Learn about the mathematical foundations of Feynman path integrals and their significance in quantum mechanics.
  • Investigate current trends in AI-assisted research and its potential impact on theoretical physics.
USEFUL FOR

Researchers, physicists, and academic publishers interested in the intersection of artificial intelligence and scientific evaluation, as well as anyone exploring innovative approaches to peer review and academic feedback mechanisms.

  • #31
javisot said:
This is usually illustrated with the example of the Chinese Room: https://en.wikipedia.org/wiki/Chinese_room

His counterargument is the strong AI hypothesis: https://en.m.wikipedia.org/wiki/Strong_AI_hypothesis
No, the Chinese Room is not the same as the LLMs we're talking about here--because as the Chinese Room thought experiment is formulated, its answers have to actually be correct. They have to show actual world knowledge--not just "text snarfed from the Internet" knowledge.

This point is overlooked by far too many discussions of the Chinese Room, because those discussions don't appreciate that you can ask the Chinese Room any question you want, including questions about real world experiences that no amount of just snarfing up text will let any kind of entity (including an actual human whose only "knowledge" comes from reading stuff on the Internet) give correct answers to. And of course when you do that with LLMs, you get all kinds of nonsense--no sane person should be fooled into thinking that the LLM is a person with actual real world knowledge of the topic being asked about.

But in the Chinese Room thought experiment, by hypothesis, the Chinese Room can convince people that it's a person with actual real world knowledge of all the topics it's asked about. In other words, the thought experiment states a performance standard that LLMs simply don't and can't meet.
 
Likes: javisot
  • #32
PeterDonis said:
Thank you for agreeing with my main point!

You're quite right--and you wouldn't allow such a person to actually try to fix a plumbing problem in your house, would you? You'd want an actual plumber who could connect all those words about plumbing to actual plumbing in the real world.

And what you are trying to do in this thread is just as daft--asking an LLM, something which has zero experience actually doing science but has "read about" lots of "scientific stuff" by snarfing up text from the Internet--to evaluate a scientific paper for you.
There are usually two parts: checking that the math is correct and checking that the paper is conceptually correct. I understand that AI can fail on the conceptual side, but wouldn't you use it to check the math?

(I don't mean now, with the models we have, but in the future, with some specific model that correctly reviews the mathematics of such works.)
 
  • #33
javisot said:
wouldn't you use it to check the math?
No. There are certainly computer programs that can check math, but LLMs don't do that.

javisot said:
some specific model that correctly reviews the mathematics of the works
We already have computer programs that can check math for accuracy--automated theorem provers and checkers, automated equation solvers, and things like that. But they don't work anything like LLMs. They don't check math by snarfing up a huge amount of text, say from math papers, and looking for patterns in it. They check math by having the actual logical rules that the math is supposed to follow coded into them directly, and then being able to apply those logical rules much more quickly and accurately, over huge numbers of logical steps, than humans can.
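As a minimal sketch of the contrast, here is what rule-based checking looks like in Python with sympy, an automated equation solver; the differential equation and its claimed solution below are invented purely for illustration:

# Rule-based math checking: sympy applies symbolic algebra rules
# directly, so its verdict is deterministic, not a statistical guess
# over text patterns the way an LLM's answer is.
import sympy as sp

x = sp.symbols('x')
f = sp.Function('f')

# Hypothetical claim from a paper: f(x) = exp(-x**2/2) solves f' + x*f = 0.
candidate = sp.exp(-x**2 / 2)
lhs = sp.Derivative(f(x), x) + x * f(x)

# Substitute the candidate, evaluate the derivative, and simplify.
residual = sp.simplify(lhs.subs(f(x), candidate).doit())
print("verified" if residual == 0 else f"failed, residual: {residual}")

If the claimed solution were wrong, the residual would come out nonzero no matter how plausible the surrounding prose sounded.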
 
Likes: Dale and javisot
  • #34
PeterDonis said:
Thank you for agreeing with my main point!

You're quite right--and you wouldn't allow such a person to actually try to fix a plumbing problem in your house, would you? You'd want an actual plumber who could connect all those words about plumbing to actual plumbing in the real world.

And what you are trying to do in this thread is just as daft--asking an LLM, something which has zero experience actually doing science but has "read about" lots of "scientific stuff" by snarfing up text from the Internet--to evaluate a scientific paper for you.
That's a subjective argument, and there are plenty of DIYers who start out just by reading subject-matter material. And here is where AI can be exploited: because AI doesn't have a body yet, it can't explore the real world using the knowledge it gained from human documentation. So we humans collaborate with the AI, which gives us notions that we can validate in the real world and communicate back to it, so that it can learn from our experiences.
 
Likes: PeroK
  • #35
frankinstien said:
That's a subjective argument, and there are plenty of DIYers who start out just by reading subject-matter material. And here is where AI can be exploited: because AI doesn't have a body yet, it can't explore the real world using the knowledge it gained from human documentation. So we humans collaborate with the AI, which gives us notions that we can validate in the real world and communicate back to it, so that it can learn from our experiences.
Generally I would agree with you that an LLM can do far more than the official PF policy will admit. I did watch a video recently of a professional physicist getting it to suggest ideas based on genuine input from him.

But, an LLM is no substitute for a physics degree or PhD. It's your input that is the problem. And, you can't judge when the LLM has actually produced an insight (by luck or otherwise) or has produced something useless. The professional physicist above could do precisely that.

Also, the attempt to do physics by getting the right words into the right order plays into the LLM's hands. It can do that stuff better than any human, ad infinitum.

Instead, physics is about a mapping from ideas to hard mathematics. That's what an LLM cannot reliably do. It cannot figure out whether those words and those equations represent a valid mathematical model for the physical phenomena in question.

The suggestions it made about your paper were amazing, IMO. But, it can't do enough unique mathematics to produce a paper for you. It has no way to generate a mathematical model from a vague concept.
 
Likes: dextercioby and javisot
  • #36
frankinstien said:
That's a subjective argument
It's your argument. You can't have it both ways.
 
  • #37
PeroK said:
physics is about a mapping from ideas to hard mathematics.
And to data from the real world.
 
Likes: BillTre and russ_watters
  • #38
PeroK said:
The suggestions it made about your paper were amazing, IMO
How can we know that if we haven't read the paper itself?
 
  • #39
PeterDonis said:
How can we know that if we haven't read the paper itself?
I read it on ResearchGate.
 
  • #40
I would have to disagree about the inability of LLMs to apply hard mathematics. LLMs have proven able to take a set of requirements and turn them into real working software with object-oriented structure and design, which is a form of mathematics. After all, mathematics is a language, and there are some good AI math models:

  • Julius AI
  • Mathos AI
  • Google DeepMind's AI models
 
  • #41
frankinstien said:
After all, mathematics is a language, and there are some good AI math models:
That depends heavily on what you call mathematics. I think your understanding of mathematics is as a kind of advanced calculation. That is not what it is. As long as you can't show me an AI that solves ##NP\neq P,## which is literally a language problem, I have to disagree with you.
 
Likes: PeterDonis
  • #42
frankinstien said:
I would have to disagree about the inability of LLMs to apply hard mathematics. LLMs have proven able to take a set of requirements and turn them into real working software with object-oriented structure and design, which is a form of mathematics. After all, mathematics is a language, and there are some good AI math models:

  • Julius AI
  • Mathos AI
  • Google DeepMind's AI models
I am both a mathematician and a programmer. I believe these have little in common.

I once asked ChatGPT to prove something in Lie group theory. It came up with a bunch of nonsense. I didn't know Lie group theory so it sounded plausible to me. When I got it checked though....

ChatGPT is very useful for programming; it knows the techniques of four-dimensional geometry better than I do, and it's great with computer-game kinds of things. But there I can execute the program and know immediately whether or not it's nonsense.

Once it told me a certain 4D object could have two velocities.

Using ChatGPT to find basic errors in a paper could work pretty well, but evaluating original ideas seems like a nonstarter. It's much better with routine things that I can't be bothered to learn.
 
  • #43
OP is on a 10-day vacation from PF, so this thread can be closed for now.
 
