Using AI to evaluate white papers?

  • Thread starter: frankinstien

Summary
The discussion revolves around the submission of a conceptual paper to a publication, which received a rude response, prompting the author to seek critique from an AI model, Gemma-3-12b. The AI provided constructive feedback, highlighting weaknesses in mathematical rigor, definitions, and experimental testability, while also recognizing the paper's novel ideas. A debate emerged about the potential for AI to evaluate academic papers without human biases, contrasting this with the expectation that journals require well-formed, original submissions rather than exploratory ideas. Critics argue that while AI can assist in refining concepts, it lacks true understanding and cannot replace the nuanced judgment of human experts. Ultimately, the conversation underscores the challenges of integrating AI into academic evaluation processes while maintaining rigorous standards.
  • #31
javisot said:
This is usually shown with the example of the Chinese Room: https://en.wikipedia.org/wiki/Chinese_room

His counterargument is https://en.m.wikipedia.org/wiki/Strong_AI_hypothesis
No, the Chinese Room is not the same as the LLMs we're talking about here--because as the Chinese Room thought experiment is formulated, its answers have to actually be correct. They have to show actual world knowledge--not just "text snarfed from the Internet knowledge".

This point is overlooked by far too many discussions of the Chinese Room. Those discussions don't appreciate that you can ask the Chinese Room any question you want, including questions about real-world experiences that no amount of snarfing up text will let any kind of entity (including an actual human whose only "knowledge" comes from reading stuff on the Internet) answer correctly. And of course when you do that with LLMs, you get all kinds of nonsense--no sane person should be fooled into thinking that the LLM is a person with actual real-world knowledge of the topic being asked about.

But in the Chinese Room thought experiment, by hypothesis, the Chinese Room can convince people that it's a person with actual real world knowledge of all the topics it's asked about. In other words, the thought experiment states a performance standard that LLMs simply don't and can't meet.
 
Likes: javisot
  • #32
PeterDonis said:
Thank you for agreeing with my main point!

You're quite right--and you wouldn't allow such a person to actually try to fix a plumbing problem in your house, would you? You'd want an actual plumber who could connect all those words about plumbing to actual plumbing in the real world.

And what you are trying to do in this thread is just as daft--asking an LLM, something which has zero experience actually doing science but has "read about" lots of "scientific stuff" by snarfing up text from the Internet--to evaluate a scientific paper for you.
There are usually two parts: checking that the math is correct and checking that the paper is conceptually correct. I understand that AI can fail on the conceptual side, but wouldn't you use it to check the math?

(I don't mean now, with the models we have; I mean in the future, with some specific model that correctly reviews the mathematics of the works.)
 
  • #33
javisot said:
wouldn't you use it to check the math?
No. There are certainly computer programs that can check math, but LLMs don't do that.

javisot said:
some specific model that correctly reviews the mathematics of the works
We already have computer programs that can check math for accuracy--automated theorem provers and checkers, automated equation solvers, and things like that. But they don't work anything like LLMs. They don't check math by snarfing up a huge amount of text, say from math papers, and looking for patterns in it. They check math by having the actual logical rules that the math is supposed to follow coded into them directly, and then being able to apply those logical rules much more quickly and accurately, over huge numbers of logical steps, than humans can.
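
For instance, a proof assistant like Lean accepts a statement only when every step follows from the inference rules encoded in its kernel; it never pattern-matches against text it has seen before. A minimal sketch (a toy theorem chosen purely for illustration, nothing to do with the paper under discussion):

```lean
-- The checker verifies this by applying its built-in rules for natural-number
-- arithmetic; a proof term that didn't follow from those rules would be rejected.
theorem toy_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

The same goes for computer algebra systems and equation solvers: they manipulate expressions according to explicit rules, which is why their answers can be trusted in a way an LLM's output cannot.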
 
Likes: Dale and javisot
  • #34
PeterDonis said:
Thank you for agreeing with my main point!

You're quite right--and you wouldn't allow such a person to actually try to fix a plumbing problem in your house, would you? You'd want an actual plumber who could connect all those words about plumbing to actual plumbing in the real world.

And what you are trying to do in this thread is just as daft--asking an LLM, something which has zero experience actually doing science but has "read about" lots of "scientific stuff" by snarfing up text from the Internet--to evaluate a scientific paper for you.
That's a subjective argument, and there are plenty of DIYers who start out just by reading subject-matter material. So here's where AI is being exploited: because AI doesn't have a body yet, it can't explore the real world using the knowledge it gained from human documentation. So we as humans collaborate with the AI, which gives us notions that we can then validate in the real world and communicate back to the AI, which can then learn from our experiences.
 
Likes: PeroK
  • #35
frankinstien said:
That's a subjective argument, and there are plenty of DIYers who start out just by reading subject-matter material. So here's where AI is being exploited: because AI doesn't have a body yet, it can't explore the real world using the knowledge it gained from human documentation. So we as humans collaborate with the AI, which gives us notions that we can then validate in the real world and communicate back to the AI, which can then learn from our experiences.
Generally I would agree with you that an LLM can do far more than the official PF policy will admit. I did watch a video recently of a professional physicist getting it to suggest ideas based on genuine input from him.

But, an LLM is no substitute for a physics degree or PhD. It's your input that is the problem. And, you can't judge when the LLM has actually produced an insight (by luck or otherwise) or has produced something useless. The professional physicist above could do precisely that.

Also, the attempt to do physics by getting the right words into the right order plays into the LLM's hands. It can do that stuff better than any human, ad infinitum.

Instead, physics is about a mapping from ideas to hard mathematics. That's what an LLM cannot reliably do. It cannot figure out whether those words and those equations represent a valid mathematical model for the physical phenomena in question.

The suggestions it made about your paper were amazing, IMO. But, it can't do enough unique mathematics to produce a paper for you. It has no way to generate a mathematical model from a vague concept.
 
Likes: dextercioby and javisot
  • #36
frankinstien said:
That's a subjective argument
It's your argument. You can't have it both ways.
 
  • #37
PeroK said:
physics is about a mapping from ideas to hard mathematics.
And to data from the real world.
 
Likes: BillTre and russ_watters
  • #38
PeroK said:
The suggestions it made about your paper were amazing, IMO
How can we know that if we haven't read the paper itself?
 
  • #39
PeterDonis said:
How can we know that if we haven't read the paper itself?
I read it on Research Gate.
 
  • #40
I would have to disagree about LLMs being unable to apply hard mathematics. LLMs have proven able to take a set of requirements and turn them into real working software with object-oriented structure and design, which is a form of mathematics. After all, mathematics is a language, and there are some good AI math models:

Julius AI,
Mathos AI,
Google DeepMind's AI models
 
  • #41
frankinstien said:
After all, mathematics is a language, and there are some good AI math models:
That depends heavily on what you call mathematics. I think your understanding of mathematics is as a kind of advanced calculation. That is not what it is. As long as you can't show me an AI that resolves ##NP \neq P##, literally a language problem, I have to disagree with you.
 
Likes: PeterDonis
  • #42
frankinstien said:
I would have to disagree about LLMs being unable to apply hard mathematics. LLMs have proven able to take a set of requirements and turn them into real working software with object-oriented structure and design, which is a form of mathematics. After all, mathematics is a language, and there are some good AI math models:

Julius AI,
Mathos AI,
Google DeepMind's AI models
I am both a mathematician and a programmer. I believe these have little in common.

I once asked ChatGPT to prove something in Lie group theory. It came up with a bunch of nonsense. I didn't know Lie group theory, so it sounded plausible to me. When I got it checked, though....

ChatGPT is very useful with programming; it knows the techniques of four-dimensional geometry better than I do, and it's great with computer-game kinds of things, but there I can execute the program and know immediately whether or not it's nonsense.

Once it told me a certain 4D object could have two velocities.

Using ChatGPT to find basic errors in a paper could work pretty well, but evaluating original ideas seems like a nonstarter. It's much better with routine things that I can't be bothered to learn.
 
  • #43
OP is on a 10-day vacation from PF, so this thread can be closed for now.
 
