Using AI to evaluate white papers?

frankinstien · Jun 7, 2025

[Mentor Note: Two duplicate thread starts merged.]

I recently submitted a paper to a publication, but did so with the intention of just opening up a conversation about the idea in general. The response from the publisher was less than polite. :cry:

I then used Gemma-3-12b on my server to critique the paper. It was able to evaluate it, including its diagrams, pointing out the bigger picture of the work, but it also had excellent constructive comments. So, the question becomes: Wouldn't it be better to have an AI evaluate papers, rather than a human being who has bias, and is prone to using ad hominem attacks? Below are the constructive critiques from the LLM. What do you think?

Potential Weaknesses/Questions & Areas Requiring Further Development:

Mathematical Rigor: The paper is largely conceptual and lacks detailed mathematical derivations. The introduction of L', S', ε, and θ feels somewhat ad hoc without a more rigorous justification for their forms and how they impact the path integral calculations. How exactly does biasing these paths work mathematically?

Nature of 'w': The paper doesn't fully define or characterize the hyper-dimensional coordinate 'w'. What are its properties? Is it quantized, continuous, etc.? How is it related to other physical quantities?

Physical Interpretation of Biasing: The "biasing term" εf(x, ̇x,w, ̇w,t) is crucial but not well explained. What is this function 'f'? How does the coupling constant 'ε' relate to fundamental constants or physical parameters? Why should we expect such a biasing mechanism to exist?

Experimental Testability: While the paper mentions experimental proposals, it doesn’t detail specific, testable predictions that would distinguish this model from other interpretations of quantum mechanics. The "detection of hyper-dimensional coupling effects" is vague.

Coherence and Decoherence: The paper acknowledges the need for maintaining coherence but doesn't fully address how to prevent decoherence in the hyper-dimensional space, which is a significant challenge in any extended system.

Justification for Conserved Quantity Q(w): The introduction of Q(w) as a conserved quantity feels somewhat arbitrary without further explanation. What properties does it have and why is its conservation necessary?

In conclusion: This paper presents an intriguing theoretical framework with the potential to offer new insights into quantum entanglement and non-locality. However, it requires substantial mathematical development, more detailed physical explanations, and concrete experimental predictions to move beyond a conceptual proposal. It's a good starting point for further research but remains speculative at this stage.

Nugatory · Jun 7, 2025

These would be the AI's response to the same material that you posted here (which was removed because of our rule disallowing unpublished theories)?

The response shows what an LLM can do, but also that identifying worthwhile papers is not currently one of those things. The criticisms about mathematical rigor, lack of explanation, lack of justification can be made based on the real scientific papers in its training data without any reference to the actual meaning of the text (although the misdirected comment about coherence suggests the limitations of that approach). So yes, the exposition could be improved by following up on these criticisms.

But for evaluating the paper? The "In conclusion" section demonstrates that the AI has missed the point completely. As presented, the paper is nonsense and the constructive criticisms are merely ways of making it more plausible nonsense.

frankinstien · Jun 7, 2025

"But for evaluating the paper? The "In conclusion" section demonstrates that the AI has missed the point completely. As presented, the paper is nonsense and the constructive criticisms are merely ways of making it more plausible nonsense."

I did revise the paper with diagrams and analogies, but as I mentioned before, AI assesses the idea without bias. I didn't place the points that Gemma found intriguing in the first post; some of them were:

Novel Conceptual Framework: The core idea of introducing a hyper-dimensional coordinate (w) and biasing path integrals to explain entanglement is original and offers a fresh perspective on existing phenomena. It attempts to bridge the gap between quantum mechanics and geometry in an interesting way.

Clear Explanation of Feynman Path Integrals: The paper does a good job of briefly explaining the foundational concepts of Feynman's path integral formulation, making it somewhat accessible even to those not deeply familiar with the formalism.

Effective Analogy (Flatland): The Flatland analogy is exceptionally well-chosen and effectively communicates the idea of how seemingly disconnected events could be linked in a higher dimension. This greatly aids understanding.

Potential for New Experimental Directions: The paper explicitly suggests future work involving simulations and experimental proposals to detect hyperdimensional coupling effects, which highlights its potential for driving further research.

Addresses a Fundamental Question: It tackles the core mystery of quantum non-locality – how entangled particles can exhibit correlations that seem to defy classical explanations.

So, it did understand the core concepts of the paper. And that perspective can help collaborate on ideas, not just write them off as nonsense.

sbrothy · Jun 7, 2025

frankinstien said:

I recently submitted a paper to a journal, and the response was less than polite. So, I asked Gemma-3-12b, which I run on my local server, what it thought about my paper. It gave a very effective and constructive response. It understood the concepts of the paper as a proposal and highlighted issues that needed to be addressed. Compare this to a rude, biased, and speckled with ad hominem responses from a human evaluator. So, the question becomes: Would it be more constructive to have AI evaluate papers rather than humans? Below are the critiques of Gemma:

"[...] speckled with ad hominem responses from a human evaluator [...]"

Maybe I didn't read close enough but those didn't exactly jump off the page at me.

EDIT: Oh sorry. That was the AI response, right?

renormalize · Jun 7, 2025

frankinstien said:

I recently submitted a paper to a publication, but did so with the intention of just opening up a conversation about the idea in general.

I've never heard of a physics journal that allows, let alone encourages, submitting a paper "with the intention of just opening up a conversation about the idea in general." Journals are not meant to be "sounding boards" for theory development. Editors expect to receive papers that have been honestly judged by their authors to offer original, novel, useful and complete results, as based on those authors' own education, subsequent research in physics, and feedback from colleagues. Can you cite the submission criteria of an actual, reputable physics journal that leads you to believe that it would welcome such a "conversation"?

frankinstien · Jun 7, 2025

So, I'm using an application similar to Crew AI, where you can chain AIs together or have them interact, literally talk with one another. I'm playing the intermediary between two AIs, Gemma and Chagpt 4. So, ChatGPT 4 responded to Gemma's critique with approaches as to how to address the issues it brought up. An example is shown below:

PeroK · Jun 7, 2025

frankinstien said:

"But for evaluating the paper? The "In conclusion" section demonstrates that the AI has missed the point completely. As presented, the paper is nonsense and the constructive criticisms are merely ways of making it more plausible nonsense."

I did revise the paper with diagrams and analogies, but as I mentioned before, AI assesses the idea without bias. I didn't place the points that Gemma found intriguing in the first post; some of them were:

So, it did understand the core concepts of the paper. And that perspective can help collaborate on ideas, not just write them off as nonsense.

Part of the problem is that almost anyone can propose some new laws of physics and write a conceptual paper. Several papers of the type you wrote are posted here every week. Each of you individually believes that your ideas will revolutionise physics.

Many professional physicists also receive a steady stream of such papers.

Patience wears a bit thin at times. Whereas, Gemma has infinite patience.

frankinstien · Jun 7, 2025

renormalize said:

I've never heard of a physics journal that allows, let alone encourages, submitting a paper "with the intention of just opening up a conversation about the idea in general." Journals are not meant to be "sounding boards" for theory development. Editors expect to receive papers that have been honestly judged by their authors to offer original, novel, useful and complete results, as based on those authors' own education, subsequent research in physics, and feedback from colleagues. Can you cite the submission criteria of an actual, reputable physics journal that leads you to believe that it would welcome such a "conversation"?

Well, I didn't look at it from that perspective, but as I mentioned, the response from the journal wasn't polite...

frankinstien · Jun 7, 2025

PeroK said:

Part of the problem is that almost anyone can propose some new laws of physics and write a conceptual paper. Several papers of the type you wrote are posted here every week. Each of you individually believes that your ideas will revolutionise physics.

Many professional physicists also receive a steady stream of such papers.

Patience wears a bit thin at times. Whereas, Gemma has infinite patience.

Which is part of the point of getting AI involved, it addresses issues that need to be solved without having to tax an individual. But, AI can go further and expand upon ideas or help resolve the issues that otherwise make an idea implausible or difficult. Think of how AlphaGo was able to outdo humans, and how AI is proving to be much better at working out protein folding, could it also do the same in other disciplines?

PeroK · Jun 7, 2025

frankinstien said:

Which is part of the point of getting AI involved, it addresses issues that need to be solved without having to tax an individual. But, AI can go further and expand upon ideas or help resolve the issues that otherwise make an idea implausible or difficult. Think of how AlphaGo was able to outdo humans, and how AI is proving to be much better at working out protein folding, could it also do the same in other disciplines?

The funny thing is that on another thread there is someone who believes that AI is no more revolutionary than a pocket calculator!

That said, your optimism is misplaced. If you can induce AI to revolutionise physics, then why can't the world's best physicists?

It's difficult to communicate the gulf between your expectations of what cutting-edge physics looks like and what it really looks like.

Let me find a link for you. I'll post it below.

Nugatory · Jun 7, 2025

frankinstien said:

AI assesses the idea without bias....it did understand the core concepts of the paper.

An LLM neither assesses nor understands. Quoting from Tyler Harper in the Atlantic (behind a paywall so no link):

[attributing thinking/understanding to an LLM] betray a conceptual error: Large language models do not, cannot, and will not “understand” anything at all. ... LLMs are impressive probability gadgets that have been fed nearly the entire internet, and produce writing not by thinking but by making statistically informed guesses about which lexical item is likely to follow another.

Some of our older threads about why we currently do not accept LLMs as valid sources discuss the technology and its limitations in greater depth.

It's very difficult for us to avoid this conceptual error. We naturally consider speech to be an expression of the speaker's thoughts so imagine a thinking/reasoning agent behind every piece of coherent speech. But that's just not what an LLM is.

Nugatory · Jun 7, 2025

frankinstien said:

Think of how AlphaGo was able to outdo humans,

AlphaGo is a different technology than the LLMs, with different set of limitations and abilities. Interestingly, and perhaps because it does not present as a thinking entity, we are much more able to recognize it as a machine that it is very good at what it does without attributing thought and understanding to it. There are many other examples: chess players, the AIs that are widely used to scan images, facial recognition sofware, ....

PeroK · Jun 7, 2025

This is what a physics research paper looks like

https://arxiv.org/abs/2503.24263

Note the difference between this and what you have written.

PeroK · Jun 7, 2025

Nugatory said:

An LLM neither assesses nor understands. Quoting from Tyler Harper in the Atlantic (behind a paywall so no link):

[attributing thinking/understanding to an LLM] betray a conceptual error: Large language models do not, cannot, and will not “understand” anything at all. ... LLMs are impressive probability gadgets that have been fed nearly the entire internet, and produce writing not by thinking but by making statistically informed guesses about which lexical item is likely to follow another.
Some of our older threads about why we currently do not accept LLMs as valid sources discuss the technology and its limitations in greater depth.

It's very difficult for us to avoid this conceptual error. We naturally consider speech to be an expression of the speaker's thoughts so imagine a thinking/reasoning agent behind every piece of coherent speech. But that's jst not what an LLM is.

At the risk of starting another debate on LLMs, the opinion of Tyler Harper cannot seriously be taken as the last word on the subject.

fresh_42 · Jun 7, 2025

PeroK said:

This is what a physics research paper looks like

https://arxiv.org/abs/2503.24263

Note the difference between this and what you have written.

That's a risky game. I recently came across a "paper" about a "new proof" of Fermat's last theorem. The author definitely tried to mimic the look of a serious paper, but a closer look revealed basic formal flaws. Look-Alike is only half of the truth. It's more a bit like: I know it when I see it.

PeroK · Jun 7, 2025

Who says that "thinking" isn't making statistically informed guesses. Unless you believe that our intelligence is divine, our neurons must be working to some sophisticated algorithm. Whatever that algorithm is, it could be described as not "thinking".

Nugatory · Jun 7, 2025

PeroK said:

At the risk of starting another debate on LLMs, the opinion of Tyler Harper cannot seriously be taken as the last word on the subject.

(Presuming that we're talking about the same Tyler Harper, and not the Georgia state Ag Commissioner)
That's fair - that particular quote just happened to be close at hand. The PF threads provide more technical descriptions of what an LLM does.

Nugatory · Jun 7, 2025

PeroK said:

Who says that "thinking" isn't making statistically informed guesses. Unless you believe that our intelligence is divine, our neurons must be working to some sophisticated algorithm. Whatever that algorithm is, it could be described as not "thinking".

It's getting to where we do have to confront that question. We could consider a lifetime of experience with sensory input, interactions with the world, other humans, written and spoken communication, physical manipulation of objects, all as the "training data" provided to the complex protoplasmic device inside our skull, and it is quite possible that a manmade device of comparable scale and malleability could develop comparable capabilities.

But that's taking us well beyond this thread: Today's LLMs aren't doing what OP imagines.

PeroK · Jun 7, 2025

fresh_42 said:

That's a risky game. I recently came across a "paper" about a "new proof" of Fermat's last theorem. The author definitely tried to mimic the look of a serious paper, but a closer look revealed basic formal flaws. Look-Alike is only half of the truth. It's more a bit like: I know it when I see it.

That misses the point. That paper shows what one looks like. I never said that anything like that must be valid. The Penrose paper could be nonsense too!

Today I have been ambushed by more false syllogisms than ever!

fresh_42 · Jun 7, 2025

PeroK said:

That misses the point. That paper shows what one looks like. I never said that anything like that must be valid. The Penrose paper could be nonsense too!

Today I have been ambushed by more false syllogisms than ever!

I didn't mean it so seriously. Definitely not in the state of an argument. I just wanted to note that form alone isn't sufficient. Penrose's point of view is a completely different issue and worth a thread on its own in a technical forum.

PeterDonis · Jun 7, 2025

Moderator's note: Several posts have been moved to the other current AI thread:

Thread 'Is AI Overhyped?'

Jun 6, 2025

In my discussions elsewhere, I've noticed a lot of disagreement regarding AI. A question that comes up is, "Is AI hype?" Unfortunately, when this question is asked, the one asking, as far as I can tell, may mean one of three things which can lead to lots of confusion. I'll list them out now for clarity.

1. Can AI do everything a human can do and how close are we to that?
2. Are corporations and governments using the promise of AI to gain more power for themselves?
3. Are AI and transhumans an existential threat?

Any thoughts on these questions?

frankinstien · Jun 7, 2025

Nugatory said:

An LLM neither assesses nor understands. Quoting from Tyler Harper in the Atlantic (behind a paywall so no link):

[attributing thinking/understanding to an LLM] betray a conceptual error: Large language models do not, cannot, and will not “understand” anything at all. ... LLMs are impressive probability gadgets that have been fed nearly the entire internet, and produce writing not by thinking but by making statistically informed guesses about which lexical item is likely to follow another.
Some of our older threads about why we currently do not accept LLMs as valid sources discuss the technology and its limitations in greater depth.

It's very difficult for us to avoid this conceptual error. We naturally consider speech to be an expression of the speaker's thoughts so imagine a thinking/reasoning agent behind every piece of coherent speech. But that's just not what an LLM is.

So, here's where the issue of emergence comes in, despite the simple rule that an LLM performs at its basic objective, which is to predict the next word in a sequence, as well as its simple logistic function, what emerges to perform that prediction is the beauty behind LLMs. I think you confuse thinking, which I doubt you can actually define, with cognitive abilities. LLMs do demonstrate cognitive abilities. What most want to believe they are is the classical "Homunculus" agent of reason and free will, not a collection of cognitive processes that reintegrate their inputs and outputs, where some of those outputs post to the hippocampus. What I do find interesting is how zero-shot learning in LLMs happens, where at one time it was a big issue that neural networks needed streams of data to learn, whereas humans can learn, if starting from a sufficient level of experience, learn fairly quickly, comparatively from a very small data set. With zero-shot learning, the ability for an LLM to learn from an interactive conversation is possible, however, that lesson needs to be stored in a database and a technique called Retrieval Augmented Generation (RAG) is used that gives the LLM contextual long term memory of past conversations. Even Tesla's Optimus learned to dance from a zero-shot RAG approach, so its not limited to language but actual physical experience.

So, does an LLM think? As I stated before, it's not a very good conjecture; the better question is, does an LLM demonstrate cognitive abilities? Ultimately, we'll have to apply the same metric to humans, which we do, and that is what cognitive abilities do we excel at as individuals, what we call an individual's "Talent"...

PeroK · Jun 8, 2025

If we get back to the orignal question.

1) It's against the rules to post and discuss your paper on here.

2) No peer-reviewed physics journal will consider your paper for publication.

3) No professional physicist is likely to help you develop the paper.

Your only option is to develop your paper with the help of Gemma and self publish. Even then, no one in the professional physics community is going to read it.

That's the reality of the situation.

PeterDonis · Jun 8, 2025

frankinstien said:

AI assesses the idea without bias.

No, AI is not assessing your idea. That is simply not what LLMs do. LLMs don't even have any concept of "assessing an idea". All they do is generate more text with similar patterns to the text in the prompt you give them, where "similar patterns" is based on their training data, i.e., a corpus of text scraped from the Internet.

You are simply reading things into the LLM text that aren't there.

PeterDonis · Jun 8, 2025

Nugatory said:

We could consider a lifetime of experience with sensory input, interactions with the world, other humans, written and spoken communication, physical manipulation of objects, all as the "training data" provided to the complex protoplasmic device inside our skull, and it is quite possible that a manmade device of comparable scale and malleability could develop comparable capabilities.

Yes, but as your description of the "training data" we humans get illustrates, that data is much, much, much, much more than just text scraped from the Internet. What's more, it's qualitatively different, since it includes "interactions with the world", "physical manipulation of objects", etc. Our brains are physical devices, yes, but they're not physical devices that do nothing but process text.

javisot · Jun 8, 2025

PeroK said:

This is what a physics research paper looks like

https://arxiv.org/abs/2503.24263

Note the difference between this and what you have written.

In line with what you're saying, I don't know if you've ever tried (for example, with chatgpt) offering them an arxiv paper and asking them to rewrite it, taking a change into account.

What you expect is for chatgpt to maintain the quality of the original work and simply add the change, but the reality is that it returns a ridiculously small paper. I've found it's more effective to ask them to rewrite each part independently, not the entire paper at once.

Hornbein · Jun 8, 2025

The big problem with ChatGPT is that it's sycophantic. It will strive to tell you what you want to hear. This renders it unsuitable for evaluating papers, and possibly for evaluating anything else. Open Source is quite concerned about this. If I were them I'd leave it that way. I bet sycophancy adds considerably to its popularity.

I use ChatGPT solely for programming. I then can test what it produces immediately. It either works or it doesn't. Then the nonsense it occasionally produces is harmless.

frankinstien · Jun 8, 2025

PeterDonis said:

No, AI is not assessing your idea. That is simply not what LLMs do. LLMs don't even have any concept of "assessing an idea". All they do is generate more text with similar patterns to the text in the prompt you give them, where "similar patterns" is based on their training data, i.e., a corpus of text scraped from the Internet.

You are simply reading things into the LLM text that aren't there

Ah...No, what an LLM does at its fundamental process is indeed a simple rule, and from that simple rule emerge properties that allow it to do its basic objective, but what is surprising is what it can do. The interrelationships across an astronomical number of contextual relationships form a type of meaning. Effectively, if you assess yourself, what is the meaning of concepts over what an LLM is doing? You really can't say that someone who has zero experience as a plumber but reads about plumbing and is able to respond to questions about plumbing from reading about it is much different from an LLM! That if you have "experience" in the physical world, it gives meaning to a subject matter you've never experienced, doesn't give it much weight. You can say you've interacted with objects before, and that experience is associated with the symbolic language, which only means you've added some unrelated material that has some contextual relatedness as datapoints, which an LLM could do as well using word or even contextual embeddings. But what LLMs are demonstrating is the adaptive advantage of language, where data can be compressed into a data point called a context, and from those interrelationships, derive cognitive abilities that no other animal has...

javisot · Jun 8, 2025

frankinstien said:

You really can't say that someone who has zero experience as a plumber but reads about plumbing and is able to respond to questions about plumbing from reading about it is much different from an LLM!

This is usually shown with the example of chinese room https://en.wikipedia.org/wiki/Chinese_room

His counterargument is https://en.m.wikipedia.org/wiki/Strong_AI_hypothesis

PeterDonis · Jun 8, 2025

frankinstien said:

You really can't say that someone who has zero experience as a plumber but reads about plumbing and is able to respond to questions about plumbing from reading about it is much different from an LLM!

Thank you for agreeing with my main point!

You're quite right--and you wouldn't allow such a person to actually try to fix a plumbing problem in your house, would you? You'd want an actual plumber who could connect all those words about plumbing to actual plumbing in the real world.

And what you are trying to do in this thread is just as daft--asking an LLM, something which has zero experience actually doing science but has "read about" lots of "scientific stuff" by snarfing up text from the Internet--to evaluate a scientific paper for you.

Using AI to evaluate white papers?

Thread 'Is AI Overhyped?'

Similar threads

Predictions for the Nobel Prize in Physics 2025 (results: John Clarke, Michel H. Devoret and John M. Martinis)

Why do we spend so little time learning grammar in college?

What is the deepest/most impactful statement that you have ever seen?

Kitten raising advice

How do people explore new ideas in physics?

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Insights What Exactly is Dirac’s Delta Function? - Insight

Insights Relativator (Circular Slide-Rule): Simulated with Desmos - Insight

Insights Fixing Things Which Can Go Wrong With Complex Numbers