Reliability of LLMs for generating examples in mathematics

  • Thread starter: elias001
AI Thread Summary
The discussion centers on the reliability of large language models (LLMs) in generating mathematical examples, particularly in higher mathematics topics like module theory and homological algebra. Users express skepticism about the accuracy of LLMs, questioning whether they can provide correct examples to illustrate complex theorems, especially as mathematical concepts become more abstract. While LLMs can assist in generating examples, their limitations in reasoning and understanding nuanced mathematical contexts are highlighted. The conversation emphasizes the need for human oversight when using LLMs for educational purposes, as well as the potential for these tools to enhance learning if used correctly. Ultimately, the dialogue calls for a deeper exploration of how to leverage LLMs effectively in mathematics education.
elias001
I have a question for anyone who knows the technical side of the various LLMs.
Given their current limitations and capabilities, how well can they be used to come up with examples in a particular subdiscipline of higher mathematics?

Let me explain: say one picks a theorem at random out of a typical undergraduate textbook. You ask the LLM to come up with examples to illustrate the theorem.

Here, say we ask ChatGPT to come up with examples to illustrate the various theorems on exact sequences in module theory, the three-by-three lemma, the four-by-four lemma, the snake lemma; then we can move on to examples for the two derived functors Tor and Ext, and then to the homological algebra chapter in Dummit and Foote.
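
To make concrete the kind of answer one would hope for, here is a standard illustration of my own (not an LLM output): the free resolution of Z/2Z and the Tor computation it yields.
Code:
% Short exact sequence of Z-modules (a free resolution of Z/2Z):
0 \longrightarrow \mathbb{Z} \xrightarrow{\times 2} \mathbb{Z} \longrightarrow \mathbb{Z}/2\mathbb{Z} \longrightarrow 0
% Tensoring with Z/2Z and taking homology: the induced map x2 is zero
% on Z/2Z, so its kernel is everything, giving
\operatorname{Tor}_1^{\mathbb{Z}}(\mathbb{Z}/2\mathbb{Z},\,\mathbb{Z}/2\mathbb{Z})
  \cong \ker\bigl(\times 2 : \mathbb{Z}/2\mathbb{Z} \to \mathbb{Z}/2\mathbb{Z}\bigr)
  \cong \mathbb{Z}/2\mathbb{Z}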

I know many of you have reservations about the reliability of AIs. Either this technology has benefits to offer science and math education, or all the PhDs in AI and machine learning have been wasting their time all along. I am not expecting an LLM to be 100% correct every single time you ask it something. But if you think that, across everything the undergraduate and beginning graduate level courses and textbooks include, the chance of any LLM being accurate is less than 50%, then when do you think it will reach at least 80% accuracy?

Also, why am I asking this question? Because many of you who have studied pure math know that after a certain point in a subject, the examples get fewer and fewer, even among topics that have made it to the undergraduate level.
 
Yes, the LLMs have trouble with reasoning. They can do simple stuff, but as things get tougher, they need a human to guide them to the answer by working through the problem step by step.

It would not be incorrect to say that we are living in the technological era of very large models. They seem magical: they can generate code, text, images, and videos much better than any previous technology at scale. They have rapidly ascended from intriguing novelties to indispensable tools in modern enterprises. Their capacity to interpret and generate human-like text has unlocked new frontiers in customer service automation, content creation, and decision support.

Yet, despite their remarkable progress, LLMs exhibit fundamental gaps that senior technology leaders must acknowledge as they steer teams and resources in an increasingly AI-driven landscape.

 
@jedishrfu I am asking the LLMs to do very basic things. I am not even asking them to prove already-known theorems, just to find correct examples, or to generate them, to match the theorems being asked about.
 
Ask your favorite LLMs the following...
Code:
subtract 5.9-5.11
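
(For the record, the correct value is 0.79. A minimal Python sanity check, my own sketch; note that even naive binary floating point shows a small rounding artifact:)
Code:
from decimal import Decimal

# Naive binary floating point: close to 0.79, but not exact.
print(5.9 - 5.11)                        # prints something like 0.7899999999999991

# Exact decimal arithmetic gives the true answer.
print(Decimal("5.9") - Decimal("5.11"))  # prints 0.79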
 
elias001 said:
@jedishrfu I am asking the LLMs to do very basic things. I am not even asking them to prove already-known theorems, just to find correct examples, or to generate them, to match the theorems being asked about.
Can you provide the prompts you wrote?

Sometimes, the prompt needs more detail to get it right.
 
@jedishrfu the reason I am asking this question is that in abstract algebra, one doesn't have to get very far past Dummit and Foote's textbook to be in territory where you are presented with theorem, proof, exposition, theorem, proof, theorem, proof, ..., then exercises. Look no further than the topic of modules: their primary decompositions, the notion of quotient ideals, and localisation in the context of algebraic curves in two variables.

So when there was a theorem within a passage I presented to three different LLMs, I would ask what a certain phrase means: "explain 'phrase X' using mathematical notation". For a notation Y I don't understand, or want more details on, I would ask it to expand notation Y using set-builder notation and to illustrate notation Y with a numerical example. I would also include a bunch of definitions from a textbook beforehand, as Def 1, Def 2, etc., and ask it to show how notation Y satisfies Def 1 or Def 2. I ask the LLM very simple things like this to make sure I am still on solid ground, in the sense that regardless of how far I am into a topic, I have examples to keep in mind so that I can picture how the various parts of a theorem or a definition work.
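
For instance (my own illustration of the kind of expansion I ask for, not an LLM answer), take the quotient-ideal notation mentioned above:
Code:
% The quotient (colon) ideal, expanded in set-builder notation:
(I : J) = \{\, r \in R \mid rJ \subseteq I \,\}
% A numerical example in R = Z: 6r lies in 4Z exactly when r is even, so
(4\mathbb{Z} : 6\mathbb{Z}) = \{\, r \in \mathbb{Z} \mid 6r \in 4\mathbb{Z} \,\} = 2\mathbb{Z}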

Also, it depends on which LLM you ask. With the latest big-tech ones, a few times when I made a mistake in my question, the LLM would tell me that there was a mistake or a misunderstanding on my part, because the notation I was writing things out in was about the notion of ideals, while my question was asking for an answer that is not an ideal.
 
Why not just search for examples on the web? You get lots of lecture notes and similar questions on various forums. Better yet, try to come up with some examples yourself. Why use something that has been shown to be unreliable?
 
@robphy See the attached screenshots. I asked Grok, Gemini, and MS Copilot:

[Attached: five screenshots of the Grok, Gemini, and MS Copilot responses]
 
[Attached: two screenshots of Lemma 2.5 from Grillet's textbook]

@martinbn in the attached screenshots is Lemma 2.5 from Pierre Antoine Grillet's abstract algebra text, in the chapter about Ext and Tor.

I don't know if you have heard the common folklore about Serre and Grothendieck: the latter could always think without examples, while the former was always ready with examples. Steven Krantz, in a book titled 'A Mathematician Comes of Age', about mathematical maturity, said two things that are relevant here:
1. "Intuition is certainly not a panacea. Nobody ever solved a deep problem using just intuition alone. But nobody ever made any intellectual progress without exercising at least some intuition. Once intuition gets you to the right general spot, then you must shift gears and apply deep analytical powers to make any further progress. But you would not be able to find that spot without some intuition."

2. "Learn to handle abstract ideas. Learn to learn abstract ideas without building up from concrete instances each time."

Well, how does one get from point 1 to point 2? By knowing one can always find one's way back to planet Earth by looking at examples to make the abstract concrete.

Is it not one of the seven deadly sins amongst the mathematical gods to ask an LLM, even when there is a chance that it might make mistakes, to help with examples, especially when there is a complicated-looking lemma like Lemma 2.5?

We ask the LLM for examples, and then we can check them for ourselves. Also, an LLM can scour the entire internet much faster than humans. I am asking how we leverage what an LLM is good at to make learning pure mathematics more accessible to average students, students who are more like a Serre than a Grothendieck. By the way, I highly doubt that Grothendieck never thought in examples. He probably did, but he liked to present his ideas without any examples.
 
  • #11
Don't forget the saying. "To err is human. To completely foul things up, you need a computer."
 
  • #12
elias001 said:
[Attached: two screenshots of Lemma 2.5 from Grillet's textbook]
@martinbn in the attached screenshots is Lemma 2.5 from Pierre Antoine Grillet's abstract algebra text, in the chapter about Ext and Tor.

I don't know if you have heard the common folklore about Serre and Grothendieck: the latter could always think without examples, while the former was always ready with examples. Steven Krantz, in a book titled 'A Mathematician Comes of Age', about mathematical maturity, said two things that are relevant here:
1. "Intuition is certainly not a panacea. Nobody ever solved a deep problem using just intuition alone. But nobody ever made any intellectual progress without exercising at least some intuition. Once intuition gets you to the right general spot, then you must shift gears and apply deep analytical powers to make any further progress. But you would not be able to find that spot without some intuition."

2. "Learn to handle abstract ideas. Learn to learn abstract ideas without building up from concrete instances each time."

Well, how does one get from point 1 to point 2? By knowing one can always find one's way back to planet Earth by looking at examples to make the abstract concrete.

Is it not one of the seven deadly sins amongst the mathematical gods to ask an LLM, even when there is a chance that it might make mistakes, to help with examples, especially when there is a complicated-looking lemma like Lemma 2.5?

We ask the LLM for examples, and then we can check them for ourselves. Also, an LLM can scour the entire internet much faster than humans. I am asking how we leverage what an LLM is good at to make learning pure mathematics more accessible to average students, students who are more like a Serre than a Grothendieck. By the way, I highly doubt that Grothendieck never thought in examples. He probably did, but he liked to present his ideas without any examples.
You are focusing on the wrong thing. You don't build intuition, abstract thinking, and so on, and then go and study mathematics. You study mathematics, and by that you acquire those skills.
 
  • #13
@martinbn but also remember that computers helped put humans on the Moon, and we humans have not even managed to solve Gauss's circle problem. Computers foul things up because of human instructions and the way those instructions were presented, with all the limitations associated with them.
 
  • #14
@martinbn yes, we study mathematics to acquire those skills. But to get good at point 2 of what Krantz claims, there is a process, and that process is always being able to find solid ground by looking at examples, translating definitions into more rigorous notation, or trying to find examples to illustrate a mathematical definition or theorem. Different strokes for different folks.

We are talking about how to improve the process of learning and understanding abstract mathematics by employing tools that, like it or not, are here to stay; you can't keep making the same retort and rejoinder about unreliability forever. How do we make these tools more reliable? That is the question people need to be asking.

Students will use them in classrooms, or when they are studying on their own or with their friends. Whenever you go on Google now, the AI is not far behind in answering your questions, especially math-related ones. You don't have to go directly into a Gemini session every time now.
 
  • #15
martinbn said:
Don't forget the saying. "To err is human. To completely foul things up, you need a computer."
And there's a story of how a woman got a letter from her bank apologizing for a bank error caused by the computer. Her daughter, a programmer, went to the bank to confront a bank officer about why the computer was blamed for an act traceable back to a human.

The bank officer said sheepishly that clients would lose faith in the bank if it admitted human error, but they would be more comfortable if the computer were blamed.
 
  • #16
@robphy Apologies, I read your question too fast; I did not see the 5 before the 11. @martinbn here are the screenshots: Grok, Gemini, and Copilot, in reverse order. I think Copilot is based on ChatGPT, if I am not mistaken.


[Attached: six screenshots of the Grok, Gemini, and Copilot responses]
 
  • #17
The first one says -0.21. Do you really want to learn homological algebra from them?
 
  • #18
@martinbn the other two got it right. I specifically asked about how reliable these LLMs are and when we can expect them to achieve a certain level of reliability in generating examples. Just because they can give examples doesn't mean it relieves the person asking of the responsibility of checking whether the example works. Both Grok and Gemini check whether what you are asking contains any errors or lacks sufficient information. How it chooses to proceed depends on the specific LLM. It might offer a plausible interpretation and check whether it satisfies the conditions of your query. If it can't solve the problem, it will tell you by the way it answers. I am not expecting any of these LLMs to provide counterexamples to the Riemann Hypothesis. But is it too much to expect that these LLMs will have a chance of succeeding in generating examples to illustrate concepts that are well known or written about in undergraduate/beginning graduate textbooks?

One thing these LLMs are good at is scouring the internet for sources. You can ask for references to be provided for every answer given. It is a tool, like having Wolfram Alpha or Maple.

That said, these LLMs are not suited to being asked about certain areas of math; that is my hunch. A case in point is asking one to evaluate certain complicated integrals.

It is not a matter of learning homological algebra from these LLMs. I am working with books and papers, and I am using AI as a tool to assist with anything I have trouble understanding in my reading.
You are a mathematician. If you were an analyst or an algebraist, and someone came and asked you something outside the areas you usually teach or research, you could be just as unreliable as the LLM in your answer. But that doesn't mean you would be totally unhelpful to the person asking, depending on what is being asked of you.

Also, you may have had to teach courses you never took classes in. I know this because friends who are professors told me they had to do it. One is a group theorist and he gave a course in measure theory. Another is a set theorist but had to give a course in the theory of computation. They were not 100% reliable compared to someone who has been teaching and researching those subjects, but they were still able to give the course.
 
  • #19
elias001 said:
@robphy Apologies, I read your question too fast; I did not see the 5 before the 11. @martinbn here are the screenshots: Grok, Gemini, and Copilot, in reverse order. I think Copilot is based on ChatGPT, if I am not mistaken.
I read a comment somewhere (in response to this news item about ChatGPT doing math)
that suggested the operation "subtract 5.9-5.11".

The other day, I tried this on ChatGPT, CoPilot, Gemini, and Claude.
All gave "-0.21."

I think it was important to ask it precisely "subtract 5.9-5.11",
without further clarification or conditions.
(I would hope to not have to clarify every query to better guarantee a correct result.)

I haven't had a chance to look through their "reasoning" log, if available.

While I have been impressed with LLMs (especially compared to a few years ago),
I still have to carefully check their responses,
no matter how plausible they may appear to be.
 
  • #20
@robphy I find it is better to put conditions in your query. Depending on the question you ask, even if it gives you a correct answer, it might use methods that are too advanced or not appropriate to the level of the person making the query.
 
  • #21
elias001 said:
@robphy I find it is better to put conditions in your query. Depending on the question you ask, even if it gives you a correct answer, it might use methods that are too advanced or not appropriate to the level of the person making the query.
I agree... but I would hope that an LLM wouldn't need such conditions
on what are presumably very easy and very basic questions in arithmetic.
 
  • #22
@robphy I am asking the LLM to do essentially the equivalent of searching a database (the internet) for pattern matches (math examples) that fit certain criteria. An LLM should be able to do that much. I am not asking for insights or opinions on any open problems or conjectures. An LLM is very good at parsing text and has some basic reasoning ability in higher mathematics. It is not perfect by any measure; it even carries a disclaimer that it may contain mistakes. Why is everyone so distrustful of it? The LLM is not being asked for advice on how to perform life-saving brain surgery, nor to help design important military hardware, as when the radar cross section of the B-2 bomber needed to be calculated and Kenneth Mitzner had to go to his local university library to consult works on compact operators. We are asking if there are examples that could serve as illustrations for undergraduate mathematics theorems.

Isn't it better that LLMs are not actually perfect? That means there will always be a need to hire graduate teaching assistants. It is also a tool that can help alleviate the constant demand on their time, which is never enough for what students deserve for the best possible educational outcome, and which a university would never have enough money to fully pay for even in an ideal situation.
 
  • #23
elias001 said:
We are asking if there are examples that could serve as illustrations for undergraduate mathematics theorems.
And what was the result? Did you get any useful examples?
 
  • #24
@martinbn yes, in some of my past posts where you responded, I asked the LLM to use examples to illustrate the notation used in different definitions, or I wanted to see the notation written out in set-builder form. I ask three or four different LLMs the same prompt queries, and I make sure the LLMs list references. Gemini already gives you that option if you choose anything beyond the basic response mode. There are reasons why these types of exercises should be easy for an LLM to do, and hopefully they can be done with a high degree of accuracy.

They are little questions that can be answered quickly. Also, when trying to do proofs, it is important to be able to look at examples, either a lot of them or a few meaningful ones. Moreover, seeing a definition in formal mathematical notation avoids the conceptual misunderstandings that can arise when it is conveyed only in English. Again, an LLM hopefully can be accurate, or get to a point of high enough accuracy, in translating back and forth between mathematical English and symbolic logic notation.

Also, again, we are not asking any LLM to translate definitions or give examples of theorems from Andrew Wiles's paper on his proof of Fermat.

You are also forgetting that LLMs in the future can get access to the math libraries of theorem-prover languages, and as for whether the AI hallucinates, the heavier-duty reasoning can be offloaded to Lean, Isabelle, or HOL in case the examples given to the person asking require more than well-known computational techniques. But that is for the future.
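
As a minimal sketch of what such offloading could look like (assuming Lean 4 with Mathlib; my own illustration, not an existing LLM feature), even tiny facts about the examples discussed in this thread become machine-checkable:
Code:
import Mathlib

-- Machine-checked: 2 is 2-torsion in Z/4Z, the kind of small fact
-- one might want verified about an LLM-suggested example.
example : (2 : ZMod 4) + 2 = 0 := by decide

-- Machine-checked: the subtraction the chatbots fumbled earlier in
-- this thread.
example : (5.9 : ℚ) - 5.11 = 0.79 := by norm_num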
 
  • #25
@elias001 I was curious to see the specific answers it gave you.
 
  • #26
@martinbn I will tag you in two of my past posts where you responded and post the examples there. Some of them will be from straight-up asking Google without engaging Gemini.
 