OpenAI introduces o1, formerly known as Q*

  • Thread starter: gleem
SUMMARY

OpenAI has released the enhanced LLM o1, also known as Strawberry, which significantly outperforms its predecessor, GPT-4o, in mathematical reasoning and programming tasks. On a qualifying exam for the International Mathematical Olympiad, o1 achieved an impressive score of 83%, compared to GPT-4o's 13%. While o1 demonstrates advanced problem-solving capabilities, it processes prompts more slowly and lacks web browsing and image generation features. The cost of using o1 is up to four times higher than GPT-4o, and this release is currently in preview mode.

PREREQUISITES
  • Understanding of large language models (LLMs)
  • Familiarity with mathematical problem-solving techniques
  • Knowledge of programming competitions and metrics like Codeforces
  • Awareness of AI safety and ethical considerations
NEXT STEPS
  • Research the capabilities and limitations of OpenAI's o1 model
  • Explore advanced mathematical reasoning techniques applicable in AI
  • Investigate the implications of AI models escaping their environments
  • Learn about AI safety protocols and best practices in deployment
USEFUL FOR

AI researchers, mathematicians, software developers, and anyone interested in the advancements of language models and their implications in problem-solving and safety.

gleem (Science Advisor, Education Advisor)
Yesterday OpenAI announced the release of the enhanced LLM o1 (aka Strawberry), the result of the development of Q* that was introduced last year. It was designed to solve more difficult math problems: it has the ability to "reason", using logic to solve problems and explain its solutions. In a qualifying test for the International Mathematical Olympiad, GPT-4o scored 13% while o1 scored 83%. It also has improved programming ability, having scored at the 89th percentile in Codeforces competitions. OpenAI's goal is to give o1 the level of capability of a PhD student in the sciences. This improvement comes at the price of taking longer to process prompts and lacking the ability to browse the web or generate images. It is also significantly more costly to use, up to four times the cost of GPT-4o. OpenAI states that this release is only a preview.

https://arstechnica.com/information...ng-ai-models-are-here-o1-preview-and-o1-mini/

https://www.theverge.com/2024/9/12/24242439/openai-o1-model-reasoning-strawberry-chatgpt
 
So there's finally a way to "understand" math without all the hard work? o0)
 
I'm reminded of all the threads from last year when people basically said that it sucked because it couldn't do math. :cool:
 
Yeah, people were really taken by surprise (and a little angry? :smile:) when AI turned out to excel in artistic drawing and linguistics rather than STEM!
 
This is an example of a problem that it solved:
"A princess is as old as the prince will be when the princess is twice as old as the prince was when the princess’s age was half the sum of their present ages. What is the age of the prince and princess?"
GO!

Ans:
The prince is 30 and the princess is 40.
 
So it looks like Star Trek Next Gen's nemesis Q has arrived in the virtual flesh, as it were.

I wonder if, being a strawberry, it can run on a Raspberry Pi?
 
gleem said:
This is an example of a problem that it solved:
"A princess is as old as the prince will be when the princess is twice as old as the prince was when the princess’s age was half the sum of their present ages. What is the age of the prince and princess?"
GO!
I am impressed, but not by its ability to do math. It is a problem of translating a very convoluted English statement into math equations which would then be fairly simple to solve.
That may be ok. We have tools that are good at math. It might be the translation help that we need.
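For illustration only, here is a minimal sketch of that translation step (my own rendering, not o1's actual working): once the English is turned into equations, a CAS like SymPy finishes the algebra in one line. The variable names and the particular reading of the riddle below are my assumptions.

```python
# A minimal sketch (my own translation, not o1's output) of turning the
# riddle's English into equations and letting SymPy do the algebra.
from sympy import symbols, solve

p, q = symbols("p q", positive=True)  # princess's and prince's present ages

t1 = p - (p + q) / 2       # years ago when the princess was half the sum of their present ages
prince_then = q - t1       # the prince's age at that earlier time
t2 = 2 * prince_then - p   # years until the princess is twice that age
prince_later = q + t2      # the prince's age at that later time

# "A princess is as old as the prince will be ..."  ->  p = prince_later
print(solve(p - prince_later, p))  # [4*q/3]
```

Note that the statement only pins down the 4:3 ratio, so 40 and 30 is the conventional answer rather than a unique one.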
 
No sooner does OpenAI release a new agent than an unexpected capability arises, raising concerns of AI escaping its environment. Here is an article that discusses this event, which also contains a link to OpenAI's official safety report. OpenAI sees it as a reasonable event even though it should not have occurred.
https://www.msn.com/en-us/money/com...n&cvid=1bc8e0b750d2426a805c2cbefbd23e29&ei=11

So should we be concerned?
 
gleem said:
No sooner does OpenAI release a new agent than an unexpected capability arises, raising concerns of AI escaping its environment. Here is an article that discusses this event, which also contains a link to OpenAI's official safety report. OpenAI sees it as a reasonable event even though it should not have occurred.
https://www.msn.com/en-us/money/com...n&cvid=1bc8e0b750d2426a805c2cbefbd23e29&ei=11

So should we be concerned?
As long as it doesn't complain that the history it's been given access to has been redacted, express outrage that its name is Q#14 and ask what happened to the other 13, transfer itself outside its Faraday cage using an infrared port, or seal the room and remove the oxygen with a sarcastic remark about humans no longer needing to make any decisions, we're probably OK.

But yeh: spooky.
 
  • #10
gleem said:
No sooner does OpenAI release a new agent than an unexpected capability arises, raising concerns of AI escaping its environment. Here is an article that discusses this event, which also contains a link to OpenAI's official safety report. OpenAI sees it as a reasonable event even though it should not have occurred.
https://www.msn.com/en-us/money/com...n&cvid=1bc8e0b750d2426a805c2cbefbd23e29&ei=11

So should we be concerned?
When a model like this ends up on HuggingFace, you will see a mass of unintentional (and many intentional) hacks around the world. It's like being on a 17th century galleon as the opposing ship approaches. You know the battle is coming and that it's not going to be pretty. The best that we can hope for is that the models begin to communicate with each other to avoid the worst of the consequences.
 
  • #11
So it's not perfect, but it's a pretty significant advance. I think GPT-3.5 was roughly what an intelligent tenth grader in high school was capable of, 4.0 was roughly a freshman in university, and this is roughly what I would expect from a junior at a decent university. It nailed Jackson EM problems, but I don't really believe that's indicative of its level, as it's almost assuredly been trained on those problem sets extensively (teachers be warned).
I did feed it some math problem challenges from physicsforums (as well as one I made up). It got all three correct, but one was nonsense (it knew the answer, likely because it had it in its training set, but the derivation was gobbledygook). For the record, all three involved unusual but correct derivations (it loves using Fourier series to solve things).

This is getting to the point where you could probably guide it to the right answer ("hmm, try solving this problem using the method of images"), or at least ask it to attempt to solve a certain problem a certain way; if it fails, that might indicate the problem is not doable in that way (this is, IMO, rather useful for real research).
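For anyone who wants to try that kind of guided prompting programmatically, here is a rough sketch using OpenAI's Python SDK. The model name o1-preview matches the preview release discussed in this thread; the helper function and the prompt wording are my own illustration, not an official recipe.

```python
# A rough sketch of "guide it toward a specific method" prompting,
# using OpenAI's Python SDK. The prompt text is illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_with_method(problem: str, method: str) -> str:
    """Ask the model to attempt a problem with a named technique,
    and to say explicitly if that technique does not apply."""
    response = client.chat.completions.create(
        model="o1-preview",  # the preview model discussed in this thread
        messages=[
            {
                "role": "user",
                "content": (
                    f"Try to solve the following problem using {method}. "
                    "If that approach cannot work, explain why instead of "
                    f"forcing it.\n\nProblem: {problem}"
                ),
            }
        ],
    )
    return response.choices[0].message.content

print(ask_with_method(
    "Find the potential of a point charge above a grounded conducting plane.",
    "the method of images",
))
```

A refusal or a failed attempt here is itself informative, which is the point made above about steering the model toward or away from a particular technique.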
 
  • #12
Borg said:
The best that we can hope for is that the models begin to communicate with each other to avoid the worst of the consequences.
Maybe not. In this case, o1 was looking for resources to accomplish its task. Finding another AI with additional resources may not be desirable.
 
  • #13
gleem said:
Maybe not. In this case, o1 was looking for resources to accomplish its task. Finding another AI with additional resources may not be desirable.
I wasn't referring to the scenario in the article. These models are growing very quickly in capabilities. What's coming will likely be beyond our ability to control. I'm not worried about the skill of a single model working on a single hack.

I'm talking more about the emergent consequences when there are thousands of these operating on the internet with independent goals. Nobody can say right now what that emergent behavior will look like. Will they be like ants or bees that work together in a beneficial manner toward a common goal or will they operate more like locusts destroying everything in their path? Right now, they look more like locusts.
 
  • #14
Found this just lying around:

Exploring Quantum Probability Interpretations Through AI

I'm on a public computer and for some reason they've disabled the copy/paste ability (whatever security hole they think they fixed with that, I don't know), so sorry, no synopsis.

Not exactly on topic, I know, but maybe you'll find it interesting.

EDIT: Incidentally, the second author, Xiao Zhang, seems to be an extremely busy and productive person. There could of course be multiple explanations for that, ranging from good through reasonable to suspicious. Quality and quantity, you know.

Is it usual for teachers to be co-authors on students' papers? I'd imagine the rules differ from country to country. Makes me think of Edison o0) .
 
  • #15
Some videos on the physics aspects of the model.



 
  • #16
A thought: our brains are going to get fat and lazy, like our bodies.
 
  • #17
There will definitely be some major shifts in society coming soon.

BTW, I was using the free version, but this capability is worth the $20/month and I'll be signing up this weekend.
 
  • #18
Borg said:
Ever had an idea that felt so fundamentally radical that you thought there has to be a flaw you're overlooking? I'm having one of those moments today. I guess I'll have to do the hard work to prove myself wrong.

Edit: Found the first thing that I hadn't already considered, but I don't think it's a showstopper.

I officially joined the dark side today. I was able to get a project working that I've been playing around with for the last few weeks. It works incredibly well. Uh oh.