On Progress Toward AGI

AI Thread Summary
The development of artificial general intelligence (AGI) aims to achieve AI that matches human intelligence, characterized by learning, reasoning, and adaptability. Current AI systems excel in specific tasks but lack the ability to generalize skills or perform complex reasoning. The Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI) serves as a benchmark to compare human and AI performance, focusing on inductive reasoning. The test has evolved to increase in difficulty as AI capabilities improve, with significant cash prizes for competition participants. A key challenge in AI development is memory management, as models struggle to integrate past actions into current tasks effectively. While some companies, like British Petroleum, recognize AI's potential, they hesitate to deploy it due to concerns about understanding AI errors. This reflects a broader societal skepticism toward AI compared to human judgment, despite AI often making fewer mistakes. Concerns about the rapid growth of AI data centers and their impact on energy infrastructure are also highlighted, particularly in Texas, where demand is expected to surge.
gleem
Science Advisor
Education Advisor
TL;DR Summary
New tests compare AI performance to human performance on tasks that are natural and easy for humans but currently difficult for AI. These tests are known as the Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI). Current AI models are just beginning to compete with humans on simple tasks. An example is given.
One of the goals of AI development is to achieve AI at the level of human intelligence: artificial general intelligence (AGI). In the thread "Is AI Hype" I glibly stated that we will know we have AGI when we see it. This is of no value in trying to develop AGI. But what is intelligence? The characteristics of intelligence include performing tasks by learning, applying, adapting, and reasoning, something that standard LLMs are not expected to do and don't.

Current AI systems can perform certain complex human tasks above human levels by acquiring information about these tasks. Basically, the AI is given a skill (or skills) for a certain task (or tasks). However, it cannot leverage its information to develop new skills, at least not extensively. Most cannot reason, that is, go through a series of queries, checking for leads, until a task is completed. I try to avoid uniquely human, anthropomorphic terms like "thinking," to avoid making it more than it is.

What can be done, though, is to develop tests that both humans and AI can perform and compare the results. In particular, such a test would be easy for humans but difficult for AI. Such a test, called the Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI), was introduced in 2019. The test is based on a standard IQ test of inductive reasoning ability: use sample patterns to find a rule for how the patterns are related, then apply this knowledge to predict the pattern in a new situation.

A simple example: Given these sample input/output grids [images in the original post], what would you predict for this new test grid [image in the original post]?
A discussion of this test can be found here.
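
To make the task format concrete, here is a minimal toy sketch in Python. This is not an actual ARC task; the grids and the hidden rule here are invented for illustration. Grids are small arrays of color codes, a solver is shown a few input/output pairs, and it must infer the transformation and apply it to a new input.

Code:
# Toy ARC-style task. Grids are lists of lists of ints (color codes 0-9).
# Hidden rule in this invented example: every 1 becomes 2; other cells unchanged.

train_pairs = [
    ([[0, 1], [1, 0]], [[0, 2], [2, 0]]),
    ([[1, 1, 0]],      [[2, 2, 0]]),
]
test_input = [[0, 0, 1]]

def infer_color_map(pairs):
    """Infer a cell-wise color substitution consistent with all training pairs."""
    mapping = {}
    for grid_in, grid_out in pairs:
        for row_in, row_out in zip(grid_in, grid_out):
            for a, b in zip(row_in, row_out):
                if mapping.setdefault(a, b) != b:
                    raise ValueError("no consistent cell-wise color map")
    return mapping

def apply_color_map(grid, mapping):
    return [[mapping.get(c, c) for c in row] for row in grid]

rule = infer_color_map(train_pairs)
print(apply_color_map(test_input, rule))  # -> [[0, 0, 2]]

Real ARC tasks are much harder: the rule is rarely a cell-wise substitution, and the point of the benchmark is that no fixed family of rules covers all the tasks.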

A quick explanation is given by its inventor, François Chollet, in the video below.



When AI began to show near-human levels of performance on the first edition of ARC, with models going from 20% of human capability to 86% in a few months, ARC-AGI was revised to be significantly more difficult. The revised version became available at the end of March of this year.
In anticipation of the more powerful models expected to be released, ARC-AGI is being revised again and will be available next year.

This test/benchmark is used on Kaggle, a platform for data science competitions where AI developers compete for cash prizes by developing the best model for a specified problem. The prizes for the ARC-AGI competition total $1M.

Finally, the test is not just about showing AI better than humans at cognitive tasks, but about doing so efficiently, i.e., lowering the compute (computing resources) needed for a task to an acceptable level.
 
gleem said:
I glibly stated that we will know we have AGI when we see it.
I think that this is the best way to look at it for now. No matter what tests are created, people just keep moving the goalposts anyway.

The main problem for models in performing planning is memory. I'm not referring to training data or context windows in this respect. When a model is asked to perform a task, it needs long-term memory to know what it has done in the past, integrate that into its current actions, and update its plan and memories for the next step. Figuring out which memories are relevant and when to use them in the current context window is no easy task (how do you generalize your thought processes for every situation?). As those architectures get better, I think that we'll see something closer to AGI. A rough sketch of the retrieval step is given below.
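
As a toy illustration of that retrieval step, here is a hypothetical sketch in Python: memories are stored as vectors, and the agent pulls the top-k most similar ones into its context before each action. The embedding function is a placeholder, and real agent-memory architectures are far more involved than this.

Code:
import math

# Hypothetical agent memory: each stored memory is (text, vector);
# before each step, recall the most relevant ones by cosine similarity.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

class MemoryStore:
    def __init__(self, embed):
        self.embed = embed  # placeholder: any text -> vector function
        self.items = []     # list of (text, vector)

    def add(self, text):
        self.items.append((text, self.embed(text)))

    def recall(self, query, k=3):
        q = self.embed(query)
        ranked = sorted(self.items, key=lambda m: cosine(q, m[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

# Usage (embed would be a real embedding model):
# store = MemoryStore(embed)
# store.add("Step 1: opened the dataset; three columns were missing.")
# context = store.recall("what went wrong with the dataset?")

The hard part the post points to is not this lookup but deciding what to write down, when to update it, and which memories actually matter in a new context.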
 
I always defined intelligence as "having imagination". The smarter you are, the more imagination you have, and vice versa.
 
Benj Edwards at Ars Technica made a run-through of how different AI groups and players currently try to define what AGI is (or is not):
https://arstechnica.com/ai/2025/07/...fine-and-thats-a-multibillion-dollar-problem/

I must admit it had skipped my attention that Microsoft and OpenAI apparently quite literally define AGI as being achieved when the technology generates $100 billion in profits (per year, I guess?). That definition seems conceptually fairly uncorrelated with the more technical definitions that try to compare against the intellectual performance of humans. On the other hand, it very clearly underscores (if anyone had any doubt) that for the big players it is pretty much only the huge amount of potential profit that drives the tech and the hype around it.
 
I must admit it had skipped my attention that Microsoft and OpenAI apparently quite literally define AGI as being achieved when the technology generates $100 billion in profits (per year, I guess?).
Does anyone else find this funny?
 
  • Like
Likes ShadowKraz and PeroK
This is about an article in Fortune.

I think it is generally accepted that we want AI to be as capable at any task as any human. Humans are not perfect and make mistakes: "to err is human...". Some companies will deploy AI when this is demonstrated. Some, however, have reservations; take, for example, British Petroleum. They are thinking of using AI to advise on safety and reliability. They let an LLM take their safety certification exam, and it scored 92%, well above the average. So are they using it? No, because they could not figure out why it got 8% wrong. My question is, do they ask humans why they got their answers wrong? OK, the humans will learn from their mistakes, but can't the LLM be retrained or corrected?

The article brings up a good point: the mistakes of AI seem unhuman-like. So when a mistake is made, we think that if a human had been making the decision, the mistake would not have happened. We fail to recognize that, on average, the AI makes fewer mistakes. We accept human fallibility over AI because we believe that human activities can be made perfectly safe by rules, regulations, or training (which falls into the classic definition of insanity).

Machines should always be predictable. It seems that whenever AI passes a test for "intelligence," humans say the test was not good enough and develop another one.

The attitude of not trusting AI over human judgment is in many cases like the problem some have with vaccines. While some vaccines harm a few people, they save far more lives. Some are willing to accept a much larger risk rather than a much smaller one for which they might feel responsible.
 
I think society should require large AI data centers to generate their own electricity at their own expense.
 
bob012345 said:
I think society should require large AI data centers to generate their own electricity at their own expense.
I think Colossus, the supercomputer that Musk just built in Memphis, TN, is powered on site by 35 gas turbine generators. He is also purchasing a 2 GW gas power plant from Europe and shipping it to Memphis to power the 1M GPUs he ultimately plans for Colossus.
 
  • Like
Likes bob012345 and PeroK
bob012345 said:
I think society should require large AI data centers to generate their own electricity at their own expense.
What does that mean? These are computing services companies; of course they already buy (pay for the generation of) their electricity as part of their normal expenses... like any other business or household does.

As said, they sometimes build/buy their own plants, but whether they do or don't, there's not much real-world difference.
 
  • #10
russ_watters said:
What does that mean? These are computing services companies; of course they already buy (pay for the generation of) their electricity as part of their normal expenses... like any other business or household does.

As said, they sometimes build/buy their own plants, but whether they do or don't, there's not much real-world difference.
There is concern that the rate of growth of AI data centers will outstrip the infrastructure, putting a strain on the grid and possibly reducing reliability. In Texas, ERCOT predicts a doubling of grid capacity by 2031, with 50% of the projected additional power needed for data centers, largely driven by AI. Texas is also becoming a crypto center.
 
  • #11
Speaking of progress, Grok 4 announced yesterday. Elon expects it to invent new technology and discover new physics within a year or so.

 
  • Informative
  • Sad
  • Haha
Likes ShadowKraz, PeroK and gleem
  • #12
bob012345 said:
Speaking of progress, Grok 4 announced yesterday. Elon expects it to invent new technology and discover new physics within a year or so.
While the video describing Grok 4 is impressive, how much of what Musk says about his products can you take to the bank? Tesla's full self-driving mode is still in development, his robot Optimus is behind schedule, and Starship has hit a wall. His Boring Company is, well, boring. His solar company is not quite stellar. I guess we will have to wait and see how well Grok 4 does, at least until Musk can keep it from hallucinating. BTW, Musk cofounded OpenAI and had access to much of the development of GPT, so it is not like he built Grok from scratch.
 
  • Like
Likes russ_watters and ShadowKraz
  • #13
bob012345 said:
Speaking of progress, Grok 4 announced yesterday. Elon expects it to invent new technology and discover new physics within a year or so.

Uh huh.
https://techcrunch.com/2025/07/10/g...-elon-musk-to-answer-controversial-questions/

During xAI’s launch of Grok 4 on Wednesday night, Elon Musk said — while livestreaming the event on his social media platform, X — that his AI company’s ultimate goal was to develop a “maximally truth-seeking AI.” But where exactly does Grok 4 seek out the truth when trying to answer controversial questions?

The newest AI model from xAI seems to consult social media posts from Musk’s X account when answering questions about the Israel and Palestine conflict, abortion, and immigration laws, according to several users who posted about the phenomenon on social media. Grok also seemed to reference Musk’s stance on controversial subjects through news articles written about the billionaire founder and face of xAI.
 
  • #15
gleem said:
While the video describing Grok 4 is impressive, how much of what Musk says about his products can you take to the bank? Tesla's full self-driving mode is still in development, his robot Optimus is behind schedule, and Starship has hit a wall. His Boring Company is, well, boring. His solar company is not quite stellar. I guess we will have to wait and see how well Grok 4 does, at least until Musk can keep it from hallucinating. BTW, Musk cofounded OpenAI and had access to much of the development of GPT, so it is not like he built Grok from scratch.
Well, at least Musk is doing things. It seems to me most people (including myself) just watch while they critique and criticize.
 
  • #16
bob012345 said:
Well, at least Musk is doing things. It seems to me most people (including myself) just watch while they critique and criticize.
Some of the things he's doing are better left undone.
 
  • Like
  • Wow
Likes ShadowKraz, Borg, bob012345 and 1 other person
  • #17
While Grok 4 significantly beat all other GPT models on the ARC-AGI-2 benchmark (see post #1), it scored only 16.2%, about double Anthropic's Claude Opus 4. Musk is relying on the supposition that the more GPUs you have, the more powerful the GPT will be. Some believe that GPT is beginning to hit a wall as far as AGI is concerned, and that the only significant progress will come from the development of neuromorphic hardware. This may also be the only way to make AGI cost-effective.
 
  • Like
  • Skeptical
  • Love
Likes russ_watters, ShadowKraz, bob012345 and 1 other person
  • #18
Hornbein said:
Does anyone else find this funny?
And disappointingly predictable. At the risk of getting this comment pulled or a warning issued, this is the problem with the relationship between corporations and research. Bean counters attempt to evaluate the results but only use the metric of "how much money can we make off of it, here and now".
 
  • Like
Likes russ_watters
  • #19
bob012345 said:
Well, at least Musk is doing things. It seems to me most people (including myself) just watch while they critique and criticize.
Yes, yes he is... but it's his motivations that bother many of us. Why is he doing these things? Based upon his actions and words, it is decidedly not for the betterment of our species. That is the true issue here. It's the same issue with any corporation working on AI. It isn't for the betterment of our species but merely to find new ways to line their already bulging pockets.
AI, in and of itself, is not an issue; it is the people behind the AI, the ones calling the shots on its development and uses who are the issue.
 
  • #20
Good article on LLMs possibly hitting a wall:

https://www.newyorker.com/culture/open-questions/what-if-ai-doesnt-get-much-better-than-this

It’s hard to overstate how completely the A.I. community came to believe that it would inevitably scale its way to A.G.I. In 2022, Gary Marcus, an A.I. entrepreneur and an emeritus professor of psychology and neural science at N.Y.U., pushed back on Kaplan’s paper, noting that “the so-called scaling laws aren’t universal laws like gravity but rather mere observations that might not hold forever.” The negative response was fierce and swift. “No other essay I have ever written has been ridiculed by as many people, or as many famous people, from Sam Altman and Greg Brockman to Yann LeCun and Elon Musk,” Marcus later reflected. He recently told me that his remarks essentially “excommunicated” him from the world of machine learning. Soon, ChatGPT would reach a hundred million users faster than any digital service in history; in March, 2023, OpenAI’s next release, GPT-4, vaulted so far up the scaling curve that it inspired a Microsoft research paper titled “Sparks of Artificial General Intelligence.” Over the following year, venture-capital spending on A.I. jumped by eighty per cent.



After that, however, progress seemed to slow. OpenAI did not unveil a new blockbuster model for more than two years, instead focussing on specialized releases that became hard for the general public to follow. Some voices within the industry began to wonder if the A.I. scaling law was starting to falter. “The 2010s were the age of scaling, now we’re back in the age of wonder and discovery once again,” Ilya Sutskever, one of the company’s founders, told Reuters in November. “Everyone is looking for the next thing.” A contemporaneous TechCrunch article summarized the general mood: “Everyone now seems to be admitting you can’t just use more compute and more data while pretraining large language models and expect them to turn into some sort of all-knowing digital god.” But such observations were largely drowned out by the headline-generating rhetoric of other A.I. leaders.
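
For context, the "scaling laws" being debated here are empirical power-law fits from Kaplan et al. (2020), not derived laws. Below is a minimal sketch of the model-size law using the paper's published constants; treating such curve fits as if they were laws of nature is precisely what Marcus pushed back on.

Code:
# Kaplan et al. (2020) fit for LLM test loss vs. model size (non-embedding params):
#   L(N) ~ (N_c / N) ** alpha_N, with alpha_N ~ 0.076 and N_c ~ 8.8e13.
# This is a curve fit to observed training runs, not a universal law.

ALPHA_N = 0.076
N_C = 8.8e13

def predicted_loss(n_params: float) -> float:
    return (N_C / n_params) ** ALPHA_N

for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} params -> predicted loss {predicted_loss(n):.2f}")

The slow power-law decay is the whole pitch: every 10x in parameters buys a fixed multiplicative drop in loss, for as long as the fit keeps holding.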
 
  • Informative
  • Like
Likes russ_watters, nsaspook and jack action
