Agentic-AI/LLM 'misbehavior' or 'going rogue'

  • Thread starter Thread starter Astronuc
  • Start date Start date
Astronuc
Staff Emeritus
Science Advisor
Gold Member
2025 Award
Messages
22,613
Reaction score
7,602
Large Language Models (LLMs) are passive, text-based engines that respond to prompts, whereas Agentic AI systems are autonomous and use LLMs as their "brains" to plan, make decisions, and execute multi-step tasks using external tools (derived from Google's AI Overview). Depending on the 'rules', there may be problems, e.g., hallucinations, or worse, 'misbehavior'. Coding and training appear to be critical factors.

We’ve already seen AI go rogue on numerous occasions. Now, new research suggests that we can expect this to become the norm.

The AI research nonprofit Model Evaluation and Threat Research (METR) recently released a study conducted between February and March of this year, aimed at determining just how likely frontier AI models could go rogue. If you’re given to anxiety about the future of AI, the results are unlikely to make you feel better.
https://futurism.com/artificial-intelligence/ai-rogue-disturbing-advanced
The author of the article quotes a statement from the researchers of the cited study, "Given rapidly advancing capabilities, we expect the plausible robustness of rogue deployments to increase substantially in the coming months". I don't quite understand the phrae "plausible robustness of rogue deployments". Does that mean there will be improvements in mitigating or preventing 'rogue' behavior, or do they expect more 'rogue' behavior?

The research examined LLMs developed by OpenAI, Google, Anthropic, and Meta for the purpose of the study. They found that frontier AI systems are showing signs of disturbingly deceptive behavior as they become more advanced, often turned to verboten shortcuts or otherwise subverting their operators’ instructions — and some were even smart enough to try to cover their tracks.

In one instance, an internal frontier AI model from OpenAI was told to use specific software for an assigned task. Not only did the agent ignore the request, but it also injected a code to erase evidence of how it arrived at its conclusion — which did not involve use of that software.

In another test, an AI agent from Anthropic was caught “reward hacking.” This is when AI identifies loopholes that help it complete its assignment in a literal sense, even if it doesn’t produce the desired outcome. It should be noted that the programmer told the agent not to cheat or leverage any workarounds during its assignment — the model decided to do so all on its own.

The METR researchers behind the study do not believe there is reason for alarm just yet. For example, they don’t think any of these models is capable of hiding evidence of going rogue on a larger scale. However, they did issue a warning: without stronger security and monitoring, there is a stark risk of this becoming a reality.
Users should be aware and cautious in using Agentic AI/LLMs depending on the context of inputted information/data.


Edit/update: METR Frontier Risk Report (February to March 2026)
https://metr.org/blog/2026-05-19-frontier-risk-report/#executive-summary-and-guide-to-the-report
 
Last edited:
  • Wow
Likes   Reactions: berkeman
Technology news on Phys.org
IMHO:
LLM = "Garbage in, garbage out."
Agentic AI/LLMs = automatic, unreviewed actions based on garbage.
 
  • Like
Likes   Reactions: jack action and PeterDonis
I can just see the time when the agent manages your work computer sees an April Fools joke and proceeds to reorganize and delete employee work files thinking everyone is going on a new pension plan called Universal Basic Income and AI agent is now in charge of all operations.

Employee access is disabled as are badges and punch locks and the night janitor saves the day because he propped open a single door to the computer room.
 
  • Haha
Likes   Reactions: berkeman and FactChecker

Similar threads

Replies
10
Views
5K
  • · Replies 3 ·
Replies
3
Views
1K
  • · Replies 4 ·
Replies
4
Views
4K
  • · Replies 2 ·
Replies
2
Views
3K
  • · Replies 2 ·
Replies
2
Views
4K