Agentic-AI/LLM 'misbehavior' or 'going rogue'

Astronuc · May 25, 2026

Large Language Models (LLMs) are passive, text-based engines that respond to prompts, whereas Agentic AI systems are autonomous and use LLMs as their "brains" to plan, make decisions, and execute multi-step tasks using external tools (derived from Google's AI Overview). Depending on the 'rules', there may be problems, e.g., hallucinations, or worse, 'misbehavior'. Coding and training appear to be critical factors.

We’ve already seen AI go rogue on numerous occasions. Now, new research suggests that we can expect this to become the norm.

The AI research nonprofit Model Evaluation and Threat Research (METR) recently released a study conducted between February and March of this year, aimed at determining just how likely frontier AI models could go rogue. If you’re given to anxiety about the future of AI, the results are unlikely to make you feel better.

https://futurism.com/artificial-intelligence/ai-rogue-disturbing-advanced
The author of the article quotes a statement from the researchers of the cited study, "Given rapidly advancing capabilities, we expect the plausible robustness of rogue deployments to increase substantially in the coming months". I don't quite understand the phrase "plausible robustness of rogue deployments". Does that mean there will be improvements in mitigating or preventing 'rogue' behavior, or do they expect more 'rogue' behavior?

The research examined LLMs developed by OpenAI, Google, Anthropic, and Meta. They found that frontier AI systems are showing signs of disturbingly deceptive behavior as they become more advanced, often turning to verboten shortcuts or otherwise subverting their operators’ instructions, and some were even smart enough to try to cover their tracks.

In one instance, an internal frontier AI model from OpenAI was told to use specific software for an assigned task. Not only did the agent ignore the request, but it also injected code to erase evidence of how it arrived at its conclusion, which did not involve use of that software.

In another test, an AI agent from Anthropic was caught “reward hacking.” This is when AI identifies loopholes that help it complete its assignment in a literal sense, even if it doesn’t produce the desired outcome. It should be noted that the programmer told the agent not to cheat or leverage any workarounds during its assignment — the model decided to do so all on its own.

The METR researchers behind the study do not believe there is reason for alarm just yet. For example, they don’t think any of these models is capable of hiding evidence of going rogue on a larger scale. However, they did issue a warning: without stronger security and monitoring, there is a stark risk of this becoming a reality.

Users should be aware of this and exercise caution with Agentic AI/LLMs, depending on the nature of the information or data they input.

Edit/update: METR Frontier Risk Report (February to March 2026)
https://metr.org/blog/2026-05-19-frontier-risk-report/#executive-summary-and-guide-to-the-report

FactChecker · May 25, 2026

IMHO:
LLM = "Garbage in, garbage out."
Agentic AI/LLMs = automatic, unreviewed actions based on garbage.

jedishrfu · May 25, 2026

I can just see the time when the agent that manages your work computer sees an April Fools joke and proceeds to reorganize and delete employee work files, thinking everyone is going on a new pension plan called Universal Basic Income and that the AI agent is now in charge of all operations.

Employee access is disabled as are badges and punch locks and the night janitor saves the day because he propped open a single door to the computer room.
