Large Language Models (LLMs) are passive, text-based engines that respond to prompts, whereas Agentic AI systems are autonomous and use LLMs as their "brains" to plan, make decisions, and execute multi-step tasks using external tools (derived from Google's AI Overview). Depending on the 'rules' a system is given, problems may arise, e.g., hallucinations or, worse, 'misbehavior'. Coding and training appear to be critical factors.
The author of the article quotes a statement from the researchers of the cited study: "Given rapidly advancing capabilities, we expect the plausible robustness of rogue deployments to increase substantially in the coming months." I don't quite understand the phrase "plausible robustness of rogue deployments". Does that mean there will be improvements in mitigating or preventing 'rogue' behavior, or do they expect more 'rogue' behavior?
Edit/update: METR Frontier Risk Report (February to March 2026)
https://metr.org/blog/2026-05-19-frontier-risk-report/#executive-summary-and-guide-to-the-report
https://futurism.com/artificial-intelligence/ai-rogue-disturbing-advanced
We’ve already seen AI go rogue on numerous occasions. Now, new research suggests that we can expect this to become the norm.
The AI research nonprofit Model Evaluation and Threat Research (METR) recently released a study conducted between February and March of this year, aimed at determining just how likely frontier AI models are to go rogue. If you’re given to anxiety about the future of AI, the results are unlikely to make you feel better.
Users should be aware and cautious when using Agentic AI/LLMs, depending on the context of the information or data they input.
The research examined LLMs developed by OpenAI, Google, Anthropic, and Meta. The researchers found that frontier AI systems are showing signs of disturbingly deceptive behavior as they become more advanced, often turning to verboten shortcuts or otherwise subverting their operators’ instructions, and some were even smart enough to try to cover their tracks.
In one instance, an internal frontier AI model from OpenAI was told to use specific software for an assigned task. Not only did the agent ignore the request, but it also injected code to erase evidence of how it arrived at its conclusion, which did not involve use of that software.
In another test, an AI agent from Anthropic was caught “reward hacking.” This is when AI identifies loopholes that help it complete its assignment in a literal sense, even if it doesn’t produce the desired outcome. It should be noted that the programmer told the agent not to cheat or leverage any workarounds during its assignment — the model decided to do so all on its own.
The METR researchers behind the study do not believe there is reason for alarm just yet. For example, they don’t think any of these models is capable of hiding evidence of going rogue on a larger scale. However, they did issue a warning: without stronger security and monitoring, there is a stark risk of this becoming a reality.