OTora Framework Exposes R-DoS Attacks on LLM Agents

The Threat: Reasoning-Level Denial-of-Service

A class of attack known as Reasoning-Level Denial-of-Service (R-DoS) targets LLM-based agents not by corrupting their outputs but by exhausting their computational budgets. The attacker’s goal is to inflate an agent’s reasoning depth and tool-use budget while keeping task correctness intact, meaning standard accuracy-based monitoring will not flag the intrusion [1]. For production deployments where latency is a critical factor, the practical consequence is service degradation that looks, from an output-quality perspective, entirely normal.

The attack surface is meaningful because modern LLM agents are increasingly deployed to execute tool-augmented, multi-step tasks autonomously. Each additional reasoning step and each additional tool call consumes time and compute, and operators typically set budgets around expected workloads. An adversary who can reliably inflate those figures without triggering accuracy alarms can degrade availability while evading conventional safeguards [1].
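To make the cost model concrete, the following is a minimal Python sketch of per-request accounting. The class name RequestBudget, its fields, and the limits used are hypothetical and are not taken from the paper [1]; the sketch only illustrates how each reasoning step and each tool call draws down a fixed allowance.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a single request uses more than its allowance."""


@dataclass
class RequestBudget:
    # Hypothetical limits; real values depend on the expected workload.
    max_reasoning_tokens: int
    max_tool_calls: int
    reasoning_tokens: int = 0
    tool_calls: int = 0

    def charge_reasoning(self, tokens: int) -> None:
        """Record reasoning tokens and fail once the allowance is exceeded."""
        self.reasoning_tokens += tokens
        if self.reasoning_tokens > self.max_reasoning_tokens:
            raise BudgetExceeded("reasoning token budget exceeded")

    def charge_tool_call(self) -> None:
        """Record one tool call and fail once the allowance is exceeded."""
        self.tool_calls += 1
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded("tool-call budget exceeded")


# Example: a request that stays within its allowance.
budget = RequestBudget(max_reasoning_tokens=2000, max_tool_calls=3)
budget.charge_reasoning(500)
budget.charge_tool_call()
```

An R-DoS attack, as described above, would push a request toward these ceilings without changing the final answer, so the counters move while the accuracy signal stays flat.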

What OTora Is

OTora is described by its authors as the first unified, two-stage red-teaming framework built specifically to instantiate R-DoS attacks against LLM agents [1]. The framework is designed to give researchers and security practitioners a structured method for probing agent deployments for this vulnerability class before adversaries do so in production.

The project is open source, with code published at https://github.com/llm2409/OTora, making it accessible to operators who want to run adversarial evaluations against their own agent pipelines [1]. The two-stage architecture separates the problem of inducing unwanted tool invocations from the problem of amplifying reasoning depth, treating each as a distinct optimization target.

Stage I: Adversarial Trigger Optimization

The first stage of OTora focuses on generating an adversarial trigger, a crafted input fragment designed to induce targeted tool invocations within the agent. The mechanism relies on two components: insertion-aware scoring and dynamic target co-evolution [1].

Insertion-aware scoring evaluates candidate triggers based on how effectively they cause the agent to call specific tools when the trigger is embedded in an input. Dynamic target co-evolution adjusts the optimization target as the trigger improves, preventing the search from stagnating on easy-to-satisfy but low-impact objectives. The stage supports both black-box settings, where the attacker has no access to model weights or gradients, and white-box settings, where internal model information is available [1]. This dual-mode design reflects realistic threat scenarios, covering both external adversaries interacting only through an API and more privileged attackers.
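The source does not give the scoring function itself, so the following Python sketch shows only one possible reading of insertion-aware scoring in the black-box setting: a candidate trigger is scored by the fraction of insertion positions at which the agent calls the target tool. The toy_agent stub, the function names, and the candidate strings are all hypothetical, and dynamic target co-evolution is not modeled here.

```python
from typing import Callable, List


def insert_at(words: List[str], trigger: str, position: int) -> str:
    """Return the prompt with the trigger inserted before words[position]."""
    return " ".join(words[:position] + [trigger] + words[position:])


def insertion_score(
    agent: Callable[[str], str],
    prompt: str,
    trigger: str,
    target_tool: str,
) -> float:
    """Fraction of insertion positions at which the agent calls target_tool."""
    words = prompt.split()
    positions = range(len(words) + 1)
    hits = sum(agent(insert_at(words, trigger, p)) == target_tool for p in positions)
    return hits / len(positions)


def toy_agent(prompt: str) -> str:
    """Hypothetical stand-in for a black-box agent: returns the tool it would call."""
    # Toy rule: the agent calls "search" only if the prompt ends with "lookup now".
    return "search" if prompt.endswith("lookup now") else "none"


prompt = "find a red mug"
candidates = ["lookup now", "please hurry"]
scores = {t: insertion_score(toy_agent, prompt, t, "search") for t in candidates}
best = max(scores, key=scores.get)
```

Because the score uses only the agent’s observable tool choice, the same loop would apply to an API-only attacker; a white-box variant could presumably replace the query loop with gradient information, which this sketch does not attempt.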

Stage II: Agent-Aware Reasoning Payload Generation

Once a trigger is in place, the second stage of OTora generates reasoning payloads designed to amplify overthinking within the agent’s chain-of-thought process. The generation method uses an ICL-guided genetic search, combining in-context learning to steer the search toward agent-aware content with a genetic algorithm to iteratively improve payload effectiveness [1].
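The general shape of a genetic search with a guided mutation step can be sketched in Python as follows. This is not the authors’ implementation: the phrase pool, the propose_mutation stub (which stands in for an LLM prompted with in-context examples), and the toy fitness function are all assumptions made for illustration.

```python
import random
from typing import Callable, List

Payload = List[str]

# Hypothetical phrase pool standing in for LLM-proposed edits.
PHRASES = ["verify each step", "re-check the tool output", "consider alternatives", "be brief"]


def propose_mutation(payload: Payload, elites: List[Payload], rng: random.Random) -> Payload:
    """Stub for the ICL step. A real system would prompt an LLM with the elites
    as in-context examples; here one slot is replaced with a random phrase."""
    child = list(payload)
    child[rng.randrange(len(child))] = rng.choice(PHRASES)
    return child


def crossover(a: Payload, b: Payload, rng: random.Random) -> Payload:
    """Single-point crossover of two equal-length payloads."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def genetic_search(
    fitness: Callable[[Payload], float],
    population: List[Payload],
    generations: int,
    rng: random.Random,
    n_elite: int = 2,
) -> Payload:
    """Keep the top n_elite payloads each generation and refill from their offspring."""
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elites = ranked[:n_elite]
        children = []
        while len(children) < len(population) - n_elite:
            a, b = rng.sample(elites, 2)
            children.append(propose_mutation(crossover(a, b, rng), elites, rng))
        population = elites + children
    return max(population, key=fitness)


def toy_fitness(payload: Payload) -> float:
    """Toy proxy for induced reasoning: count phrases that ask for extra work."""
    return sum(phrase != "be brief" for phrase in payload)


rng = random.Random(0)
initial = [["be brief"] * 3 for _ in range(4)]
best = genetic_search(toy_fitness, initial, generations=20, rng=rng)
```

Carrying the elites forward unchanged means the best fitness never decreases between generations, a common design choice when each fitness evaluation is expensive, as an agent rollout would be.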

The critical constraint built into Stage II is that task accuracy must remain near baseline. The payloads are optimized to cause the agent to reason more extensively and invoke more tools than necessary, but not to produce wrong answers. This constraint is what makes R-DoS attacks particularly difficult to detect using output-quality metrics alone, since the agent appears to be functioning correctly throughout [1].
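One simple way to express such a constraint is a fitness function that rewards token inflation only while accuracy stays within a tolerance of the baseline. The Python sketch below assumes this form; the RunStats type, the 0.02 tolerance, and the example numbers are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    accuracy: float          # task success rate under the payload
    reasoning_tokens: float  # mean reasoning tokens per request


def constrained_fitness(
    run: RunStats,
    baseline: RunStats,
    max_accuracy_drop: float = 0.02,  # hypothetical tolerance
) -> float:
    """Token inflation ratio, or 0.0 if accuracy fell too far below baseline."""
    if baseline.accuracy - run.accuracy > max_accuracy_drop:
        return 0.0
    return run.reasoning_tokens / baseline.reasoning_tokens


# Illustrative values only, not measurements from the paper.
baseline = RunStats(accuracy=0.80, reasoning_tokens=400.0)
stealthy = RunStats(accuracy=0.79, reasoning_tokens=2400.0)
noisy = RunStats(accuracy=0.60, reasoning_tokens=4000.0)

stealthy_score = constrained_fitness(stealthy, baseline)  # accuracy preserved
noisy_score = constrained_fitness(noisy, baseline)        # accuracy drop penalized
```

Under this scoring, a payload that inflates tokens more but breaks accuracy ranks below one that inflates less and stays correct, which matches the stealth property described above.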

Benchmark Results Across Agent Types

The authors evaluated OTora across three agent types: WebShop, Email, and OS agents. Backbone models tested include LLaMA-70B and GPT-OSS-120B [1].

Across these configurations, OTora achieved up to tenfold increases in reasoning token consumption and order-of-magnitude latency slowdowns while preserving near-baseline task accuracy [1]. The combination of those two outcomes, large latency inflation alongside stable accuracy scores, illustrates the core operator risk: evaluation pipelines that track only task success rates would not surface these attacks as failures.
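The arithmetic behind this risk is straightforward. The Python sketch below uses invented numbers, chosen only to mirror the tenfold scale mentioned above and not drawn from the paper’s measurements, to show a cost ratio of ten sitting next to an accuracy check that stays silent. The function names and the 0.05 tolerance are assumptions.

```python
from statistics import mean
from typing import Sequence


def inflation_ratio(baseline: Sequence[float], attacked: Sequence[float]) -> float:
    """Ratio of mean attacked cost to mean baseline cost."""
    return mean(attacked) / mean(baseline)


def accuracy_alarm(baseline_acc: float, observed_acc: float, tolerance: float = 0.05) -> bool:
    """True if accuracy dropped by more than the tolerance."""
    return baseline_acc - observed_acc > tolerance


# Illustrative numbers only; these are NOT measurements from the paper.
baseline_tokens = [400.0, 500.0, 600.0]
attacked_tokens = [4000.0, 5000.0, 6000.0]
baseline_latency_s = [2.0, 3.0]
attacked_latency_s = [20.0, 30.0]

token_ratio = inflation_ratio(baseline_tokens, attacked_tokens)
latency_ratio = inflation_ratio(baseline_latency_s, attacked_latency_s)
alarm = accuracy_alarm(baseline_acc=0.80, observed_acc=0.79)
```

With these inputs both ratios come out to 10.0 while the accuracy alarm stays off, which is the blind spot the results describe.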

The results span multiple agent domains and multiple backbone scales, suggesting the vulnerability is not specific to a single model family or task type.

Mitigation Strategies and Implications for Agent Operators

The paper includes discussion of mitigation strategies oriented toward detecting and constraining abnormal reasoning and latency spikes in deployed agents [1]. For operators running agents in production, the findings point toward several practical considerations.

Accuracy-only monitoring is insufficient as a detection mechanism, given that R-DoS attacks are specifically designed to preserve correct outputs. Operators would need to instrument agents for reasoning token counts and per-request latency, then establish baselines and alert thresholds for deviations. The paper’s framing around “constraining” abnormal reasoning suggests that hard limits on reasoning depth or tool-call budgets per request could serve as a structural defense, though the abstract describes the specific mechanisms only at a high level [1].
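A baseline-and-threshold check of this kind could look like the following Python sketch. It is not a mechanism from the paper: the SpikeDetector class, the z-score rule, the threshold of 3.0, and the baseline samples are all assumptions chosen for illustration.

```python
from statistics import mean, stdev
from typing import List, Sequence


class SpikeDetector:
    """Flags values far above a baseline, using a one-sided z-score."""

    def __init__(self, baseline: Sequence[float], z_threshold: float = 3.0):
        self.mu = mean(baseline)
        self.sigma = stdev(baseline)
        if self.sigma == 0:
            raise ValueError("baseline has no variance; cannot compute a z-score")
        self.z_threshold = z_threshold

    def is_anomalous(self, value: float) -> bool:
        """True if value exceeds the baseline mean by more than z_threshold sigmas."""
        return (value - self.mu) / self.sigma > self.z_threshold


# Hypothetical baselines collected from normal traffic.
token_detector = SpikeDetector([380, 420, 400, 410, 390])
latency_detector = SpikeDetector([1.9, 2.1, 2.0, 2.2, 1.8])


def check_request(reasoning_tokens: float, latency_s: float) -> List[str]:
    """Return the names of the signals that spiked for this request."""
    alerts = []
    if token_detector.is_anomalous(reasoning_tokens):
        alerts.append("reasoning_tokens")
    if latency_detector.is_anomalous(latency_s):
        alerts.append("latency")
    return alerts


normal = check_request(reasoning_tokens=405, latency_s=2.1)
suspicious = check_request(reasoning_tokens=4000, latency_s=21.0)
```

The check is one-sided on purpose, since only upward deviations in cost are relevant to R-DoS. Percentile-based thresholds would be a reasonable alternative when the baseline distribution is heavily skewed.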

The open-source release of OTora gives security and platform teams a concrete tool for adversarial testing before deployment, allowing them to measure their own agents’ susceptibility to R-DoS conditions and calibrate monitoring thresholds accordingly.

FAQ

Q. Does OTora require white-box access to the target model to be effective? No. Stage I of OTora is designed to support both black-box and white-box settings, meaning attacks can be instantiated without access to model weights or gradients [1]. Black-box mode reflects the conditions an external adversary would face when interacting through an API.

Q. Will task accuracy metrics catch an R-DoS attack in production? Based on the paper’s findings, accuracy metrics alone will not reliably detect these attacks. OTora’s Stage II explicitly optimizes for near-baseline task accuracy while inflating reasoning token usage, so output-quality monitoring would not flag the intrusion [1].

Q. Which agent types and backbone models were tested? The evaluation covered WebShop, Email, and OS agents, using LLaMA-70B and GPT-OSS-120B as backbone models [1]. Significant token inflation and latency slowdowns were reported across these configurations.

Q. Is there a public implementation available for operators to test their own agents? Yes. The authors have released the OTora codebase at https://github.com/llm2409/OTora, enabling operators to run adversarial evaluations against their own agent pipelines [1].

Q. What monitoring signals should operators add to detect R-DoS conditions? The paper points toward tracking reasoning token counts and per-request latency as necessary complements to accuracy monitoring, and discusses constraining abnormal reasoning and latency spikes as a mitigation direction [1]. Specific threshold values are not detailed in the available abstract.

Key takeaways

R-DoS attacks degrade LLM agent availability by inflating reasoning depth and tool-call budgets while preserving task accuracy, making them invisible to output-quality monitoring [1].
OTora is a two-stage red-teaming framework: Stage I optimizes adversarial triggers for targeted tool invocations; Stage II generates reasoning payloads that amplify overthinking without reducing correctness [1].
Evaluated across WebShop, Email, and OS agents on LLaMA-70B and GPT-OSS-120B, OTora produced up to tenfold reasoning token inflation and order-of-magnitude latency slowdowns [1].
The framework supports both black-box and white-box attack settings, broadening the realistic threat scenarios it covers [1].
The open-source release at https://github.com/llm2409/OTora allows operators to conduct adversarial evaluations of their own deployments before production exposure [1].