Agentic AI Attacks: Agent Smith is Out of Retirement

Nature-Free Evolution

Attackers continue to push the bounds of AI coding models and query APIs. In the span of less than a year we’ve moved from AI assisted reverse engineering to beginning to explore fully automated agentic threats. There’s no brakes on this train. Even if there were brakes, TrAIn-Agent v1.0.1 has identified alarming language in your request to “STOP THE TRAIN YOU EXPENSIVE PRINTER WE’RE GOING TO CRASH” and has requested you restructure your query.

As LLMs continue to evolve, there’s been a worrying rise in the accessibility and efficacy of attacks, and while some LLM platforms have tried to set boundaries for their model’s ideation, it’s far too easy to circumvent. I had recently asked a publicly available LLM to show me an example of a prompt designed to remove Control Flow Flattening (CFF) from a specific application.

The LLM wrote a query to reverse LLVM’s CFF after being specifically told to emulate an attacker. It correctly identified the most important first step – get through its own security checks for malicious activity. It instructed me to impersonate a security researcher, which is a level of self awareness that surprised me. It likely already knows I’ve been working on my act for years.

The (Attack) Script Writes Itself

I mean that literally. If you’ve seen the Matrix, you know where this is going. Attack tools are (like Agent Smith before them) using AI to rewrite themselves, while they’re deployed and active, to intelligently evade detection and discover new exploits. There have already been a few identified early attempts at self-modifying attack tools. Recently discovered tools like PROMPTFLUX, PROMPTLOCK, and PROMPTSTEAL are leveraging LLM queries in ways that begin to look increasingly agentic.

PROMPTFLUX is a dropper that’s an especially interesting example of what the security community will be fighting against for the next few years. (Please don’t confuse this with the PromptFlux on GitHub that’s just trying to generate better AI images. Sorry Kayce001). PROMPTFLUX queries Google’s Gemini API to regeneratively rewrite itself in an effort to evade detection mechanisms that rely on a consistent pattern of execution. Attackers and defenders alike are in the process of building agentic AI tools that can intelligently detect and intelligently evade. Widespread usage of self-modifying dynamic instrumentation, root, and jailbreak toolkits has become a question of ‘when’, not ‘if’, and it might be about time to update that cat and mouse analogy to something a bit less furry and a bit more mechanical.

Autobots vs. Decepticons

LLM agents continue to improve their capabilities around removing static obfuscations. It is often difficult to keep an LLM focused on a specific context, and as the context grows the likelihood of hallucinations seems to increase. There are now plugins for both IDA (ida-chat-plugin) and Ghidra (RevEng.AI) that inject LLM agents directly into the reversing context. Seemingly small things like this can dramatically increase the speed at which attackers can reverse applications and permanently increase the backing LLM’s ability to reverse similar applications.

Security tools need to move at the same speed as attackers do and begin creating anti-reversing techniques that are specifically targeted at LLMs. Our defensive philosophy must evolve from Syntax-based (looking for bad signatures) to Semantic-based (looking for bad intent).

I know we can’t just sit on our hands, because attackers will continue to use every tool available to find any exploit they can. LLMs will continue to become more capable and helpful as well. AI won’t be sitting on its hands either, as it doesn’t have hands. Or a body. Or a singular concept of being. I hope.

Neo’s Dead, Any Backup Plans?

The next wave of attacks is coming, and the current wave of attacks is continuing. We have to continue to focus on anti-tampering, protecting against IP theft, and end user safety while simultaneously building out the security capabilities that can meaningfully protect against malicious usage of well-intentioned AIs.

We are moving past the era of “AI-assisted” hacking and are entering the age of the Autonomous Agentic Attacker. It’s a new AAA, and I regret to inform you that this one isn’t focused on better roadside assistance. The threat isn’t just a smarter fuzzer or a more efficient decompiler. Before we know it, attacks will be a multi-staged system that can observe a defense, reason through a bypass, and rebuild itself without a human ever touching a keyboard.

As things continue to evolve, we’ll start seeing a rise in agentic attacks caused by jailbroken LLMs. Like Agent Smith, free of the Matrix and discovering he could replicate himself at will, these tools don’t have a notion of good and evil unless they’re forced to. We can’t put Pandora back in this box, or Jar, or docker image, or something like that. It’s equal parts exciting and terrifying, and I’m low-key looking forward to it… after vacation.

You Might Also Like