Written by Cole Herzog
Obfuscations and You
Application security is a known known now. Everyone knows they need it, and attackers know how to defeat the basic protections; shipping an unprotected application just isn’t an option. All strong protections have two parts: static analysis prevention and dynamic attack detection. Static analysis is a powerful tool that gives attackers unique insight into application-specific implementation details. It exposes the most vulnerable parts of an application, helping to speed up an attack or steal valuable business IP. Obfuscating both source code and data is an important part of any static analysis protection. Here are a few ways code and data can be transformed to slow down malicious actors. The harder they have to work, the safer our world can be.
Privatize Your Protection
While applications shouldn’t be shipped without some form of obfuscation, not all obfuscations are created equal. Two crucial components for any strong obfuscation are private IP and randomization. The currently available open-source obfuscators can be just as helpful to attackers as they are to application developers. The level of protection those open-source obfuscators supply isn’t powerful enough to reliably secure enterprise-quality applications. Impactful obfuscations need to transform source code into something unreadable by humans, and irreversible by automated attack tools. Obfuscation techniques that are publicly available function as their own Rosetta Stone for defeating them. Using obfuscation techniques that are custom built, and not publicly accessible, stops Timmy Turner from reversing an app’s obfuscation using his Generative AI Fairies. Pair those private algorithms with inherent randomness to ensure no human being, or LLM, can untangle the yarn.
PRNG Strikes Back
Attackers can glean quite a bit of information by comparing different releases of the same application. Having obfuscations change between app versions is an absolute necessity. Most pseudo-randomization engines use a seed to force predictability while keeping inherent randomness, and obfuscations are no different. Simply incrementing or decrementing a seed value will dramatically impact the obfuscations applied to an input application. This gives reproducibility when needed but forces attackers to start over again when reversing an application after a new version is released. It’s like the attacker shovels half of the driveway, takes a break for some hot cocoa, and comes back to fresh snow. Strong security focuses on wasting a reverse engineer’s time, and a constantly changing landscape is one of the best ways to waste their time.
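To make the seed idea concrete, here is a minimal sketch in Python. It assumes the engine picks its transformations from a seeded generator. The pass names and the `plan_obfuscation` helper are hypothetical, invented for illustration, and are not any product's API.

```python
import random

# Hypothetical pass names, used only for illustration.
PASSES = [
    "substitute_arithmetic",
    "flatten_control_flow",
    "split_functions",
    "encode_strings",
    "rename_symbols",
]

def plan_obfuscation(functions, seed):
    """Choose which passes hit each function, and in what order, from a seed."""
    rng = random.Random(seed)  # private generator: same seed, same plan
    plan = {}
    for name in sorted(functions):  # sorted so input order cannot change the plan
        count = rng.randint(2, len(PASSES))
        plan[name] = rng.sample(PASSES, count)
    return plan

release_1 = plan_obfuscation(["login", "checkout"], seed=1001)
rebuild_1 = plan_obfuscation(["checkout", "login"], seed=1001)
release_2 = plan_obfuscation(["login", "checkout"], seed=1002)

print(release_1 == rebuild_1)  # the same seed reproduces the same plan
print(release_1)
print(release_2)  # the next seed reshuffles the plan for the next release
```

Rebuilding with the old seed reproduces a release exactly, which helps with debugging and crash triage, while bumping the seed gives the next release a fresh layout.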
Obfuscation Obfuscating Obfuscations
Conceptually, obfuscation is the opposite of optimization. Just like compilers often have multiple optimization passes, we have multiple obfuscation passes. That means we apply obfuscation to the obfuscations that were just added. The result is staggeringly difficult to read.
Things like a simple `add` or `mov` instruction can quickly become fifty, or even hundreds, of instructions. Properly obfuscating `adrp`/`add` pairings can decimate the ability of an automated tool (like Ghidra or IDA) to resolve program control flow. Seemingly minor obfuscations can make it nearly impossible for an attacker to find the start, end, or middle of that `acceptUserCredentials` function call. Hundreds of compounding and varied obfuscations leave potential attackers trying to figure out how the code even runs successfully.
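As a rough illustration of layered passes, the sketch below rewrites a single addition twice using well-known bitwise identities. This is a generic textbook substitution and not necessarily what any particular engine emits. The function names are made up for the example, and inputs are assumed to be unsigned 64-bit values.

```python
MASK64 = (1 << 64) - 1  # emulate 64-bit register wraparound

def plain_add(a, b):
    return (a + b) & MASK64

def obfuscated_add(a, b):
    # Pass 1: a + b == (a ^ b) + 2 * (a & b)
    x = (a ^ b) & MASK64
    y = ((a & b) << 1) & MASK64
    # Pass 2: rewrite the addition that pass 1 introduced,
    # using x + y == (x | y) + (x & y)
    return ((x | y) + (x & y)) & MASK64

for a, b in [(0, 0), (1, 2), (123456789, 987654321), (MASK64, 1), (MASK64, MASK64)]:
    assert obfuscated_add(a, b) == plain_add(a, b)
print("both versions agree")
```

Two passes already turn one operation into six. Each further pass can rewrite the operations the previous one added, which is how instruction counts balloon.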
Bop it, Twist it, Configure it
Not every method or function call is sensitive in an application, and some method or function calls may be heavily performance-sensitive. While uniformly applying obfuscation across an entire application is still significantly better than nothing, that strategy can have a sizeable impact on runtime performance. Giving security engineers the ability to heavily obfuscate sensitive calls, while lightly obfuscating performance-intensive calls, is part of providing a world-class protection tool. Any dynamic protections (Guards) injected into the application require just as much obfuscation as the underlying source code. Applying the same high-security obfuscation techniques to those active Guards helps keep them protected and running, and slows down attackers’ efforts to identify and remove them. Configurability is crucial to any security-centric organization, and protection tooling needs to change as quickly as a company’s direction can. As previously mentioned, some obfuscations are built differently. Here are a few concepts that any serious obfuscation engine provides, and how they can keep out unapproved air traffic.
Computed Control Flow
Attackers can gain quite a bit of knowledge of an application just by looking at the flow of function calls throughout the application. To combat that, our obfuscations use something called Computed Control Flow. Computed Control Flow stops decompilers from connecting function calls and label references to the main entry point of the application. It’s a critical part of any world-class protection. When a decompiler analyzes a binary, seeing how one function calls another function is often trivial. Take, for example, instructions like: `b.eq 0x10055bc42` and `bl 0x100abcdef`. A decompiler knows exactly what lives at those virtual memory locations. In the case of the previous two instructions, the first targets a label within a function, and the second is a function call. Any decompiler worth its salt will render a beautiful spider web of intertwined function calls, showing a clear path of execution throughout an application. Combine this with other basic tricks, like seeing where in the assembly a “login successful” message is used, and it becomes trivial for an attacker to identify exploitable code. Computed Control Flow is a set of mathematical operations injected before obvious control flow calls. These mathematical operations stop a decompiler from knowing the exact Relative Virtual Address being jumped to, destroying its ability to generate a control flow graph.
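The idea can be sketched in Python by treating a dictionary as a stand-in for code addresses. Everything here (`CODE`, `KEY`, `encode_target`, `computed_call`) is hypothetical and only models the concept. A real engine works on machine code, and it would likely derive the key from runtime state instead of a visible constant.

```python
def check_password():
    return "check_password"

def show_error():
    return "show_error"

def grant_access():
    return "grant_access"

# Stand-in for code addresses; a real binary holds instruction addresses.
CODE = {0x1000: check_password, 0x1040: show_error, 0x1080: grant_access}
KEY = 0x5EED1234  # hypothetical per-call-site constant

def direct_call():
    # Like `bl 0x1080`: the target is plainly visible to static analysis.
    return CODE[0x1080]()

def encode_target(address, key):
    # Build time: hide the address behind reversible arithmetic.
    return ((address ^ key) + key) & 0xFFFFFFFF

def computed_call(encoded, key):
    # Run time: undo the arithmetic, then branch through the computed value,
    # much like an indirect `blr` through a register.
    address = ((encoded - key) & 0xFFFFFFFF) ^ key
    return CODE[address]()

ENCODED_GRANT = encode_target(0x1080, KEY)
print(hex(ENCODED_GRANT))  # the value at the call site is no longer 0x1080
print(computed_call(ENCODED_GRANT, KEY) == direct_call())
```

The call site no longer contains a target address at all, only an encoded value and the arithmetic that recovers it, so a tool has to evaluate that arithmetic before it can draw the edge in the call graph.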
Control Flow Flattening
Computed Control Flow is already a terror for decompilers; it sure would be a shame if we made it worse. Another strategy to derail logical control flow is called Control Flow Flattening. We have various forms of CFF, but a recently added one (named The Nexus) breathes fresh life into this critical obfuscation technique. We’ve all written a chess game in `main` before, but how about writing an entire production-level application in a single, self-referencing switch statement? The Nexus does just that while still prioritizing runtime performance. There’s a plethora of information out there about CFF, its importance, and how it works. It’s impossible to overstate how difficult this obfuscation technique makes attackers’ lives, and no app should leave staging without it.
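For readers new to CFF, here is a minimal sketch of the generic, textbook form of flattening, applied to a greatest-common-divisor loop. It does not represent The Nexus, whose internals are not described here, and the function names are made up for the example.

```python
def gcd_plain(a, b):
    while b != 0:
        a, b = b, a % b
    return a

def gcd_flattened(a, b):
    # Every basic block becomes a case in one dispatch loop. The original
    # loop structure survives only as values assigned to `state`.
    state = 0
    while True:
        if state == 0:      # loop test
            state = 1 if b != 0 else 2
        elif state == 1:    # loop body
            a, b = b, a % b
            state = 0
        elif state == 2:    # exit block
            return a

for a, b in [(48, 18), (17, 5), (0, 9), (9, 0), (100, 100)]:
    assert gcd_flattened(a, b) == gcd_plain(a, b)
print("flattened and plain versions agree")
```

In a control flow graph, the flattened version shows every block hanging off a single dispatcher, so the shape of the original loop is no longer visible at a glance.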
C h o p u p
No stainless steel knife set would be complete without a matching pair of stainless steel scissors, and Chopup complements the control flow obfuscations in much the same way. Binary file formats are usually densely packed to save space on disk. That means most related functions sit pretty close to each other. As a result, a disassembled binary tends to group related functions together and almost always has each function as one contiguous block in the binary. That’s way too easy to read, and really grinds our gears. Chopup changes that. There’s no technical restriction in a binary requiring functions to be contiguous in memory, and a good pair of scissors (with some flexible glue) can leave a binary working as expected but force attackers to jump to every corner of the binary in search of logical control flow.
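The scissors-and-glue idea can be modeled with a toy stack machine in Python. This is a simplified sketch under stated assumptions and not how Chopup itself is implemented. The instruction set, the `run` and `chop_up` helpers, and the address layout are all invented for illustration.

```python
import random

def run(memory, entry):
    """Tiny stack machine; memory maps an address to an (op, arg) pair."""
    stack, pc = [], entry
    while True:
        op, arg = memory[pc]
        pc += 1
        if op == "push":
            stack.append(arg)
        elif op == "add":
            stack.append(stack.pop() + stack.pop())
        elif op == "mul":
            stack.append(stack.pop() * stack.pop())
        elif op == "jmp":
            pc = arg
        elif op == "ret":
            return stack.pop()

def chop_up(function, chunk_size, seed):
    """Cut a contiguous function into chunks, scatter them, glue with jumps."""
    rng = random.Random(seed)
    chunks = [function[i:i + chunk_size] for i in range(0, len(function), chunk_size)]
    slots = [0x100 * n for n in range(len(chunks))]  # widely spaced addresses
    rng.shuffle(slots)                               # chunk order no longer matches code order
    memory = {}
    for index, chunk in enumerate(chunks):
        base = slots[index]
        for offset, instruction in enumerate(chunk):
            memory[base + offset] = instruction
        if index + 1 < len(chunks):  # the glue: jump to wherever the next chunk landed
            memory[base + len(chunk)] = ("jmp", slots[index + 1])
    return memory, slots[0]

# Computes (2 + 3) * 4
FUNCTION = [("push", 2), ("push", 3), ("add", None),
            ("push", 4), ("mul", None), ("ret", None)]

contiguous = dict(enumerate(FUNCTION))
scattered, entry = chop_up(FUNCTION, chunk_size=2, seed=7)
print(run(contiguous, 0), run(scattered, entry))  # both evaluate (2 + 3) * 4
```

Behavior is unchanged, but anyone reading the scattered layout has to follow a jump every couple of instructions to reassemble the function.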
mgDaea and Repair
While obfuscating the code flow itself is a crucial layer of protection, obfuscating the underlying application data is just as necessary. Damage and Repair are the Chaotic Evil and Lawful Good of static analysis protections, and almost always go hand in hand. Damaging a piece of data, or even in some cases, a piece of code, and then repairing it right before it’s read from or run makes static analysis effectively impossible. Damage changes the specified data into what looks to be garbage memory. When that same data is exposed to a Repair call, it’s reverted to its original form, many times just before it’s used. That data can then be Damaged again after falling out of scope and won’t be used until the next time the function is called. That API key might be “1#&at0d$*nd@@z” in the binary to start, but rest assured, it’ll get fixed up before it gets used.
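A minimal Python sketch of the Damage and Repair lifecycle follows, assuming a simple repeating-key XOR as the transformation. That choice is purely illustrative and is not a real cipher or any product's scheme. The helper names, the key, and the sample secret are all hypothetical.

```python
from contextlib import contextmanager

def damage(buffer, key):
    """XOR the buffer in place so it looks like garbage at rest."""
    for i in range(len(buffer)):
        buffer[i] ^= key[i % len(key)]

repair = damage  # XOR with the same key is its own inverse

@contextmanager
def repaired(buffer, key):
    """Repair just before use; damage again as soon as the block exits."""
    repair(buffer, key)
    try:
        yield buffer
    finally:
        damage(buffer, key)

KEY = bytes([0x5A, 0xC3, 0x19, 0x7E])  # hypothetical key, all bytes nonzero
api_key = bytearray(b"example-api-key")
damage(api_key, KEY)  # this is the form that would ship in the binary

with repaired(api_key, KEY) as clear:
    used = bytes(clear)  # only readable inside this block

print(used)
print(bytes(api_key) != used)  # back to garbage once the block ends
```

The `finally` clause matters: the data gets damaged again even if the code using it raises an exception, so the clear form never lingers longer than needed.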
Even with source code obfuscation and data obfuscation, there are still source code symbol names left over in a compiled binary. For example, a symbol artifact for a function name like “acceptContinuousPayment” can easily provide a direct link to that function’s implementation. Once found, it’s trivial to alter the assembly and access premium features without paying. Comprehensive symbol renaming hides the remaining artifacts and leaves nothing for attackers to work with. Removing any hints or clues to sensitive functionality in a protected binary can force attackers to give up on static analysis completely.
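As a small illustration, symbol renaming can be sketched as a seeded mapping from meaningful names to opaque ones. The `rename_symbols` helper and the `s0000`-style naming scheme are assumptions made for this example and do not describe any particular tool's output.

```python
import random

def rename_symbols(symbols, seed):
    """Map each symbol to an opaque name that carries no meaning."""
    rng = random.Random(seed)
    ordered = sorted(set(symbols))
    rng.shuffle(ordered)  # so new names do not follow alphabetical order
    return {name: f"s{index:04x}" for index, name in enumerate(ordered)}

symbols = ["acceptContinuousPayment", "acceptUserCredentials", "renderHomeScreen"]
mapping = rename_symbols(symbols, seed=2024)
for original, opaque in mapping.items():
    print(original, "->", opaque)
```

A build would typically keep this mapping private, since it is the only way to translate a crash report from the protected binary back into readable names.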
These are just examples of what strong obfuscation practices and methodologies look like, not an exhaustive list by any means. Leaving an application unprotected in the wild gives anyone complete visibility into unique IP and provides any attacker a fog-free window into business logic. With the meteoric rise of generative AI and source code co-pilots into the mainstream, it’s become infinitely easier for people to write code, and for some people to reverse code. An organization-wide commitment to implementing strong security solutions is the only viable path forward in such a fast-paced environment. Don’t let your developers’ hard work leave home without it.