
Monday, October 28, 2013

Transformer: mass deobfuscation of php files

In my job I had to deal with tons of malicious PHP files per day, let's say about 1000. Most of the files are obfuscated in various ways, from the usage of php-crypt.com down to amateur obfuscation.
Now, if you have to deobfuscate a single PHP file you have several options, from web services (like this) to evalhook, the great PHP extension created by Stefan Esser (here is a link with an explanation). These strategies are all good if you have a single file, as all of them require user interaction. But what if you have more than 1000 PHP files per day, and you want to deobfuscate them daily in order to run some similarity tool?

Let's begin with the basics:
In PHP there are basically two ways of decoding an obfuscated payload and then executing it:
- the simple eval function;
- preg_replace with the "e" modifier (dear PHP developers, was that really necessary!? a preg_replace with an included eval!?);

The main problems when using the Evalhook extension over a large number of files fall into two categories:
- For every "eval" step Evalhook asks for confirmation before evaluating the code (and that's ok, we can easily automate a yes answer);
- The file is actually executed while being analyzed (and that's not ok at all!)

So what I basically wanted was a way to use Evalhook automatically, in order to let it run over my 1000 files per day, but at the same time I didn't want to execute any part of the code other than what is strictly necessary for evalhook to do its job.

Typical example of evalhook usage

This is especially important when we are dealing with files created by one author and then used by several others. Most of the web shells I see, in fact, contain a tiny base64 string which is evaluated at run time and which is in charge of sending an e-mail to the creator of the shell (not the one who is actually using it), letting him know that his shell has been uploaded to a certain server. If we run this kind of file with evalhook and let it deobfuscate everything, it will blindly execute all the code up to the point where it reaches the obfuscated part, potentially causing any sort of damage to the testing machine.

That's why I created Transformer. The principle of Transformer is exactly the one described above: it explores the code, looks for any eval/preg_replace with "e" in the code, isolates them, runs evalhook over these single pieces and puts the results back inside the code, allowing for a safer and better deobfuscation.

Details

Transformer runs a regex over the file in order to find possible matches (yes, you understood well, it's regex-based). A regex approach obviously has some limitations that derive from the nature of regexes, but that's the only way to isolate the obfuscated parts of the files without using a PHP parser (which is a very, very hard thing to do) or directly creating a PHP extension (and we don't want any part of the code to run inside PHP, in order to be safe against any possible event).
An example of these regexes is listed below:
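As a rough illustration only (these are not the patterns shipped with Transformer), a regex-based search for eval calls and for preg_replace calls using the "e" modifier could look like the Python sketch below. The names EVAL_RE, PREG_E_RE and find_candidates are hypothetical, and the patterns deliberately ignore hard cases such as nested quotes or a ");" sequence inside a string.

```python
import re

# Hypothetical patterns for illustration; real-world samples need more care.
# eval( ... ); with a lazy body, so the match stops at the first ")" followed by ";".
EVAL_RE = re.compile(r"\beval\s*\((?P<arg>.*?)\)\s*;", re.IGNORECASE | re.DOTALL)

# preg_replace("<delim>...<delim>MODIFIERS", ...) where MODIFIERS contains "e".
PREG_E_RE = re.compile(
    r"\bpreg_replace\s*\(\s*(?P<quote>[\"'])"  # opening quote of the pattern string
    r"(?P<delim>[/#~|!@%])"                    # regex delimiter
    r".*?(?P=delim)"                           # pattern body up to the closing delimiter
    r"[a-zA-Z]*e[a-zA-Z]*"                     # modifier list containing "e"
    r"(?P=quote)",                             # closing quote of the pattern string
    re.IGNORECASE,
)


def find_candidates(source):
    """Return (kind, start, end) spans of suspicious calls, sorted by position."""
    hits = [("eval", m.start(), m.end()) for m in EVAL_RE.finditer(source)]
    hits += [("preg_replace/e", m.start(), m.end()) for m in PREG_E_RE.finditer(source)]
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    sample = '<?php eval(base64_decode("ZWNobyAxOw==")); preg_replace("/.*/e", $x, ".");'
    for kind, start, end in find_candidates(sample):
        print(kind, sample[start:end])
```

Working on character spans rather than on a parsed syntax tree keeps everything outside the PHP interpreter, at the price of false positives and missed matches.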



It's important to notice that these regexes are made to work on characters. In my experience, a common way attackers obfuscate their code, other than eval'ing things, is by substituting some of the characters with their hex equivalents. PHP, in fact, accepts hex-escaped characters (for example \x65 inside a double-quoted string) as part of instructions without any problem.

half hex encoded half ascii!

That's why the whole code, before being split, passes through a "decode_hext" function, which is in charge of decoding any hex character into the corresponding ASCII one.
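A minimal sketch of such a decoding pass, assuming only \xNN escapes need to be handled, is shown below. The function name decode_hex_escapes is hypothetical and this is not the tool's actual implementation. Quotes, backslashes, dollar signs and non-printable bytes are left escaped, so that decoding does not change how PHP would read the surrounding string.

```python
import re

# Matches PHP-style hex escapes such as \x65 (one or two hex digits).
HEX_ESCAPE_RE = re.compile(r"\\x([0-9a-fA-F]{1,2})")

# Characters that would alter the structure of a PHP string if emitted literally.
UNSAFE_CHARS = "\"'\\$"


def decode_hex_escapes(source):
    """Replace printable \\xNN escapes with the ASCII character they encode."""

    def to_char(match):
        code = int(match.group(1), 16)
        char = chr(code)
        # Keep the escape when the byte is non-printable or structurally unsafe.
        if 0x20 <= code < 0x7F and char not in UNSAFE_CHARS:
            return char
        return match.group(0)

    return HEX_ESCAPE_RE.sub(to_char, source)


if __name__ == "__main__":
    print(decode_hex_escapes(r'$f = "\x62\x61\x73\x65\x36\x34_decode";'))
```

Running this pass before the regex search means that a function name spelled half in hex and half in ASCII is seen in plain form by the later steps.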

Once we isolate the obfuscated part of the code, if we are lucky we just give it to evalhook and obtain the decoded version, but what about things like:

$a = obfuscated_string;
eval($a);

Transformer is able to handle these cases by performing a minimal parsing of the code, in order to do some sort of tainting of the variables involved in the obfuscated parts and attach all operations involving these variables to the obfuscated code. This is particularly useful when dealing with custom obfuscation, which obfuscates the code via XOR and other operations (personally I also saw something similar to a Caesar cipher).
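The tainting idea can be sketched roughly as follows, under the strong assumption that the script has already been split into one statement per list entry. The helper collect_tainted and both regexes are hypothetical and far simpler than what a real sample needs. Starting from the eval statement, the sketch walks backwards and keeps every assignment to a variable that the eval depends on, directly or indirectly.

```python
import re

# A PHP variable name such as $a or $_tmp1.
VAR_RE = re.compile(r"\$[A-Za-z_][A-Za-z0-9_]*")

# A statement that starts by assigning to a variable: =, .=, ^=, += and so on.
ASSIGN_RE = re.compile(r"^\s*(\$[A-Za-z_][A-Za-z0-9_]*)\s*[.^|&+\-*/%]?=(?!=)")


def collect_tainted(statements, eval_index):
    """Return the eval statement plus the earlier assignments that feed it."""
    tainted = set(VAR_RE.findall(statements[eval_index]))
    kept = [statements[eval_index]]
    # Walk backwards so variables used on a right-hand side become tainted too.
    for statement in reversed(statements[:eval_index]):
        match = ASSIGN_RE.match(statement)
        if match and match.group(1) in tainted:
            tainted.update(VAR_RE.findall(statement))
            kept.append(statement)
    kept.reverse()
    return kept


if __name__ == "__main__":
    script = [
        '$k = 3;',
        'mail("someone@example.com", "up", "shell uploaded");',
        '$a = "obfuscated_string";',
        '$a = $a ^ $k;',
        'eval($a);',
    ]
    print("\n".join(collect_tainted(script, 4)))
```

In this toy script the mail() call is left out of the extracted snippet, which is the whole point: only the statements needed to rebuild the evaluated string are handed over for decoding.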

Once we have a script containing both the obfuscated code and any other operations on the variables used during the obfuscation, we let evalhook do its amazing job, obtaining the deobfuscated code.

The output from evalhook is then put back inside the original code, replacing the obfuscated code.
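The splice-back step can be illustrated with the sketch below. Note the loud assumption: evalhook is not invoked here at all. A stand-in function, fake_evalhook (hypothetical), simply strips one base64 layer in Python so that the example runs anywhere without executing any PHP. Only the replacement mechanics are the point.

```python
import base64
import re

# Only the simplest case: eval(base64_decode("..."));
B64_EVAL_RE = re.compile(
    r'eval\s*\(\s*base64_decode\s*\(\s*"([A-Za-z0-9+/=]+)"\s*\)\s*\)\s*;'
)


def fake_evalhook(match):
    """Stand-in for evalhook: decode one base64 layer, execute nothing."""
    return base64.b64decode(match.group(1)).decode("utf-8", errors="replace")


def splice_deobfuscated(source):
    """Replace each isolated obfuscated call with the text recovered for it."""
    return B64_EVAL_RE.sub(fake_evalhook, source)


if __name__ == "__main__":
    payload = base64.b64encode(b'echo "hi";').decode("ascii")
    sample = '<?php $x = 1; eval(base64_decode("' + payload + '")); echo $x;'
    print(splice_deobfuscated(sample))
```

Because the replacement is done span by span, the code around each obfuscated call is left byte-for-byte untouched, which helps when the results are later fed to a similarity tool.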

Where It works

Transformer is not intended to be a universal, perfect deobfuscator. Its strong points are its ability to decode parts of the scripts automatically, catching any small obfuscated piece of code and producing the deobfuscated code. It works great when you have massive amounts of obfuscated files and you need the deobfuscated versions in order to look for similarities between the files.

Where It fails

Transformer does not have any knowledge of PHP. Therefore, if the obfuscated script goes through several multiline functions before being evaluated, the deobfuscation will fail. There is also software which obfuscates the script in such a way that during the deobfuscation the script reads itself several times, appending specific bytes one after the other in order to create a new file that will then be deobfuscated.
In these cases there is no automatic procedure that can help, as the deobfuscation must be performed by hand, checking every single line of evalhook output in order to understand when to stop.

Conclusions

What I presented here is a nice way to automatically deobfuscate many different PHP files. Obviously we are not as precise as a manual analysis (and we don't want to be), but if you collect massive amounts of PHP scripts per day, this can be useful.
I created a github repo for the project. I still have to add some example files; the problem is that I only have malicious obfuscated files which I don't want to publish on my github, so I'm creating something ad hoc!

Notice: even if this tool is meant to provide a safer deobfuscation than just using evalhook on the whole script, I still recommend running the software in a virtual machine with no connection to the outside world, in order to minimize possible dangers.
