Deobfuscating Maldocs - Payloads in Objects

When it comes to maldocs, obfuscation techniques trend depending on what the doc-generators are using. For example, at one time giant base64 strings were concatenated at runtime, which would typically decode to an obfuscated PowerShell script. Then DOSfuscation was released, and that trended for a good while (and is still quite prevalent). What has been very popular for a while now are maldocs that do contain macros, but store the true payload in some object inside the document itself, outside of the macro code, and pull it from there at runtime. I thought I'd walk through my approach to deobfuscating maldocs that leverage this technique.

First things first: extracting macros using Didier Stevens' awesome tool 'oledump', which can be found here: https://blog.didierstevens.com/programs/oledump-py/
The maldoc we are analyzing can be found here: https://app.any.run/tasks/705356e8-e50a-4e37-9a23-c1979eadf6f7
Command: oledump.py WHxA-O11UUt9rRSBFOo_hyzpDsMYE-wD

Examining the output, you will notice that most of the streams containing macros (tagged with an 'M' or 'm') are quite small. Likely too small to contain a real payload. For example stream 12 only has Attribute info. 
Command: oledump.py WHxA-O11UUt9rRSBFOo_hyzpDsMYE-wD -s 12 -v
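If you'd rather not eyeball the stream listing, a small Python sketch can rank the macro streams by size. The listing below is made up for illustration (only the stream numbers 12 and 19 come from this post; the sizes and stream names are invented), and the regex assumes oledump's usual "index: indicator size 'name'" layout.

```python
import re

# Hypothetical listing in the general shape of oledump.py output.
# Sizes and stream names are invented for illustration.
SAMPLE_LISTING = """\
  1:       114 '\\x01CompObj'
 12: m     702 'Macros/VBA/Module1'
 19: M   48213 'Macros/VBA/ThisDocument'
 20: m    1105 'Macros/VBA/Module2'
"""

# index, optional one-character indicator, size in bytes, quoted stream name
LINE_RE = re.compile(r"^\s*(\d+):\s+([A-Za-z!])?\s+(\d+)\s+'(.*)'\s*$")


def macro_streams_by_size(listing):
    """Return (index, indicator, size, name) for 'M'/'m' streams, largest first."""
    rows = []
    for line in listing.splitlines():
        match = LINE_RE.match(line)
        if match and match.group(2) in ("M", "m"):
            rows.append((int(match.group(1)), match.group(2),
                         int(match.group(3)), match.group(4)))
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for index, indicator, size, name in macro_streams_by_size(SAMPLE_LISTING):
        print(index, indicator, size, name)
```

The first row returned is the stream worth dumping with -s and -v.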

The largest macro is tagged with the capital 'M', which we can extract for closer examination. I recommend copying the output into Notepad++ and changing the language to Visual Basic for a prettier look.
Command: oledump.py WHxA-O11UUt9rRSBFOo_hyzpDsMYE-wD -s 19 -v
The extracted macro code will appear daunting to the untrained eye, but we should expect lots of garbage code and noise within these maldocs. Look closely and there is a clear pattern in the output: a couple of conditional statements (If..Then) and a switch statement. This pattern is repeated throughout the macro code, and it doesn't look like valid code. This is typical of obfuscation: an algorithm was used to add garbage to obscure the true payload, and because this is typically automated, there will be patterns. Another thing we can do is check whether any of these randomly named variables are used again. If a variable is declared or defined and never used, it's usually garbage!
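The "defined but never used" check can be roughed out in Python. This is only a sketch: the macro fragment is hypothetical (not taken from the sample), and the regex-based parsing is a simplification that ignores Dim statements, line continuations, and other VBA details.

```python
import re
from collections import Counter

# Hypothetical macro fragment in the style described above.
SAMPLE_VBA = """
Sub AutoOpen()
    jXkQwA = 4821
    If jXkQwA = 77 Then
        zGarbage1 = "unused"
    End If
    oPayload = "power" + "shell"
    Shell oPayload, 0
End Sub
"""


def assigned_but_unused(code):
    """Return names that are assigned at the start of a line and appear nowhere else."""
    stripped = re.sub(r'"[^"]*"', '""', code)   # blank out string literals
    stripped = re.sub(r"'.*", "", stripped)      # drop VBA comments
    assigned = re.findall(r"^\s*([A-Za-z_]\w*)\s*=", stripped, flags=re.MULTILINE)
    # VBA identifiers are case-insensitive, so count them in lowercase
    counts = Counter(word.lower() for word in re.findall(r"[A-Za-z_]\w*", stripped))
    return sorted({name for name in assigned if counts[name.lower()] == 1})


if __name__ == "__main__":
    print(assigned_but_unused(SAMPLE_VBA))
```

Anything this flags is a good candidate for deletion while cleaning up the macro, though it is worth a quick manual look before removing it.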
If we keep looking through the code, we will finally see something other than the pattern mentioned above: a reference to an object.
After reversing dozens of maldocs that use this technique, I can confidently say that payloads stuffed in objects (typically either a Text Box object or Shape object) are usually a fully intact string (usually base64). Since the string is fully intact, we can literally run strings on the maldoc itself to pull it out.
Command: strings -n 100 WHxA-O11UUt9rRSBFOo_hyzpDsMYE-wD | sort | uniq | less
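If strings isn't available (on a Windows analysis box, for example), roughly the same thing can be done in Python. This is a minimal sketch that only looks for runs of printable ASCII, so it is an approximation of the command above rather than an exact replacement; the function name and the demo blob are mine.

```python
import re


def long_strings(data: bytes, min_len: int = 100):
    """Return unique, sorted runs of printable ASCII at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return sorted({match.group().decode("ascii") for match in pattern.finditer(data)})


if __name__ == "__main__":
    # Made-up blob standing in for the document bytes; for a real file use
    # open(path, "rb").read() instead.
    blob = (b"\x00\x01" + b"A" * 120 + b"\x00short\x00"
            + b"A" * 120 + b"\xff" + b"QmFzZTY0" * 20)
    for text in long_strings(blob):
        print(len(text), text[:40])
```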
As you can see, there were two strings returned from our command, the second one obviously being the payload. Another approach (if we weren't aware of the strings trick) is to pick through the document, looking for references to the objects/properties referenced in the GetObject() function. Scrolling through a hexdump of the document, we can see the ControlTipText property mentioned, as well as several others.
And just below that, we see our payload.
Alternatively, if you have an Office license, you can open the document in developer mode, then debug the macro. I recommend setting a breakpoint on the GetObject() function, then checking the value of each variable/property by hovering over it. Here we see the string 'powershell' being concatenated.
And within the .zAAQA1DA we see the base64 payload.

Now we can decode the base64 and continue deobfuscating in Notepad++.
Command: echo "<base64 string>" | base64 -d
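One thing to watch for: if the payload is handed to PowerShell via -EncodedCommand (an assumption here, the post doesn't say), the decoded bytes will be UTF-16LE and the output of base64 -d will be littered with null bytes. A small Python sketch that handles both cases; the function name and the null-byte heuristic are mine, not part of any tool.

```python
import base64


def decode_payload(b64_text: str) -> str:
    """Base64-decode a payload and turn it into readable text."""
    raw = base64.b64decode(b64_text)
    # Heuristic: ASCII text stored as UTF-16LE has a zero in every second byte.
    looks_utf16 = (len(raw) >= 2 and len(raw) % 2 == 0
                   and all(byte == 0 for byte in raw[1::2]))
    if looks_utf16:
        return raw.decode("utf-16-le")
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Hypothetical stand-in for the real payload string.
    sample = base64.b64encode("$a='hi';".encode("utf-16-le")).decode("ascii")
    print(decode_payload(sample))
```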
The decoded base64 is an obfuscated PowerShell script. We can tell that it's PowerShell due to:
  • Variables being prepended with '$'
  • Usage of the format operator {}-f''
  • Cmdlets 'Get-Item' or 'Invoke-Item'
  • Usage of the 'Net.Webclient' class
What I like doing first is formatting the script a bit. We can use the semicolon and some regex to split up the lines.
The output is a bit easier on the eyes.
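The same splitting can be scripted. The sketch below is not the regex approach used above; it walks the script character by character so that semicolons inside quoted strings are left alone. It does not understand backtick escapes or here-strings, and the sample one-liner is hypothetical.

```python
def split_statements(script: str) -> str:
    """Put each semicolon-terminated statement on its own line."""
    out = []
    quote = None  # the quote character we are currently inside, if any
    for ch in script:
        out.append(ch)
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == ";":
            out.append("\n")
    lines = [line.strip() for line in "".join(out).splitlines()]
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    # Hypothetical one-liner standing in for the decoded script.
    print(split_statements("$a='x;y';$b = 1;  $c=$a+$b"))
```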
We can now use one of my favorite reversing techniques: self-decoding. For example, the variable $UQAcXA is storing a heavily obfuscated value, mainly through abuse of the format operator (-f). We can open a new PowerShell window, define the variable just as the script does (copy + paste), then read the output.
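To see why the format operator hides strings so well, here is a tiny Python emulation of the simple positional case, where "{N}" is replaced by the Nth argument. The fragments are made up for illustration and are not the contents of $UQAcXA; alignment and format specifiers such as "{0,10}" or "{0:X}" are not handled.

```python
import re


def ps_format(template: str, *args: str) -> str:
    """Emulate PowerShell's  'template' -f arg0, arg1, ...  for plain {N} placeholders."""
    return re.sub(r"\{(\d+)\}", lambda match: args[int(match.group(1))], template)


if __name__ == "__main__":
    # Equivalent to: "{2}{0}{1}" -f 'ershe','ll','pow'
    print(ps_format("{2}{0}{1}", "ershe", "ll", "pow"))
    # Equivalent to: "{1}{0}" -f 'Client','Net.Web'
    print(ps_format("{1}{0}", "Client", "Net.Web"))
```

Letting PowerShell evaluate the expression itself is still the easier route; this just shows what it is doing under the hood.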
Defining the variable:
Printing to the screen:
We can now repeat that process for any variables we want. Also notice that, just like the macro code, the script defines several variables that are never used: more garbage code. After allowing the obfuscated values to self-decode, renaming variables, and beautifying the script, we are left with a simple downloader. The script will loop through the list of URLs, attempting to download and invoke an executable.
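Once the script is readable, the URLs are the indicators worth keeping. Here is a quick Python sketch for pulling them out and defanging them for a report. The sample line is hypothetical (example.com addresses, and the '@' separator is an assumption about how the list is stored), and the regex stops at '@', so a URL that contains one would be cut short.

```python
import re


def extract_urls(script: str):
    """Return unique URLs in order of appearance, defanged for safe sharing."""
    urls = re.findall(r"https?://[^\s'\"@,;)]+", script)
    unique = dict.fromkeys(urls)  # keeps first-seen order, drops duplicates
    return [url.replace("http", "hxxp", 1).replace(".", "[.]") for url in unique]


if __name__ == "__main__":
    # Hypothetical cleaned-up line from a downloader script.
    sample = "$list = 'http://example.com/a@https://example.org/b'.Split('@');"
    for url in extract_urls(sample):
        print(url)
```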
Thanks for reading and Happy Reversing!


