How to Detect Malicious Polyglot Files with MetaDefender Sandbox

Oct 30, 2023 by Filescan Sandbox Labs


While many enjoyed their summer vacation, cybercriminals were hard at work, and at MetaDefender Sandbox, so were we. A cluster of recent vulnerabilities and attacks kept us busy adding new detections to our ever-increasing list.

Let’s check out a few examples.

On July 8th, the BlackBerry Threat Research and Intelligence team reported that a threat actor may have targeted NATO membership talks using a vulnerability similar to CVE-2017-0199. On July 11th, Microsoft disclosed a zero-day vulnerability tracked as CVE-2023-36884. Then, on August 28th, the Japan Computer Emergency Response Team Coordination Center (JPCERT/CC) confirmed that attackers had used a new technique in July.

Masquerading File Types

What made these incidents so interesting? All of them hid files inside other document types to deliver malicious payloads: a nesting doll of files designed to evade detection.

They are examples of the MITRE ATT&CK technique T1036.008 (Masquerade File Type), using polyglot files that are valid as more than one file type. Each interpretation behaves differently depending on the application that opens the file, which makes polyglots an effective way to disguise malware and its malicious capabilities. They also exemplify MITRE technique T1027.009 (Embedded Payloads), in which attackers embed malicious payloads within the masqueraded files.
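
To make the idea concrete, here is a minimal Python sketch of one naive polyglot heuristic: identify the format from the magic bytes at offset 0, then look for markers of other formats anywhere else in the file. The signature table and function names are hypothetical and purely illustrative; this is not how MetaDefender Sandbox works internally, and a marker found mid-file can be a false positive (a PDF may legitimately embed a ZIP, for example).

```python
from typing import Dict, List, Optional

# Hypothetical, deliberately small signature table (magic bytes / text markers).
SIGNATURES = {
    "PDF": b"%PDF-",
    "OLE2": b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1",
    "ZIP/OOXML": b"PK\x03\x04",
    "RTF": b"{\\rtf",
    "MIME/MHT": b"MIME-Version:",
}


def declared_format(data: bytes) -> Optional[str]:
    """Return the format whose signature sits at offset 0, if any."""
    for name, sig in SIGNATURES.items():
        if data.startswith(sig):
            return name
    return None


def identify_formats(data: bytes) -> Dict[str, List[int]]:
    """Map each format marker found anywhere in `data` to every offset where it occurs."""
    hits: Dict[str, List[int]] = {}
    for name, sig in SIGNATURES.items():
        offsets = []
        pos = data.find(sig)
        while pos != -1:
            offsets.append(pos)
            pos = data.find(sig, pos + 1)
        if offsets:
            hits[name] = offsets
    return hits


def looks_polyglot(data: bytes) -> bool:
    """Naive heuristic: more than one format marker is present in the same file."""
    return len(identify_formats(data)) > 1
```

A file whose header says PDF but which also contains a `MIME-Version:` header further in would be reported with both markers, which is exactly the kind of mismatch worth a closer look.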

Fortunately, these techniques present a perfect opportunity to demonstrate how MetaDefender Sandbox easily recognizes malicious masqueraded files!

July's Attack

In this campaign, the attackers exploited RTF handling. They crafted a Microsoft Word document containing an embedded RTF document, which Microsoft Office loads through a document relationship. The RTF document points to an Office internal element, as shown in Figure 1.

Figure 1 Office document internals (relationships and embedded RTF)
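
Because an OOXML document such as a .docx file is a ZIP package, one simple way to surface an embedded RTF part is to check the first bytes of each package member. The sketch below assumes the document is already in memory as bytes; `find_embedded_rtf` is a hypothetical helper, and it matches the shorter `{\rt` prefix because Word is reportedly lenient about the full `{\rtf` header.

```python
import io
import zipfile
from typing import List

# Word is reportedly lenient about the RTF header, so match the short prefix.
RTF_PREFIX = b"{\\rt"


def find_embedded_rtf(package: bytes) -> List[str]:
    """Return the names of ZIP members whose content begins like an RTF document."""
    found = []
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries have no content
            with zf.open(info) as member:
                head = member.read(len(RTF_PREFIX))
            if head == RTF_PREFIX:
                found.append(info.filename)
    return found
```

Checking content rather than file extensions matters here, since the part name inside the package does not have to end in `.rtf`.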

In Filescan Sandbox, we parse all document relationships, including the external relationships that Office documents can define (Figure 2). But this artifact contains something else worth detecting in its chain.

Figure 2 Example of external relationship flagged by the Filescan Sandbox engine.
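
In an OOXML package, relationships live in `.rels` parts, each holding `Relationship` elements with `Id`, `Type`, and `Target` attributes and an optional `TargetMode="External"`. A minimal sketch of enumerating them with the Python standard library follows; `list_relationships` is a hypothetical helper, not part of any Filescan API, and it keeps only the last segment of the relationship type URI for readability.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import List

# Namespace used by Open Packaging Conventions relationship parts.
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def list_relationships(package: bytes) -> List[dict]:
    """Collect every relationship from every .rels part in an OOXML package."""
    rels = []
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        for name in zf.namelist():
            if not name.endswith(".rels"):
                continue
            root = ET.fromstring(zf.read(name))
            for rel in root.iter(REL_NS + "Relationship"):
                rels.append({
                    "part": name,
                    "id": rel.get("Id", ""),
                    # e.g. ".../relationships/aFChunk" -> "aFChunk"
                    "type": rel.get("Type", "").rsplit("/", 1)[-1],
                    "target": rel.get("Target", ""),
                    "external": rel.get("TargetMode") == "External",
                })
    return rels
```

External targets and `aFChunk` (alternative format chunk) relationships are natural candidates for closer inspection, since they pull in content from outside the main document body.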

The RTF document from the campaign includes an OLE object defined inside a hexadecimal string, which is readable in plaintext because of the nature of RTF files. Figure 3 highlights this element in the extracted RTF file. Also visible are the URLs the OLE object refers to, which Microsoft Office ultimately accesses to fetch and execute the next stage of the payload.

Figure 3 RTF content and its defined OLE object
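
The hex-encoded OLE object can be recovered with a few lines of Python. This is a simplified sketch that assumes the hex data follows a plain `\objdata` control word; real samples are often obfuscated (for example, with extra control words inside the hex run), which dedicated parsers such as oletools' rtfobj are designed to handle. The helper names are hypothetical. The decoded blob usually starts with an OLE1 header, so the OLE2 compound-file signature is searched for rather than expected at offset 0, and URLs are looked for both as ASCII and as UTF-16LE text.

```python
import binascii
import re
from typing import List

# OLE2 compound file signature (legacy Office container).
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")
# Simplified assumption: \objdata followed by a run of hex digits and whitespace.
OBJDATA_RE = re.compile(rb"\\objdata\s*([0-9A-Fa-f\s]+)")
# Characters commonly allowed in URLs.
URL_RE = re.compile(rb"https?://[A-Za-z0-9._~:/?#\[\]@!$&'()*+,;=%-]+")


def extract_objdata(rtf: bytes) -> List[bytes]:
    """Decode every hex-encoded \\objdata run found in an RTF document."""
    blobs = []
    for match in OBJDATA_RE.finditer(rtf):
        hex_text = re.sub(rb"\s+", b"", match.group(1))
        if len(hex_text) % 2:  # drop a dangling nibble rather than fail
            hex_text = hex_text[:-1]
        blobs.append(binascii.unhexlify(hex_text))
    return blobs


def find_urls(blob: bytes) -> List[str]:
    """Find URLs stored as ASCII text or as UTF-16LE text (NUL bytes stripped)."""
    candidates = URL_RE.findall(blob) + URL_RE.findall(blob.replace(b"\x00", b""))
    # dict.fromkeys removes duplicates while keeping first-seen order
    return list(dict.fromkeys(c.decode("ascii") for c in candidates))
```

For each decoded blob, `OLE_MAGIC in blob` indicates an embedded OLE2 container, and `find_urls(blob)` lists candidate next-stage URLs.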

In the corresponding Filescan Sandbox analysis of this malicious file, we find the extracted RTF file along with the related indicators of compromise and the detections flagged by the engine, shown in Figures 4 and 5.

Figure 4 Files extracted from the Filescan Sandbox analysis of the malicious document

Figure 5 Related threat indicators triggered by the Filescan Sandbox engine.

August's Attack

In this attack, other scanners identify the malicious file as a PDF because it includes the PDF file signature and object stream structure, as observed in Figures 6 and 7. However, the file only works when opened as an Office document or an .mht file; opened as a PDF, it throws errors.

Figure 6 Text editor showing the PDF stream containing MIME objects.

This attack works because one of the PDF streams defines a set of MIME objects, including an MHTML (MHT) object that loads an ActiveMime object embedded in MIME format. ActiveMime objects can carry macros, which is what allows the macro code to execute. In addition, the MIME object header declares a fake ".jpg" content type and the content is slightly obfuscated, likely to avoid detection by tools such as YARA.
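
Once the start of the MIME content is located, the MIME layer can be inspected with Python's standard `email` package. The sketch below is a hypothetical helper that assumes an unobfuscated `MIME-Version:` header; since the real campaign's header was slightly obfuscated, a production parser would need to be more tolerant. For each leaf part it reports the declared content type, whether the decoded payload starts with the `ActiveMime` marker, and whether a part declared as `image/jpeg` lacks the JPEG magic bytes.

```python
import base64
import email
from typing import List

ACTIVEMIME_MAGIC = b"ActiveMime"
JPEG_MAGIC = b"\xff\xd8\xff"


def inspect_mime_parts(data: bytes) -> List[dict]:
    """Parse MIME content embedded anywhere in `data` and flag suspicious parts."""
    start = data.find(b"MIME-Version:")
    if start == -1:
        return []
    # Everything before the MIME header (e.g. the PDF prologue) is ignored.
    msg = email.message_from_bytes(data[start:])
    findings = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers only; inspect the leaf parts
        payload = part.get_payload(decode=True) or b""
        declared = part.get_content_type()
        findings.append({
            "content_type": declared,
            "location": part.get("Content-Location", ""),
            "activemime": payload.startswith(ACTIVEMIME_MAGIC),
            "jpeg_mismatch": declared == "image/jpeg"
            and not payload.startswith(JPEG_MAGIC),
        })
    return findings


if __name__ == "__main__":
    # Hypothetical sample: a PDF prologue followed by a MIME part that claims
    # to be a JPEG but actually carries an ActiveMime blob.
    blob = base64.b64encode(ACTIVEMIME_MAGIC + b"\x00" * 40)
    sample = (b"%PDF-1.7\n1 0 obj\n<<>>\nstream\n"
              b"MIME-Version: 1.0\r\n"
              b'Content-Type: multipart/related; boundary="BOUND"\r\n\r\n'
              b"--BOUND\r\n"
              b"Content-Location: file:///C:/fake/image.jpg\r\n"
              b"Content-Transfer-Encoding: base64\r\n"
              b"Content-Type: image/jpeg\r\n\r\n"
              + blob + b"\r\n--BOUND--\r\nendstream\nendobj\n%%EOF\n")
    for finding in inspect_mime_parts(sample):
        print(finding)
```

A part that is declared as an image yet decodes to an ActiveMime blob is a strong signal on its own, independent of whatever the file header claims.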

As observed in Figure 7, despite technically being a PDF, Microsoft Office loads this object and executes the additional malicious code.

Figure 7 PDF sample opened with Microsoft Office as an office document.

Interestingly, while many PDF tools fail to parse this file, olevba, the VBA analysis tool from Philippe Lagadec's oletools suite that is generally used for Office documents, identifies macro code information in it. Filescan Sandbox analysis also flags the embedded ActiveMime object and the VBA information, and extracts all of the MIME objects.

Figure 8 Threat indicators triggered by the Filescan Sandbox engine.

Figure 9 Files extracted from the document by the Filescan Sandbox engine.
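
ActiveMime blobs wrap zlib-compressed data holding the OLE container with the VBA project, which is, as we understand it, why an Office-oriented tool like olevba can still reach the macros. The sketch below illustrates that idea only and is not olevba's code. The helper name is hypothetical, and because the offset of the compressed stream can vary, it simply tries every position that begins with the usual zlib header byte `0x78`.

```python
import zlib
from typing import Optional

ACTIVEMIME_MAGIC = b"ActiveMime"
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def decompress_activemime(blob: bytes) -> Optional[bytes]:
    """Brute-force the zlib stream inside an ActiveMime blob; None if not found."""
    if not blob.startswith(ACTIVEMIME_MAGIC):
        return None
    for offset in range(len(ACTIVEMIME_MAGIC), len(blob)):
        if blob[offset] != 0x78:  # common first byte of a zlib stream
            continue
        decomp = zlib.decompressobj()
        try:
            out = decomp.decompress(blob[offset:])
        except zlib.error:
            continue  # not a valid stream at this offset; keep scanning
        if decomp.eof:  # only accept a complete stream
            return out
    return None
```

If the result starts with `OLE_MAGIC`, it can be handed to an OLE/VBA parser such as olevba for macro extraction.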

There you have it. That’s what the Filescan Labs team did on our summer vacation. We unmasked masqueraded files to identify the malicious payloads hidden underneath and added more detections to our ever-growing list.

If you want to analyze files to see whether adversaries are masquerading malicious payloads as legitimate files, or to extract payloads embedded within other files, get in touch: check out the MetaDefender Sandbox community site or try our enterprise scanning service.
