Archive file formats like ZIP and RAR have emerged as one of the most prevalent tools for malware delivery, offering cybercriminals a reliable and efficient way to distribute malicious payloads. According to threat research from HP Wolf Security, in Q2 2024 alone, archive files accounted for 39% of all malware delivery methods, making them the top choice for attackers. By embedding harmful scripts, executables, or phishing content within seemingly innocuous archives, hackers exploit the trust and familiarity associated with these formats.
Sophisticated techniques such as leveraging the ZIP format’s modular architecture, manipulating file headers, and concatenating archive files allow attackers to bypass detection systems and execute their payloads undetected. The widespread adoption of these formats, coupled with their ability to evade traditional security measures, underscores their role as a significant and evolving threat. In this blog, we’ll discuss the technical mechanisms behind these attack techniques, analyze their effectiveness, and provide practical steps to detect and prevent these archive-based threats.
Scenario 1 – Archive File Concatenation
Evasive archive file concatenation works by appending multiple archive files to one another so that the combined file bypasses traditional detection methods or restrictions while the data inside remains functional and accessible. The same trick has benign uses, such as circumventing file size limits or optimizing data storage, but attackers use it to obfuscate content.
The efficacy of this evasion tactic stems from the varying behaviors of different ZIP parsing tools when handling concatenated files:
- 7-Zip: Parses only the first ZIP archive, which may be benign, potentially overlooking the malicious payload embedded in subsequent archives.
- WinRAR: Processes and displays only the last part of the ZIP structures.
- Windows File Explorer: May fail to open the concatenated file entirely. However, if the file is renamed with a .RAR extension, it might only render the second ZIP archive, omitting the first.
This inconsistency in processing concatenated ZIP files enables attackers to evade detection mechanisms by embedding malicious payloads within segments of the archive that certain ZIP parsers are unable or not designed to access.
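One way to spot this kind of file is to count ZIP End of Central Directory (EOCD) records, since a single well-formed archive normally has exactly one. The sketch below is a minimal illustration, not a production detector: the helper names (`make_zip`, `find_eocd_offsets`, `looks_concatenated`) are hypothetical, the payload is a harmless stand-in, and a raw signature search can produce false positives when uncompressed member data happens to contain the same bytes.

```python
import io
import zipfile

EOCD_SIG = b"PK\x05\x06"  # End of Central Directory record signature
EOCD_MIN_SIZE = 22        # size of an EOCD record with an empty comment


def make_zip(name: str, data: bytes) -> bytes:
    """Build a small single-member ZIP archive in memory (demo helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, data)
    return buf.getvalue()


def find_eocd_offsets(blob: bytes) -> list[int]:
    """Return the offset of every EOCD signature that has room for a full record."""
    offsets = []
    pos = blob.find(EOCD_SIG)
    while pos != -1:
        if len(blob) - pos >= EOCD_MIN_SIZE:
            offsets.append(pos)
        pos = blob.find(EOCD_SIG, pos + 1)
    return offsets


def looks_concatenated(blob: bytes) -> bool:
    """Heuristic: more than one EOCD record suggests several archives in one file."""
    return len(find_eocd_offsets(blob)) > 1


# Two independent archives glued together, as in Scenario 1.
benign = make_zip("readme.txt", b"nothing to see here")
hidden = make_zip("payload.txt", b"stand-in for a malicious file")
combined = benign + hidden

print(looks_concatenated(benign))
print(looks_concatenated(combined))
# A single parser sees only one of the archives:
print(zipfile.ZipFile(io.BytesIO(combined)).namelist())
```

Python's `zipfile` module locates the archive from the end of the file, so in this demo it lists only the second archive. That is the same kind of parser-specific behavior described in the list above.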
Scenario 2 – Misleading Archive Files
MS Office as an Archive
When most people think of archives, they immediately think of ZIP, RAR, or perhaps TAR files. While these are indeed common archive formats, they represent only a fraction of the possibilities. Many modern file formats leverage archive structures in their underlying implementations, often in ways that are not immediately apparent.
For example, files created by Microsoft Office 2007 and later (.docx, .xlsx, .pptx) are built on archive structures. Attackers exploit this by crafting .pptx files that appear to be standard PowerPoint presentations but, like every file of this type, are internally ZIP archives (starting with the signature 50 4B 03 04). The attacker embeds a malicious payload within the archive, disguised as a legitimate XML file or an embedded image.
Security tools often miss this threat because they treat the file as a .pptx based on its extension and structure, rather than analyzing the underlying ZIP archive. While tools like WinRAR or 7-Zip can reveal the file’s contents, automated scanners focus on the file type, overlooking the hidden payload.
Attackers may further evade detection by altering the ZIP archive’s internal structure, such as using non-standard file names or hiding the payload in obscure locations. This tactic leverages ZIP's flexibility, allowing it to define custom file types like .pptx while retaining the same signature.
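Treating the Office file as the ZIP archive it really is makes this kind of payload easier to find. The following is a minimal sketch, assuming a simple heuristic: open the container with Python's `zipfile` module and flag members whose first bytes look like an executable or whose extension is unusual for a presentation. The function name `suspicious_members`, the extension list, and the fake in-memory container are all illustrative assumptions, not the behavior of any particular scanner.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Assumed heuristics for the demo: Windows PE and ELF magic bytes,
# plus a short list of script and executable extensions.
EXECUTABLE_MAGIC = (b"MZ", b"\x7fELF")
SUSPICIOUS_EXTENSIONS = {".exe", ".dll", ".js", ".vbs", ".bat", ".ps1", ".scr"}


def suspicious_members(blob: bytes) -> list[str]:
    """List archive members whose content or name does not fit a document."""
    flagged = []
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            with zf.open(info) as member:
                head = member.read(4)  # only the magic bytes are needed
            ext = PurePosixPath(info.filename).suffix.lower()
            if head.startswith(EXECUTABLE_MAGIC) or ext in SUSPICIOUS_EXTENSIONS:
                flagged.append(info.filename)
    return flagged


# Fake .pptx-like container: two ordinary XML parts and one "image"
# whose content actually starts with the PE header magic.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("[Content_Types].xml", b"<?xml version='1.0'?><Types/>")
    zf.writestr("ppt/slides/slide1.xml", b"<?xml version='1.0'?><sld/>")
    zf.writestr("ppt/media/image1.png", b"MZ\x90\x00 stand-in payload")
fake_pptx = buf.getvalue()

print(suspicious_members(fake_pptx))
```

Checking content rather than trusting the member name is the point of the sketch: the flagged entry carries a .png extension but does not start with PNG data.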
DWF as a ZIP File
The ZIP file structure is a widely used format that some file types leverage to create unique formats, incorporating part of the ZIP file header into their design. For example, a standard DWF file has a header (28 44 57 46 20 56 30 36 2e 30 30 29, which is the ASCII text "(DWF V06.00)") followed by the ZIP file signature (50 4B 03 04), combining both into a single file structure. Tools like 7-Zip typically extract ZIP files by recognizing the file header as (50 4B 03 04). However, in the case of a DWF file, these tools treat it as a non-archive file and do not attempt to extract its contents.
Attackers exploit this behavior by crafting malicious files that begin with a non-ZIP header (like the DWF header) followed by the ZIP signature and a hidden payload. When such a file is opened in a DWF viewer, the software processes it as a legitimate document, ignoring the appended ZIP data and leaving the payload inactive. However, if the same file is processed by an extraction tool that searches for the ZIP signature rather than relying on the first bytes, the tool extracts the payload, which can then be executed. Many security scanners and extraction tools fail to detect such disguised archives because they rely on the file’s initial header to determine its format.
This technique is highly effective because it exploits an inherent limitation in how many tools process file headers. Security tools that do not perform deep inspections or scan beyond the initial header may allow the malicious archive to slip through defenses undetected.
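Scanning past the initial header is straightforward to illustrate. The sketch below searches the whole file for the ZIP local file header signature and, if it appears at a non-zero offset, tries to open the remainder as an archive. The function names are hypothetical, the file is built in memory from the DWF header bytes quoted above plus a harmless stand-in payload, and a bare signature search is only a heuristic that real tooling would need to validate further.

```python
import io
import zipfile
from typing import Optional

DWF_HEADER = b"(DWF V06.00)"    # 28 44 57 46 20 56 30 36 2e 30 30 29
ZIP_LOCAL_SIG = b"PK\x03\x04"   # 50 4B 03 04


def embedded_zip_offset(blob: bytes) -> Optional[int]:
    """Return the offset of the first ZIP local file header, or None."""
    pos = blob.find(ZIP_LOCAL_SIG)
    return pos if pos != -1 else None


def list_hidden_members(blob: bytes) -> list[str]:
    """List members of a ZIP archive that starts somewhere inside the file."""
    offset = embedded_zip_offset(blob)
    if offset is None:
        return []
    try:
        # Slice away the leading non-ZIP bytes before parsing.
        with zipfile.ZipFile(io.BytesIO(blob[offset:])) as zf:
            return zf.namelist()
    except zipfile.BadZipFile:
        return []  # signature bytes were present but no valid archive follows


# Build a disguised file: DWF header first, ZIP archive after it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("payload.txt", b"stand-in for a malicious file")
disguised = DWF_HEADER + buf.getvalue()

print(disguised[:4] == ZIP_LOCAL_SIG)   # a header-only check sees no ZIP
print(embedded_zip_offset(disguised))   # the archive starts after the header
print(list_hidden_members(disguised))
```

A check that looks only at the first four bytes classifies this file as a non-archive, while the offset search finds the archive that begins right after the 12-byte header.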
Recursively Extract and Scan Archives with OPSWAT MetaDefender Platform
Modern enterprise environments depend on a variety of tools, such as antivirus software, firewalls, and EDR (endpoint detection and response) systems, to detect and block malware before it compromises their critical infrastructure. However, these protective measures often have weaknesses that threat actors regularly exploit. To effectively identify and counter these evasive tactics, let’s take a closer look at how the core engines of the OPSWAT MetaDefender Platform, including MetaScan™ Multiscanning, Archive Extraction, and Deep CDR™, can be flexibly configured to secure your systems.
Detecting Threats in Scenario 1 with Standard Archive Extraction
To demonstrate the archive concatenation attack technique, we first submit the ZIP file as a regular archive and let the AV (antivirus) engines handle the ZIP extraction and malware scan. Only 16 out of 34 AV engines are able to detect the malware.
Next, we add one layer of complexity to the regular archive file.
Only 11 out of 34 AV engines detect the malware. MetaScan handles this case effectively.
Finally, we add another layer to the regular ZIP file.
This time, only 7 out of 34 AV engines detect the malware; four of the AV engines that detected it in the previous test now fail to do so.
We also tested it using an external archive extraction tool. However, the tool only checks the first and last parts of the archive file, missing the middle section where the malware file is stored.
Detecting Scenario 1 Threats with OPSWAT Archive Extraction Engine
The Archive Extraction Engine fully extracts the ZIP file, enabling the AV engines to scan the nested files.
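The general idea of full extraction, unpacking every layer so that each nested file can be scanned on its own, can be sketched in a few lines. This is an illustrative sketch only and does not represent how the Archive Extraction Engine is implemented: it handles ZIP alone, the names `walk_archive` and `make_zip` are hypothetical, and the depth limit is an assumed safeguard against runaway nesting.

```python
import io
import zipfile
from typing import Iterator


def make_zip(members: dict[str, bytes]) -> bytes:
    """Build a ZIP archive in memory from a name -> content mapping (demo helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


def walk_archive(
    blob: bytes, prefix: str = "", max_depth: int = 5
) -> Iterator[tuple[str, bytes]]:
    """Yield (path, content) for every non-archive file, descending into nested ZIPs."""
    if max_depth < 0:
        raise ValueError("archive nesting exceeds the configured depth limit")
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            data = zf.read(info)
            path = prefix + info.filename
            if zipfile.is_zipfile(io.BytesIO(data)):
                # Nested archive: recurse so inner files reach the scanner too.
                yield from walk_archive(data, path + "/", max_depth - 1)
            else:
                yield path, data


# Three layers, mirroring the layered test files described above.
inner = make_zip({"payload.txt": b"stand-in for a malicious file"})
middle = make_zip({"inner.zip": inner, "note.txt": b"harmless note"})
outer = make_zip({"middle.zip": middle})

for path, content in walk_archive(outer):
    print(path, len(content))
```

Each yielded leaf file would then be handed to the scanning step. Extracting once and scanning the results avoids depending on how far each individual engine's own unpacker is willing to descend.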
The scan result for these nested files is shown below:
Additionally, when Deep CDR is enabled, it generates a new file without the malicious content, as demonstrated below:
Detecting Scenario 2 Threats Without Archive Extraction Engine
We first upload a file that leverages techniques used in scenario 2 to MetaDefender Core. The file is scanned with MetaScan for potential malware. The results are as follows:
Only 2 out of 12 popular AV engines in MetaScan can detect the malware. This means that a significant number of organizations may be vulnerable to attacks that exploit these archive format intricacies.
Detecting Scenario 2 Threats Using OPSWAT Archive Extraction Engine
5 out of 12 popular AV engines detect the threat within the .dwf file.
The Archive Extraction engine further extracts the file and analyzes its nested files.
All extracted nested files are again scanned with MetaScan, where 9 out of 12 AV engines detect the threat.
About OPSWAT Core Technologies
MetaScan™ Multiscanning
MetaScan Multiscanning is an advanced threat detection and prevention technology that increases detection rates, decreases outbreak detection times, and provides resiliency for single vendor anti-malware solutions. A single antivirus engine can detect 40%-80% of malware. OPSWAT Multiscanning allows you to scan files with over 30 anti-malware engines on-premises and in the cloud to achieve detection rates greater than 99%.
Archive Extraction
Detecting threats in compressed files, such as .ZIP or .RAR, can be difficult because archives are often large and can mask the threats hidden inside them. MetaDefender offers fast processing of archives by allowing administrators to perform archive handling once for each file type, instead of requiring each individual anti-malware engine to use its own archive handling methods. Additionally, administrators can customize the way archive scanning is performed to avoid threats like zip bombs.
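To make the zip bomb concern concrete, the sketch below checks an archive's declared sizes against limits before anything is extracted. It is a generic illustration, not MetaDefender's configuration: the function name `exceeds_limits` and both thresholds are assumptions chosen for the demo, and because the sizes in ZIP headers can be forged, a real extractor would also need to enforce limits while decompressing.

```python
import io
import zipfile

# Hypothetical limits for the demo.
MAX_TOTAL_BYTES = 100 * 1024 * 1024  # total declared uncompressed size
MAX_RATIO = 100                      # uncompressed / compressed, per member


def exceeds_limits(
    blob: bytes,
    max_total: int = MAX_TOTAL_BYTES,
    max_ratio: float = MAX_RATIO,
) -> bool:
    """Return True if declared sizes suggest a decompression bomb."""
    total = 0
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        for info in zf.infolist():
            total += info.file_size
            if info.compress_size > 0:
                if info.file_size / info.compress_size > max_ratio:
                    return True
    return total > max_total


def make_zip(name: str, data: bytes, method: int) -> bytes:
    """Build a single-member archive with the given compression method (demo helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        zf.writestr(name, data)
    return buf.getvalue()


# One megabyte of zeros compresses extremely well, giving a very high ratio.
bomb_like = make_zip("zeros.bin", b"\x00" * 1_000_000, zipfile.ZIP_DEFLATED)
ordinary = make_zip("note.txt", b"harmless note", zipfile.ZIP_STORED)

print(exceeds_limits(bomb_like))
print(exceeds_limits(ordinary))
```

Checking the metadata first is cheap, which makes it a reasonable first gate ahead of the more expensive recursive extraction and scanning steps.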
Deep CDR™
Traditional antivirus solutions miss unknown threats. Deep CDR eliminates them entirely. Each file is disarmed and regenerated, ensuring only safe, clean and usable content reaches your systems. By focusing on prevention rather than just detection, Deep CDR enhances anti-malware defenses, protecting organizations from file-based attacks, including targeted threats. It neutralizes potentially harmful objects in files traversing network traffic, email, uploads, downloads, and portable media before they reach your network.
Detect Evasive Threats with OPSWAT Archive Extraction
Threat actors are increasingly agile in their methods to infiltrate systems and exfiltrate sensitive information. By leveraging the right tools to detect and mitigate breaches, organizations can prevent these adversaries from gaining access to critical data and moving laterally within the network. By utilizing the powerful Archive Extraction engine in MetaDefender Core, you can bolster your defenses against embedded malware that may evade individual AV engine extraction tools.