How MetaDefender™ Prevents Sophisticated Polyglot Image Attacks

Jan 16, 2025 by Loc Nguyen, Penetration Test Team Lead


Web applications that facilitate file uploads have become essential for numerous organizations, acting as portals for clients, partners, and employees to share various types of documents and files. For instance, an HR firm might enable users to upload resumes, or a company could make it easier for partners to share files via a specialized web platform.

Even with enhanced security measures and stricter validation processes, attackers continue to exploit vulnerabilities using sophisticated methods. Files that appear benign, such as images, can be manipulated to compromise the security of a web server.

Polyglot files are valid as multiple file types simultaneously, which allows attackers to bypass security measures based on file type. Examples include GIFAR, which functions as both a GIF and a JAR (Java archive) file; JavaScript/JPEG polyglots, which are interpreted as both JavaScript and JPEG; and Phar-JPEG files, which are recognized as both a Phar archive and a JPEG image. With deceptive or empty extensions, these files can “trick” systems into treating them as a benign file type (such as an image or PDF) while the malicious code they contain goes undetected.

File Upload Validation

Allowing file uploads from users without proper or comprehensive validation poses a significant threat to web applications. If an attacker successfully uploads a malicious file, such as a web shell, they can potentially gain control over the server, compromising both the system and sensitive data. To mitigate these risks, best practices have been established to guide developers in applying effective validation measures. These practices help ensure the secure processing of file uploads, thereby minimizing the risk of exploitation.

Key areas of focus for securing file uploads include:

Extension Validation: Implement a blocklist or allowlist of file extensions to ensure that only allowed file types are accepted.
File Name Sanitization: Generate random strings for file names upon upload.
Content-Type Validation: Verify the MIME-type of the uploaded file to ensure that it matches the expected format.
Image Header Validation: For image uploads, functions like getimagesize() in PHP can be used to confirm the validity of the file by checking its header.
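The four checks listed above can be sketched in a few lines. This is a minimal illustration, not a complete filter: the ALLOWED table, the validate_upload name, and the choice of magic bytes are assumptions made for the example, and Python is used only for brevity.

```python
import secrets
from pathlib import PurePosixPath

# Hypothetical policy: extension -> (MIME type, leading magic bytes).
ALLOWED = {
    ".jpg": ("image/jpeg", b"\xff\xd8\xff"),
    ".png": ("image/png", b"\x89PNG\r\n\x1a\n"),
    ".gif": ("image/gif", b"GIF8"),
}


def validate_upload(filename: str, content_type: str, data: bytes) -> str:
    """Return a random storage name, or raise ValueError if a check fails."""
    ext = PurePosixPath(filename).suffix.lower()
    if ext not in ALLOWED:                      # extension allowlist
        raise ValueError("extension not allowed")
    mime, magic = ALLOWED[ext]
    if content_type != mime:                    # Content-Type validation
        raise ValueError("unexpected Content-Type")
    if not data.startswith(magic):              # image header validation
        raise ValueError("file header does not match extension")
    return secrets.token_hex(16) + ext          # file name sanitization
```

A file that is a structurally valid image passes every one of these checks regardless of what else it contains, which is why such checks alone do not stop polyglot files.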

File Upload Filter Bypass

Despite these protective measures, attackers continually refine their methods for bypassing validation. Techniques such as null character injection, double extensions, and empty extensions can undermine extension validation: a file may be named "file.php.jpg," "file.php%00.jpg," "file.PhP," or "file.php/" to evade detection. MIME-type validation can be circumvented by changing the file's initial magic bytes, for example to GIF89a, the header associated with GIF files, which can trick the system into identifying the file as a legitimate format. Additionally, a malicious .htaccess file can be uploaded to alter the server configuration and allow files with unauthorized extensions to execute.
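To make these bypasses concrete, the sketch below runs the file names quoted above against two deliberately flawed checks. Both functions are hypothetical examples of weak validation written for this illustration, not code from any real application, and the payload string is a harmless placeholder.

```python
def naive_extension_ok(filename: str) -> bool:
    """Flawed check: case-sensitive blocklist on the end of the name only."""
    return not filename.endswith(".php")


def naive_type_sniff(data: bytes) -> str:
    """Flawed check: trusts the first bytes of the file alone."""
    return "image/gif" if data.startswith(b"GIF89a") else "unknown"


# File names quoted in the text; each one slips past the naive extension check.
for name in ["file.php.jpg", "file.php%00.jpg", "file.PhP", "file.php/"]:
    assert naive_extension_ok(name)

# A script prefixed with the GIF magic bytes is sniffed as a GIF.
fake_gif = b"GIF89a<?php echo 'placeholder'; ?>"
assert naive_type_sniff(fake_gif) == "image/gif"
```

Whether a given name actually executes as PHP depends on the server and runtime configuration. The point here is only that each check, taken alone, accepts the input.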

Polyglot File Attacks

Even rigorous validation that combines several of these measures does not stop every bypass: attacks that use polyglot files, or polyglot images, remain a significant security threat. Attackers craft files, such as images, that conform to the expected binary structure of an image file but execute malicious code when interpreted in a different context. This dual nature lets them pass traditional validation mechanisms and exploit vulnerabilities in specific scenarios.

Flow diagram showing a polyglot file processed as both an image and malicious JavaScript code — Javascript/Jpeg polyglot file

Simple Polyglot File with ExifTool

A simple technique for generating a polyglot image is to use ExifTool, an application designed to read, write, and modify metadata formats such as EXIF, XMP, JFIF, and Photoshop IRB. Attackers can misuse it to embed malicious code in the EXIF metadata of an image, particularly in fields such as UserComment and ImageDescription, producing a polyglot image that improves their chances of successful exploitation.

The EXIF metadata of a sample image is shown below; it records details such as the file type, resolution, and encoding.

Screenshot showing detailed Exif metadata extracted from an image file, including file type, resolution, and encoding information — Exif metadata of an image

By utilizing ExifTool, a threat actor can embed malicious code within the EXIF metadata of an image, thereby creating a polyglot file that may circumvent validation mechanisms.

Example of harmful code added to the user comment section of an image’s Exif metadata — Inject harmful code into the User Comment section of Exif metadata

Although MIME type validation can restrict the upload of basic web shell files, this polyglot image can bypass these restrictions, allowing an attacker to upload a polyglot web shell.
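One simple defensive heuristic is to search the raw bytes of an uploaded image for script-like markers, since code embedded in metadata is stored as plain text. The sketch below assumes a hypothetical pattern and function name; it is illustrative only, can produce false positives on compressed image data, and is no substitute for reconstructing the file.

```python
import re

# Byte patterns that suggest embedded script (illustrative, not exhaustive).
SCRIPT_PATTERN = re.compile(rb"<\?(php|=)|<script\b", re.IGNORECASE)


def find_script_markers(data: bytes) -> list[int]:
    """Return the byte offsets of script-like markers inside a file."""
    return [match.start() for match in SCRIPT_PATTERN.finditer(data)]


# Simulated JPEG whose metadata segment carries a harmless placeholder tag.
sample = (b"\xff\xd8\xff\xe1\x00\x20Exif\x00\x00"
          b"<?php echo 'placeholder'; ?>"
          b"\xff\xd9")
print(find_script_markers(sample))
```

An empty result does not prove a file is safe; it only means none of these particular markers were found.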

Request-response showing rejection of a malicious web shell due to MIME type restrictions — The web shell has been denied because its MIME type is not permitted

Request-response example of bypassing restrictions using a polyglot image file created with Exiftool — Bypass the restriction via a polyglot image created with exiftool

The attacker can subsequently exploit the polyglot web shell to take control of the web server.

Screenshot showing an attacker gaining control of a web server through a malicious web shell — The attacker can gain control of the web server by utilizing the web shell

JavaScript/JPEG Polyglot File

A JavaScript/JPEG polyglot file is structured to be valid as both a JPEG image and a JavaScript script. To achieve this, a malicious actor must have a comprehensive understanding of the internal structure of a JPEG file. This knowledge enables the accurate embedding of malicious binary data within the image, ensuring it can be processed by a JavaScript engine without affecting its validity as a JPEG image.

A JPEG Image has the following structure:

Bytes	Name
0xFF, 0xD8	Start Of Image
0xFF, 0xE0, 0x00, 0x10, …	Default Header
0xFF, 0xFE, …	Image Comment
0xFF, 0xDB, …	Quantization Table
0xFF, 0xC0, …	Start of Frame
0xFF, 0xC4, …	Huffman Table
0xFF, 0xDA, …	Start of Scan
0xFF, 0xD9	End Of Image

Visual breakdown of the JPEG file format, highlighting key segments and metadata fields — JPEG format – source: https://github.com/corkami/formats/blob/master/image/JPEGRGB_dissected.png

In a JPEG file, each segment marker other than Start Of Image and End Of Image is followed by a two-byte length field. In the structure above, the default header begins with the sequence 0xFF 0xE0 0x00 0x10, where 0x00 0x10 is the length of the segment: 16 bytes, counting the two length bytes but not the marker. The marker 0xFF 0xD9 marks the end of the image.
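The marker-plus-length layout can be read programmatically. The following sketch walks the segments of a JPEG up to Start of Scan and reports each marker with its declared length. The function name and the tiny hand-built sample are assumptions for illustration, and the parser ignores edge cases such as padding bytes between segments.

```python
import struct


def list_segments(data: bytes) -> list[tuple[int, int]]:
    """Return (marker, declared_length) pairs up to Start of Scan."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing Start Of Image marker")
    segments = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected marker at offset {pos}")
        marker = data[pos + 1]
        # The two bytes after the marker are a big-endian length that
        # counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segments.append((marker, length))
        if marker == 0xDA:  # Start of Scan: compressed data follows
            break
        pos += 2 + length
    return segments


# Hand-built sample: SOI, default header (16 bytes), comment "hi", SOS, EOI.
sample = (b"\xff\xd8"
          b"\xff\xe0\x00\x10" + b"JFIF\x00" + bytes(9) +
          b"\xff\xfe\x00\x04hi"
          b"\xff\xda\x00\x02"
          b"\xff\xd9")
for marker, length in list_segments(sample):
    print(f"0xFF 0x{marker:02X} length={length}")
```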

Hexadecimal representation of an image file, showcasing its structure and encoding — Hexadecimal representations of an image

To create a JavaScript/JPEG polyglot file, it is necessary to modify the hexadecimal values of the image to ensure that the JavaScript engine can recognize and process them.

First, a JavaScript engine can interpret the sequence 0xFF 0xD8 0xFF 0xE0 as a non-ASCII variable name, but 0x00 0x10 is invalid and must be altered. A suitable replacement is 0x2F 0x2A, the hexadecimal representation of /*, which opens a comment in JavaScript. This substitution causes the binary data that follows to be ignored as part of the comment.

However, since 0x00 0x10 originally represents the length of the JPEG header, altering it to 0x2F 0x2A, which in decimal equals 12074, requires redefining the JPEG header to maintain its validity. To achieve this, null bytes need to be added, and the JavaScript payload should be placed after the 0xFF 0xFE marker, which indicates an image comment in the JPEG structure.

For example, if the payload is */=alert(document.domain);/*, which is 28 bytes long, the required number of null bytes is calculated as follows: 12074 (new length) - 16 (original header length) - 2 (for the 0xFF 0xFE marker) - 28 (payload length) = 12028 null bytes.
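The arithmetic can be checked directly. The snippet below simply reproduces the calculation from the text; the variable names are chosen for this illustration.

```python
payload = b"*/=alert(document.domain);/*"

new_length = int.from_bytes(b"/*", "big")   # 0x2F2A read as a big-endian length
original_header = 16                        # 0x0010, the original segment length
comment_marker = 2                          # the 0xFF 0xFE marker bytes
padding = new_length - original_header - comment_marker - len(payload)

print(new_length, len(payload), padding)    # 12074 28 12028
```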

Consequently, the JavaScript code within the JPEG image would resemble the following:

Example of JavaScript code embedded within non-ASCII variables for exploitation — Embed JavaScript code

Hexadecimal representation of a polyglot JPEG file

Hexadecimal values of the modified image, showing an alert comment addition — The hexadecimal value of the image following the modification

Finally, the sequence 0x2A 0x2F 0x2F 0x2F (corresponding to *///) must be placed just before the JPEG end marker 0xFF 0xD9. This step closes the JavaScript comment and ensures the payload is correctly executed without disrupting the structure of the JPEG file.

Extended view of JavaScript code within the modified hexadecimal image — Close the JavaScript comment

After this modification, the image can still be interpreted as a valid image while simultaneously containing executable JavaScript code.
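Because this construction leaves a recognizable trace (a first segment whose length bytes spell /*, and a closing */ later in the file), it can be flagged with a simple heuristic. The function below is a hypothetical sketch built from the byte values described above. It would miss variants that use a different layout, so it should be read as an illustration rather than a detector.

```python
def looks_like_js_jpeg_polyglot(data: bytes) -> bool:
    """Heuristic: JPEG whose first length field spells '/*' and that later closes the comment."""
    starts_like_jpeg = data[:3] == b"\xff\xd8\xff"
    length_opens_comment = data[4:6] == b"/*"   # 0x2F 0x2A in the length field
    closes_comment = b"*/" in data[6:]
    return starts_like_jpeg and length_opens_comment and closes_comment


# An ordinary header with the usual 0x00 0x10 length is not flagged.
ordinary = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00" + bytes(9) + b"\xff\xd9"
print(looks_like_js_jpeg_polyglot(ordinary))
```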

Polyglot JPEG displayed as a standard image in a viewer — Polyglot image displayed as a standard image

When an HTML page loads this image as a script source, it is still parsed as valid JavaScript and the embedded code executes:

HTML code embedding a polyglot JPEG as a script

Browser output demonstrating polyglot image executable as JavaScript code — Polyglot image executable as JavaScript code

PHAR/JPEG Polyglot Files

Polyglot image files present risks not only for client-side exploitation but also in server-side attacks under particular circumstances. An example of this is the Phar/JPEG polyglot file, which can be interpreted both as a PHP Archive (Phar) and as a JPEG image. The Phar file structure permits the embedding of serialized data in metadata, which poses a potential risk for deserialization vulnerabilities, especially in certain PHP versions. As a result, Phar/JPEG polyglot files can be leveraged to bypass file upload validation and exploit vulnerable servers.

The Phar file format is laid out as stub/manifest/contents/signature, and stores the crucial information of what is included in the Phar archive in its manifest:

Stub: The stub is a chunk of PHP code which is executed when the file is accessed in an executable context. There are no restrictions on the stub’s contents, except for the requirement that it concludes with __HALT_COMPILER();.
Manifest: This section contains metadata about the archive and its contents, which may include serialized Phar metadata stored in serialize() format.
File contents: The original files that are included in the archive.
Signature (optional): Contains signature information for integrity verification.

Phar file format structure displaying stub, manifest, file content, and signature segments — Phar File Format

Since the stub imposes no content restrictions beyond ending with __HALT_COMPILER();, a threat actor can inject the bytes of an image into the stub. Placed at the beginning of the PHAR file, these bytes cause the file to be identified as a valid image. Consequently, a PHAR/JPEG polyglot can be constructed by prepending the bytes of a JPEG image to the archive, as demonstrated in the following example:

PHP script creating a polyglot Phar/JPEG file with metadata and malicious content — Generate Phar/JPEG polyglot file

Through this method, the generated polyglot file functions as both a valid image and a legitimate PHAR file and can therefore be used to bypass certain file upload validation mechanisms.
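Since every Phar stub must end with __HALT_COMPILER();, the presence of that token in a file that starts like a JPEG is a useful warning sign. The check below is a minimal sketch with a hypothetical function name. It looks only for the exact spelling given above and does not parse the manifest.

```python
def is_phar_jpeg_candidate(data: bytes) -> bool:
    """Flag data that begins like a JPEG but also carries a Phar stub terminator."""
    return data.startswith(b"\xff\xd8\xff") and b"__HALT_COMPILER();" in data


# Simulated upload: JPEG-style header followed by a stub terminator.
suspicious = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00" + bytes(9) + b"<?php __HALT_COMPILER(); ?>"
print(is_phar_jpeg_candidate(suspicious))
```

A genuine photograph has no reason to contain this token, so a match is a reasonable signal to reject the file or route it for deeper inspection.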

Hexadecimal values of a PHAR/JPEG polyglot file

PHP output recognizing the polyglot PHAR file as a valid image — PHP compiler recognizes it as a valid image

Although this polyglot file can bypass file upload filters, it is not currently capable of exploiting the web server. To successfully exploit and compromise a web server using a PHAR file or PHAR polyglot file, it is essential to inject malicious serialized metadata into the manifest of the file.

When the PHAR file is accessed through the PHAR wrapper (phar://) by certain file operation functions in PHP 7.x and earlier, such as file(), file_exists(), file_get_contents(), fopen(), rename(), or unlink(), the serialized metadata is passed to unserialize(). By using PHPGGC, a widely used tool for constructing PHP gadget chains, threat actors can exploit this deserialization vulnerability through a PHAR polyglot file and compromise the web application server.

The combination of PHAR/JPEG polyglot files and deserialization vulnerabilities empowers attackers to infiltrate a web application server, even when file upload filters are implemented. Notably, this compromise can occur even during the processing of an image file.

A snippet from a PHP configuration file setting phar.readonly to 0 — PHP configuration file

Example of web application code vulnerable to exploitation through improper handling of file paths — Vulnerable code in the web application

By leveraging polyglot files to bypass file upload filters and appending the PHAR wrapper (phar://) to the file location, attackers can manipulate the web server into treating the file as a PHAR archive. This manipulation can subsequently trigger a deserialization vulnerability, leading to remote code execution through file operation functions.
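On the application side, one mitigation is to refuse any user-supplied path that carries a stream-wrapper scheme before it reaches a file operation. The sketch below shows the idea in Python for brevity, although the vulnerable application discussed here is written in PHP. The pattern, function name, and example paths are assumptions for illustration.

```python
import re

# Matches a leading URL-style scheme such as "phar://", "php://" or "http://".
SCHEME_PATTERN = re.compile(r"^\s*[a-z][a-z0-9+.\-]*://", re.IGNORECASE)


def is_plain_path(user_path: str) -> bool:
    """Return True only if the path carries no stream-wrapper scheme."""
    return SCHEME_PATTERN.match(user_path) is None


for candidate in ["uploads/avatar.jpg", "phar://uploads/avatar.jpg/test.txt"]:
    print(candidate, is_plain_path(candidate))
```

A check like this narrows the attack surface but does not replace sanitizing the uploaded file itself, since other code paths may still reach it.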

Remote Code Execution attack through PHAR/JPEG polyglot file demonstrates an attack using a malicious polyglot file to execute remote code — Remote Code Execution attack through PHAR/JPEG polyglot file

Simulating Real-World Attacks with PHAR/JPEG Polyglot Files

To convey the risks associated with polyglot files in your application, we simulated an environment where the application employs strict file upload filters to prevent the upload of malicious files or web shells. Despite these safeguards, a polyglot image can bypass the validation process and, in certain contexts, may lead to remote code execution, ultimately compromising the vulnerable web application server.

The simulated environment is a conventional web application that enables file sharing between clients, partners, and organizations.

Protect Your Web Application with MetaDefender Core™ and MetaDefender ICAP Server™

MetaDefender Core and MetaDefender ICAP Server protect your web applications from these threats and enhance the security of your network and infrastructure.

MetaDefender ICAP Server and MetaDefender Core work together to shield your web server against sophisticated attacks involving malicious PHAR/JPEG polyglot files in the following ways:

When a PHAR/JPEG polyglot file is uploaded to the web application, it is first forwarded to MetaDefender Core through MetaDefender ICAP Server for a comprehensive sanitization process using our Deep CDR™ technology. Unlike simple file type checkers, Deep CDR thoroughly analyzes the structure of the uploaded file, removes scripts, macros, and out-of-policy content, and reconstructs the JPEG file to include only its necessary data.

This process removes harmful PHAR content appended after the JPEG end marker (0xFF 0xD9), ensuring the sanitized file is strictly a JPEG. As a result, the web application is safeguarded from PHAR/JPEG polyglot attacks; even if an attacker can alter the file processing scheme to inject a PHAR wrapper, they cannot exploit the web server.
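The effect of dropping everything after the end marker can be illustrated with a short sketch. This is not Deep CDR, whose implementation is not described here; it is a simplified stand-in that assumes a baseline JPEG with a single scan, and the function name is hypothetical.

```python
import struct


def strip_after_eoi(data: bytes) -> bytes:
    """Return the JPEG up to and including its End Of Image marker.

    Assumes a baseline JPEG with a single scan; progressive files with
    several scans would need a fuller parser.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG")
    pos = 2
    # Walk the length-prefixed segments until Start of Scan (0xFF 0xDA).
    while True:
        if pos + 4 > len(data) or data[pos] != 0xFF:
            raise ValueError("malformed segment")
        marker = data[pos + 1]
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length
        if marker == 0xDA:
            break
    # Inside compressed data, 0xFF is followed by 0x00 or a restart
    # marker, so the first 0xFF 0xD9 found here ends the image.
    end = data.find(b"\xff\xd9", pos)
    if end == -1:
        raise ValueError("missing End Of Image marker")
    return data[:end + 2]
```

Applied to a file with extra bytes after 0xFF 0xD9, the function returns only the image portion. Real sanitization also has to deal with content hidden inside segments, which truncation alone does not address.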

Hexadecimal values of a sanitized file after processing by OPSWAT MetaDefender

Organizations with established network security infrastructure, whether they use WAFs (web application firewalls), proxies, or ingress controllers, can now enhance their defense mechanisms through MetaDefender ICAP Server. This solution creates an interface between existing web servers and MetaDefender Core, establishing a transparent security checkpoint for all incoming files. Any content routed through the ICAP interface is scanned and processed before it reaches your web server, ensuring that only safe and legitimate content enters your network and reaches end users.

Diagram showing the integration of ICAP Server with Kubernetes-managed NGINX — MetaDefender ICAP Server integrates with NGINX

This approach means organizations can leverage their existing security investments while adding a further, powerful layer of protection. Organizations using the NGINX ingress controller can integrate MetaDefender ICAP Server with their existing infrastructure via proxy configuration.

Configuration file showing a sanitized PHAR/JPEG polyglot using Deep CDR — The malicious PHAR/JPEG polyglot file is sanitized by Deep CDR (MetaDefender Core)

OPSWAT's approach goes beyond traditional threat detection. Instead of simply flagging suspicious files, MetaDefender Core actively neutralizes potential threats, transforming dangerous files into safe, usable content. When integrated with your web server, MetaDefender ICAP Server provides comprehensive protection against zero-day threats and polyglot attacks.

Empowered by a “Trust no file. Trust no device. ™” philosophy, OPSWAT solves customers’ challenges around the world with patented technologies at every level of your infrastructure, securing your networks, data, and devices, and preventing known and unknown threats, zero-day attacks, and malware.


References

https://cheatsheetseries.owasp.org/cheatsheets/File_Upload_Cheat_Sheet.html

https://www.php.net/manual/en/phar.fileformat.phar.php

https://en.wikipedia.org/wiki/JPEG

https://portswigger.net/research/bypassing-csp-using-polyglot-jpegs