Revealing and Remediating a Dompdf Library Vulnerability with OPSWAT MetaDefender Core 

by OPSWAT
Tai Tran and Hai Dang Bui, students from University of Information Technology, posing for a professional portrait against a blue background
Students participated in OPSWAT Fellowship Program

The rapid rise of technology has created a high demand for skilled developers. Open-source software (OSS) has become a vital tool for this growing workforce. Hundreds of thousands of well-established OSS packages now exist across various programming languages. Over 90% of developers leverage these open-source components within their proprietary applications, highlighting the efficiency and value proposition of OSS. Further emphasizing its importance, the global open-source software market is expected to reach $80.7 billion by 2030, reflecting a projected growth rate of 16.7% annually. 

Software is integral to business operations around the world and is therefore frequently targeted by threat actors. In 2023, spending on Application Security was approximately $5.76 billion, with projections reaching $6.67 billion in 2024. Within AppSec, software supply chain security has gained prominence over the past three years, representing the fastest-growing attack category, with major breaches and exploits making headlines regularly. 

Organizations often assume that most risks originate from public-facing web applications. However, this perspective has shifted in recent years. With dozens of small components in every application, risks can now emerge from anywhere within the codebase. It is more critical than ever for organizations to familiarize themselves with existing and emerging security flaws in the software development lifecycle. In this blog post, our Graduate Fellows provide a closer look at CVE-2023-23924, a critical security flaw discovered in the widely used dompdf library, a PHP tool that dynamically generates PDF documents from HTML and CSS. 

Through a comprehensive analysis, we will explore the technical intricacies of this vulnerability, the related technologies that play a crucial role, and a simulation of the exploitation process. We'll also examine how OPSWAT MetaDefender Core, particularly its Software Bill of Materials (SBOM) engine, can be leveraged to detect and mitigate this vulnerability, empowering software developers and security teams to stay one step ahead of potential threats. 

Background on CVE-2023-23924 

A security vulnerability was discovered in dompdf version 2.0.1 and made public at the beginning of 2023:

  • The URI validation on dompdf 2.0.1 can be bypassed on SVG parsing by passing <image> tags with uppercase letters. This allows an attacker to call arbitrary URLs with arbitrary protocols, leading to arbitrary object unserialization in PHP versions before 8.0.0. Through the PHAR URL wrapper, this vulnerability can cause arbitrary file deletion and even remote code execution, depending on available classes. 
  • NVD Analysts assigned a CVSS score of 9.8 CRITICAL to CVE-2023-23924. 
Informative diagram showing metrics for CVSS Version 3.x including severity and vector strings for security vulnerabilities

Understanding the Dompdf Vulnerability 

To fully understand the CVE-2023-23924 vulnerability in dompdf, it's essential to familiarize ourselves with two closely related technologies: Scalable Vector Graphics (SVG) and PHAR files. 

SVG (Scalable Vector Graphics) 

SVG (Scalable Vector Graphics) is a versatile image format that has gained widespread popularity due to its ability to render high-quality graphics on the web while remaining lightweight and scalable. Unlike raster images, SVGs are based on XML markup, allowing for precise control over elements such as lines, shapes, and text. One of the key advantages of SVGs is their ability to scale seamlessly without losing image quality, making them ideal for responsive web design and high-resolution displays.  

Code snippet displaying SVG with multiple polygons in different colors, illustrating programming in XML format
PHAR file 

PHAR (PHP Archive) is the PHP counterpart of Java's JAR file. It simplifies deployment by bundling all PHP code and resource files into a single archive file.  

A PHAR file consists of 4 sections:  

  • Stub: contains the code to bootstrap the archive. 
  • Manifest: contains the metadata of the archive. The metadata is stored in serialized format, which a malicious PHAR file can abuse to trigger a PHP deserialization attack. 
  • File content: contains the content of the archive, including PHP code and resource files. 
  • Signature (Optional): contains data to verify the file integrity. 

Because the metadata is stored in serialized format, PHP versions before 8.0.0 unserialize it whenever a PHAR file is accessed through the phar:// wrapper. Passing such a URI to PHP functions such as file_get_contents(), fopen(), file(), file_exists(), md5_file(), filemtime(), or filesize() can therefore trigger a deserialization attack, which could enable attackers to execute code remotely via a PHAR file. 
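As a rough illustration of how an upload handler might screen for this, the sketch below flags input that looks like a PHAR archive or that references the phar:// wrapper. It is a heuristic under stated assumptions, not a complete defense: it relies on the `__HALT_COMPILER();` token that ends a PHAR stub, so it can miss compressed or tar/zip-based archives. The function names are hypothetical.

```python
# Token that terminates the stub of a PHAR archive. Finding it in an uploaded
# file is a cheap hint that the file may be a (possibly disguised) PHAR.
PHAR_STUB_MARKER = b"__HALT_COMPILER();"


def looks_like_phar(data: bytes) -> bool:
    """Return True if the raw bytes contain the PHAR stub terminator."""
    return PHAR_STUB_MARKER in data


def uses_phar_wrapper(uri: str) -> bool:
    """Return True if the URI would be handled by PHP's phar:// wrapper."""
    return uri.strip().lower().startswith("phar://")


if __name__ == "__main__":
    print(looks_like_phar(b"<?php __HALT_COMPILER(); ?>\r\n..."))
    print(uses_phar_wrapper("PHAR:///tmp/upload.phar/image.png"))
```

A check like this belongs in front of any code path that hands user-controlled paths to file functions. It complements upgrading PHP and the affected library, but does not replace either.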

Table detailing the basic file format of a Phar archive manifest, including size in bytes and descriptions of each part

How Dompdf Generates a PDF File

Flowchart illustrating the process of converting an HTML file into a PDF format, highlighting intermediate steps and file formats

Through their analysis, OPSWAT Graduate Fellows identified three stages in dompdf's conversion process. To convert an HTML file to PDF format, the dompdf library first parses the input file into a DOM tree and stores the positioning and layout information of each object. Next, the CSS styling is parsed and applied to each object. Finally, the objects are reorganized to fit on the page and rendered into the final PDF file.  

Security Vulnerability in Dompdf

To enhance security, dompdf validates URI inputs before proceeding to the next step. This validation is visible when dompdf processes an SVG file and inspects the value of the xlink:href attribute. 

PHP code example for handling SVG files, showing functions for parsing and validating SVG content

If the SVG input file contains an <image> tag under the <svg> tag, a condition allows only certain protocols, such as http://, https://, and file://, in the xlink:href field.

Web interface for a PDF Converter service allowing users to upload HTML or SVG files for conversion to PDF

The resolve_url() function validates the URI before it is processed by the drawImage() function. If the scheme in the URI is not among the allowed protocols, resolve_url() throws an exception back to the application. 
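The allowlist idea can be sketched in a few lines. The example below is an illustrative Python analogue, not dompdf's actual code: the function name `check_uri_scheme` and the exception class are hypothetical, and the allowed set is taken from the protocols named above. In this sketch, URIs without a scheme (relative paths) are rejected as well.

```python
from urllib.parse import urlsplit

# Protocols the article names as permitted for xlink:href.
ALLOWED_SCHEMES = {"http", "https", "file"}


class DisallowedSchemeError(ValueError):
    """Raised when a URI uses a scheme outside the allowlist."""


def check_uri_scheme(uri: str) -> str:
    """Return the URI unchanged if its scheme is allowed, else raise."""
    # urlsplit() lowercases the scheme, so "PHAR://" and "phar://" match.
    scheme = urlsplit(uri).scheme
    if scheme not in ALLOWED_SCHEMES:
        raise DisallowedSchemeError(f"scheme {scheme!r} is not permitted")
    return uri


if __name__ == "__main__":
    check_uri_scheme("https://example.com/logo.png")  # passes
    try:
        check_uri_scheme("phar:///tmp/upload.phar/logo.png")
    except DisallowedSchemeError as error:
        print("rejected:", error)
```

An allowlist only protects the code paths that actually call it, which is why the condition guarding the call matters as much as the check itself.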

PHP script showing how to handle data URIs, illustrating error handling and protocol verification in web applications

If the URI passes the validation, it will then be passed to the drawImage() function, which uses the file_get_contents() function to handle the URI value within the xlink:href attribute. The security vulnerability arises at this point: a PHAR deserialization attack could be triggered if an attacker could bypass the validation and pass a PHAR wrapper into the URI. 

The analysis showed that this validation is enforced only on tags whose name is exactly image, in lowercase. 

PHP example of image manipulation, detailing functions for drawing images from data URLs and handling file contents

Consequently, the check can be bypassed simply by capitalizing one or more characters of the tag name, such as Image. A specially crafted SVG file that uses Image instead of image evades the validation, yet the element is still rendered as an image. 
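The effect of an exact-match comparison versus a case-insensitive one can be shown with a small Python analogue. This is a sketch of the general pattern rather than dompdf's code; the helper names and the sample SVG are made up for illustration.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical crafted input: note the capital "I" in the tag name.
CRAFTED_SVG = """<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
  <Image xlink:href="phar:///tmp/upload.phar/x.png" />
</svg>"""


def local_name(tag: str) -> str:
    """Strip the namespace from a tag such as '{http://...}image'."""
    return tag.rsplit("}", 1)[-1]


def image_hrefs(svg_text: str, case_sensitive: bool) -> list[str]:
    """Collect href values from image elements that the check recognizes."""
    hrefs = []
    for element in ET.fromstring(svg_text).iter():
        name = local_name(element.tag)
        if not case_sensitive:
            name = name.lower()  # normalize before comparing
        if name == "image":
            href = element.get(XLINK_HREF) or element.get("href")
            if href:
                hrefs.append(href)
    return hrefs


if __name__ == "__main__":
    # The exact-match check never sees the URI, so it is never validated.
    print(image_hrefs(CRAFTED_SVG, case_sensitive=True))
    # Normalizing the tag name exposes the URI to validation.
    print(image_hrefs(CRAFTED_SVG, case_sensitive=False))
```

Whenever a validator and the component that consumes the input disagree on how names are compared, the stricter side can be sidestepped. Normalizing case before the comparison closes that gap.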

In the subsequent step, the drawImage() function is invoked and passes the unvalidated URI from the SVG file to the file_get_contents() function. Combining this validation bypass with a deserialization attack on a PHAR file therefore allows an attacker to achieve remote code execution. A malicious attacker can compromise the application server with nothing more than an SVG file, provided it is handled by a vulnerable version of dompdf. 

Dompdf Exploitation Simulation

Vulnerable Application

To simulate this exploitation as a real-world scenario, OPSWAT Graduate Fellows developed a web application featuring HTML to PDF conversion using dompdf library version 2.0.1. This application allows users to upload file types such as HTML, XML, or SVG, and then convert them to PDF files. 

Screenshot of a web-based PDF Converter tool featuring a humanoid avatar, emphasizing the file upload and conversion functionality

An attacker will follow these steps to exploit this vulnerability in an application using a vulnerable dompdf version (version 2.0.1): 

PHP code snippet displaying the use of Dompdf library for converting HTML to a PDF file in a landscape format
Composer.json file snippet specifying the requirement for the Dompdf library version 2.0.1 for a PHP project
Exploitation Flow
A graphical diagram explaining the four-step process to exploit SVG files using Dompdf vulnerability to execute a reverse shell

Firstly, an attacker generates a malicious object that creates a reverse shell when the destructor is called. 

PHP code example illustrating object deserialization vulnerability in PHP that leads to remote code execution via a reverse shell

Secondly, the attacker creates a PHAR file containing the malicious metadata object. When the PHAR file is accessed through the PHAR wrapper in the file_get_contents() function, the metadata is unserialized and the specified object is instantiated. When that object is later destroyed, its destructor runs and executes the reverse shell. 

PHP script demonstrating the creation of a Phar archive that includes serialized object data for executing a remote command

Finally, the attacker embeds a URI that uses the PHAR wrapper in the xlink:href attribute of an Image tag, which bypasses the validation, and uploads the malicious SVG file to the application. 

SVG code snippet demonstrating how to embed a remote PHP deserialization exploit through an image reference

During the processing of the uploaded malicious SVG file, the application establishes a reverse connection to the attacker, enabling them to compromise the application server. 

Web interface of a PDF converter showing options to upload and convert HTML/SVG files to PDF, featuring a digital human model
Screenshot of a Metasploit console showing a successful reverse TCP connection and network configuration details
Terminal screenshot displaying network configuration and credentials retrieved from a Unix system

Securing Open-source Components with OPSWAT MetaDefender Core 

To detect and mitigate the CVE-2023-23924 vulnerability in dompdf, our Graduate Fellows utilized OPSWAT MetaDefender Core, a multilayered cybersecurity product offering advanced malware prevention and detection technologies, including an SBOM engine. 

OPSWAT SBOM secures the software supply chain by providing a comprehensive component inventory for source code and containers. By analyzing the dompdf library and its dependencies, OPSWAT SBOM can quickly identify the presence of the vulnerable version 2.0.1 and alert users to the potential risk. 
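To make the idea of a component inventory check concrete, the sketch below scans the contents of a Composer lock file for a dompdf version older than the first fixed release. It is a minimal illustration and not how OPSWAT SBOM works internally. The function names are hypothetical, and the fixed version (2.0.2) is an assumption that should be confirmed against the official advisory.

```python
import json

# Assumption: 2.0.2 is the first dompdf release that fixes CVE-2023-23924.
FIRST_FIXED = (2, 0, 2)


def parse_version(version: str):
    """Turn 'v2.0.1' into (2, 0, 1); return None for non-numeric versions."""
    core = version.lstrip("vV").split("-", 1)[0].split("+", 1)[0]
    parts = core.split(".")
    if not all(part.isdigit() for part in parts):
        return None  # e.g. 'dev-master' cannot be compared numerically
    return tuple(int(part) for part in parts)


def find_vulnerable_dompdf(lock_text: str) -> list[str]:
    """Return the locked dompdf versions that predate the fixed release."""
    lock = json.loads(lock_text)
    findings = []
    for section in ("packages", "packages-dev"):
        for package in lock.get(section) or []:
            if package.get("name") != "dompdf/dompdf":
                continue
            version = parse_version(package.get("version", ""))
            if version is not None and version < FIRST_FIXED:
                findings.append(package["version"])
    return findings


if __name__ == "__main__":
    sample = '{"packages": [{"name": "dompdf/dompdf", "version": "v2.0.1"}]}'
    print(find_vulnerable_dompdf(sample))
```

In practice the same check would read the project's composer.lock from disk and run in a CI pipeline, with a dedicated SBOM tool covering transitive dependencies and the full list of known CVEs.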

Additionally, OPSWAT SBOM can detect related technologies, such as SVG and PHAR files, which are essential for understanding and mitigating the CVE-2023-23924 vulnerability. This holistic view of the application's software components empowers security teams to make informed decisions and take appropriate actions to address identified risks. 

Security analysis interface showing a blocked JavaScript file due to vulnerabilities with detailed assessment results

Beyond detecting the vulnerable dompdf version, OPSWAT SBOM also provides valuable insights into affected components, their versions, and any available updates or patches. This information allows security teams to prioritize their remediation efforts and ensure the application is updated to a secure version of dompdf, effectively addressing the CVE-2023-23924 vulnerability. 

Detailed view of vulnerabilities in a JavaScript object notation file, listing critical and high-security risks with CVE identifiers

By leveraging the SBOM engine within MetaDefender Core, organizations can proactively monitor their software supply chain, identify potential vulnerabilities in open-source components, and implement timely mitigation strategies, ensuring the overall security and integrity of their applications. 

Closing Thoughts 

The discovery of CVE-2023-23924 in the dompdf library underscores the critical need for vigilance in the dynamic landscape of application security. By leveraging the insights and strategies outlined in this blog post, security teams can effectively detect, mitigate, and safeguard their applications against such vulnerabilities, ensuring the overall security and integrity of their software ecosystem. 
