The rapid rise of technology has created a high demand for skilled developers. Open-source software (OSS) has become a vital tool for this growing workforce. Hundreds of thousands of well-established OSS packages now exist across various programming languages. Over 90% of developers leverage these open-source components within their proprietary applications, highlighting the efficiency and value proposition of OSS. Further emphasizing its importance, the global open-source software market is expected to reach $80.7 billion by 2030, reflecting a projected growth rate of 16.7% annually.
Software is integral to business operations around the world and is therefore frequently targeted by threat actors. In 2023, spending on Application Security was approximately $5.76 billion, with projections reaching $6.67 billion in 2024. Within AppSec, software supply chain security has gained prominence over the past three years, representing the fastest-growing attack category, with major breaches and exploits making headlines regularly.
Organizations often assume that most risks originate from public-facing web applications. However, this perspective has shifted in recent years. With dozens of small components in every application, risks can now emerge from anywhere within the codebase. It is more critical than ever for organizations to familiarize themselves with existing and emerging security flaws in the software development lifecycle. In this blog post, our Graduate Fellows provide a closer look at CVE-2023-23924, a critical security flaw discovered in the widely used dompdf library, a powerful PHP tool that enables the dynamic generation of PDF documents from HTML and CSS.
Through a comprehensive analysis, we will explore the technical intricacies of this vulnerability, the related technologies that play a crucial role, and a simulation of the exploitation process. We'll also examine how OPSWAT MetaDefender Core, particularly its Software Bill of Materials (SBOM) engine, can be leveraged to detect and mitigate this vulnerability, empowering software developers and security teams to stay one step ahead of potential threats.
Background on CVE-2023-23924
A security vulnerability was discovered in dompdf version 2.0.1 and made public at the beginning of 2023:
- The URI validation on dompdf 2.0.1 can be bypassed on SVG parsing by passing <image> tags with uppercase letters. This allows an attacker to call arbitrary URLs with arbitrary protocols, leading to arbitrary object unserialization in PHP versions before 8.0.0. Through the PHAR URL wrapper, this vulnerability can cause arbitrary file deletion and even remote code execution, depending on available classes.
- NVD analysts assigned a CVSS score of 9.8 CRITICAL to CVE-2023-23924.
Understanding the Dompdf Vulnerability
To fully understand the CVE-2023-23924 vulnerability in dompdf, it's essential to familiarize ourselves with two closely related technologies: Scalable Vector Graphics (SVG) and PHAR files.
SVG (Scalable Vector Graphics) is a versatile image format that has gained widespread popularity due to its ability to render high-quality graphics on the web while remaining lightweight and scalable. Unlike raster images, SVGs are based on XML markup, allowing for precise control over elements such as lines, shapes, and text. One of the key advantages of SVGs is their ability to scale seamlessly without losing image quality, making them ideal for responsive web design and high-resolution displays.
PHAR (PHP Archive) is analogous to the JAR file concept, but for PHP. It simplifies deployment by bundling all PHP code and resource files into a single archive file.
A PHAR file consists of four sections:
- Stub: contains the code to bootstrap the archive.
- Manifest: contains the metadata of the archive. The metadata is stored in serialized format, which a malicious PHAR file can use to trigger a PHP deserialization attack.
- File contents: contains the contents of the archive, including PHP code and resource files.
- Signature (optional): contains data to verify the file's integrity.
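As a rough illustration of the stub section described above, the following Python sketch flags uploaded bytes that look like a native PHAR archive. It relies on the fact that a PHAR stub ends with the `__HALT_COMPILER();` token. The function name `looks_like_phar` is hypothetical, the exact-case match is an assumption of this sketch, and tar- or zip-based PHAR archives are not covered, so treat it as a heuristic rather than a complete detector.

```python
# Heuristic only: native PHAR archives carry a PHP stub that ends with the
# __HALT_COMPILER(); token. Tar- and zip-based PHARs are not covered here,
# and matching the token in this exact case is an assumption of the sketch.
HALT_MARKER = b"__HALT_COMPILER();"


def looks_like_phar(data: bytes) -> bool:
    """Return True if the bytes contain the PHAR stub terminator."""
    return HALT_MARKER in data


if __name__ == "__main__":
    sample_stub = b"<?php echo 'bootstrap'; __HALT_COMPILER(); ?>\r\n"
    print(looks_like_phar(sample_stub))
    print(looks_like_phar(b"<svg></svg>"))
```

Because the check is a substring search over the whole payload, it would also flag a file that hides a stub behind another format's header, at the cost of possible false positives.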
Because the metadata is stored in serialized format, the combination of the PHAR wrapper and a PHAR file's metadata can trigger a deserialization attack when a phar:// path is used as input to PHP functions such as file_get_contents(), fopen(), file(), file_exists(), md5_file(), filemtime(), or filesize(). This security oversight could enable attackers to execute remote code via a PHAR file.
How Dompdf Generates a PDF File
Through analysis, OPSWAT Graduate Fellows identified three stages in dompdf's conversion process. To convert an HTML file to PDF format, the dompdf library first parses the input file into a DOM tree and stores the positioning and layout information of each object. Next, the CSS styling is parsed and applied to each object. Finally, the objects are reorganized to fit on the page and rendered into the final PDF file.
To enhance security, dompdf validates URI inputs before proceeding to the next step. This validation is evident when dompdf processes the value of the xlink:href attribute in an SVG file.
If the SVG input file contains an <image> tag under the <svg> tag, a condition allows only certain protocols, such as http://, https://, and file://, for the xlink:href field.
The resolve_url() function validates the URI before it is processed by the drawImage() function. If the scheme in the URI is not among the allowed protocols, the resolve_url() function returns an exception to the application.
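The allowlist idea behind this validation can be modeled in a few lines of Python. This is a minimal sketch, not dompdf's actual code: the name `validate_uri` is hypothetical, the allowed schemes are the three named in this article, and rejecting URIs without a scheme is a simplification made here.

```python
from urllib.parse import urlsplit

# Protocols this article lists as permitted for xlink:href.
ALLOWED_SCHEMES = {"http", "https", "file"}


def validate_uri(uri: str) -> str:
    """Return the URI unchanged if its scheme is allowed, else raise."""
    # urlsplit lowercases the scheme; .lower() keeps the intent explicit.
    scheme = urlsplit(uri).scheme.lower()
    if scheme not in ALLOWED_SCHEMES:
        # Simplification: scheme-less (relative) URIs are rejected too.
        raise ValueError(f"disallowed scheme: {scheme or '(none)'}")
    return uri


if __name__ == "__main__":
    print(validate_uri("https://example.com/logo.png"))
    try:
        validate_uri("phar:///tmp/upload.phar/a.png")
    except ValueError as error:
        print(error)
```

A check like this only protects the application if it runs on every code path that reaches the file-loading function.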
If the URI passes the validation, it is then passed to the drawImage() function, which uses the file_get_contents() function to handle the URI value within the xlink:href attribute. The security vulnerability arises at this point: a PHAR deserialization attack could be triggered if an attacker could bypass the validation and pass a PHAR wrapper into the URI.
Analysis shows that the validation is enforced only on tags whose name is exactly image.
Consequently, it can be easily bypassed by capitalizing one or more characters in the tag name, such as Image. As a result, a specially crafted SVG file that uses Image instead of image can evade this validation.
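The bypass comes down to a case-sensitive string comparison. The Python sketch below models that logic with hypothetical function names; it is not dompdf's source. It contrasts an exact-match gate with one that normalizes the tag name first, which is one way such a gap could be closed.

```python
from urllib.parse import urlsplit

# Protocols this article lists as permitted for xlink:href.
ALLOWED_SCHEMES = {"http", "https", "file"}


def is_checked_strict(tag_name: str) -> bool:
    """Flawed gate: validation runs only when the name is exactly 'image'."""
    return tag_name == "image"


def is_checked_normalized(tag_name: str) -> bool:
    """Hardened gate: compare after lowercasing the tag name."""
    return tag_name.lower() == "image"


def href_is_blocked(tag_name: str, href: str, gate) -> bool:
    """True if the gate selects this tag and its href scheme is not allowed."""
    if not gate(tag_name):
        return False  # validation is skipped entirely for this tag
    return urlsplit(href).scheme not in ALLOWED_SCHEMES


if __name__ == "__main__":
    href = "phar:///tmp/upload.phar/a.png"
    print(href_is_blocked("image", href, is_checked_strict))
    print(href_is_blocked("Image", href, is_checked_strict))
    print(href_is_blocked("Image", href, is_checked_normalized))
```

With the strict gate, the capitalized tag is never validated, so the phar:// URI is not blocked; with the normalized gate it is.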
In the subsequent step, the drawImage() function is invoked, passing the URI from the SVG file to the file_get_contents() function. Therefore, the combination of the validation bypass and a deserialization attack on a PHAR file allows an attacker to achieve remote code execution. This vulnerability opens the door for a malicious attacker to compromise the application server through an SVG file if it is handled by a vulnerable version of dompdf.
Dompdf Exploitation Simulation
To simulate this exploitation as a real-world scenario, OPSWAT Graduate Fellows developed a web application featuring HTML to PDF conversion using dompdf library version 2.0.1. This application allows users to upload file types such as HTML, XML, or SVG, and then convert them to PDF files.
An attacker will follow these steps to exploit this vulnerability in an application using a vulnerable dompdf version (version 2.0.1):
First, the attacker generates a malicious object that creates a reverse shell when its destructor is called.
Second, the attacker creates a PHAR file containing the malicious object as metadata. When the PHAR file is accessed through the PHAR wrapper in the file_get_contents() function, the metadata is unserialized and the specified object is processed. This deserialization triggers the object's destructor, which executes the reverse shell.
Finally, the attacker embeds a URI that uses the PHAR wrapper in the xlink:href attribute of an Image tag to bypass validation, then uploads the malicious SVG file to the application so that the malicious code is executed.
During the processing of the uploaded malicious SVG file, the application establishes a reverse connection to the attacker, enabling them to compromise the application server.
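From the defender's side, an upload handler could inspect SVG files for this pattern before they reach the converter. The following Python sketch is one possible pre-filter under stated assumptions: the function names are hypothetical, the allowed schemes are those named in this article, and the standard-library XML parser used here is not hardened against maliciously constructed XML, so a production filter would need a safer parser.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Protocols this article lists as permitted for xlink:href.
ALLOWED_SCHEMES = {"http", "https", "file"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def local_name(tag: str) -> str:
    """Strip an ElementTree '{namespace}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def suspicious_hrefs(svg_text: str) -> list[str]:
    """List href values on image-like elements whose scheme is not allowed."""
    root = ET.fromstring(svg_text)
    found = []
    for element in root.iter():
        # Compare case-insensitively so 'Image' and 'IMAGE' are inspected too.
        if local_name(element.tag).lower() != "image":
            continue
        href = element.get(XLINK_HREF) or element.get("href") or ""
        scheme = urlsplit(href).scheme
        if scheme and scheme not in ALLOWED_SCHEMES:
            found.append(href)
    return found


if __name__ == "__main__":
    sample = (
        '<svg xmlns="http://www.w3.org/2000/svg" '
        'xmlns:xlink="http://www.w3.org/1999/xlink">'
        '<Image xlink:href="phar:///tmp/upload.phar/a.png"/>'
        '<image xlink:href="https://example.com/logo.png"/>'
        "</svg>"
    )
    print(suspicious_hrefs(sample))
```

A filter like this complements, but does not replace, upgrading to a fixed dompdf release.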
Securing Open-source Components with OPSWAT MetaDefender Core
To detect and mitigate the CVE-2023-23924 vulnerability in dompdf, our Graduate Fellows utilized OPSWAT MetaDefender Core, a multilayered cybersecurity product offering advanced malware prevention and detection technologies, including an SBOM engine.
OPSWAT SBOM secures the software supply chain by providing a comprehensive component inventory for source code and containers. By analyzing the dompdf library and its dependencies, OPSWAT SBOM can quickly identify the presence of the vulnerable version 2.0.1 and alert users to the potential risk.
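The general idea of matching a component inventory against known-vulnerable versions can be sketched in Python. This is only an illustration of the concept and does not represent how OPSWAT SBOM is implemented. It assumes the project uses Composer and reads the "packages" and "packages-dev" arrays of a composer.lock file; the name `find_vulnerable` and the hard-coded version list are hypothetical, and the list contains only the version this article names as affected.

```python
import json

# Per this article, dompdf 2.0.1 is affected by CVE-2023-23924.
# A real tool would pull affected ranges from a vulnerability database.
KNOWN_VULNERABLE = {"dompdf/dompdf": {"2.0.1"}}


def find_vulnerable(lock_text: str) -> list[tuple[str, str]]:
    """Return (name, version) pairs from composer.lock that match the list."""
    lock = json.loads(lock_text)
    hits = []
    for section in ("packages", "packages-dev"):
        for package in lock.get(section, []):
            name = package.get("name", "")
            # Composer versions are often written with a leading "v".
            version = package.get("version", "").lstrip("v")
            if version in KNOWN_VULNERABLE.get(name, set()):
                hits.append((name, version))
    return hits


if __name__ == "__main__":
    sample_lock = json.dumps({
        "packages": [
            {"name": "dompdf/dompdf", "version": "v2.0.1"},
            {"name": "psr/log", "version": "3.0.0"},
        ]
    })
    print(find_vulnerable(sample_lock))
```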
Additionally, OPSWAT SBOM can detect related technologies, such as SVG and PHAR files, which are essential for understanding and mitigating the CVE-2023-23924 vulnerability. This holistic view of the application's software components empowers security teams to make informed decisions and take appropriate actions to address identified risks.
Beyond detecting the vulnerable dompdf version, OPSWAT SBOM also provides valuable insights into affected components, their versions, and any available updates or patches. This information allows security teams to prioritize their remediation efforts and ensure the application is updated to a secure version of dompdf, effectively addressing the CVE-2023-23924 vulnerability.
By leveraging the SBOM engine within MetaDefender Core, organizations can proactively monitor their software supply chain, identify potential vulnerabilities in open-source components, and implement timely mitigation strategies, ensuring the overall security and integrity of their applications.
Closing Thoughts
The discovery of CVE-2023-23924 in the dompdf library underscores the critical need for vigilance in the dynamic landscape of application security. By leveraging the insights and strategies outlined in this blog post, security teams can effectively detect, mitigate, and safeguard their applications against such vulnerabilities, ensuring the overall security and integrity of their software ecosystem.