Image files are commonly considered safe and are often handled without awareness of potential security issues. Yet image files, especially SVG files, are common attack vectors for dangerous attacks such as Cross-Site Scripting (XSS) or XML External Entity (XXE) injection. In contrast to XSS attacks, XXE injection attacks do not affect the client side; they target the server side, where the impact can be severe. In this blog post, we will explore the nature of XXE attacks delivered through the Scalable Vector Graphics (SVG) attack vector, discuss a real-world example, and provide mitigation strategies with OPSWAT products.
Background Information
Before delving into the vulnerability, we will first examine the XML and SVG file formats and the root cause behind the XXE injection technique.
XML File Format
XML (eXtensible Markup Language) is a platform-independent file format for storing and exchanging structured data. XML supports hierarchical structure, making it ideal for representing complex data relationships. Data in XML format is organized into tags, attributes, and content, in the same way as HTML. However, XML is highly customizable and extensible, allowing users to define their own tags and attributes to suit their requirements. The figure below shows the data of the IT department in XML format.
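To make the tag, attribute, and content structure concrete, here is a minimal Python sketch that parses a small XML document with the standard library. The department layout and the employee names are made up for illustration and are not taken from the original figure.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: an IT department with two employees.
DEPARTMENT_XML = """<?xml version="1.0"?>
<department name="IT">
  <employee id="1">
    <name>Alice</name>
    <role>Administrator</role>
  </employee>
  <employee id="2">
    <name>Bob</name>
    <role>Developer</role>
  </employee>
</department>"""


def list_employees(xml_text):
    """Return (id, name, role) for every <employee> element in the document."""
    root = ET.fromstring(xml_text)
    return [
        (emp.get("id"), emp.findtext("name"), emp.findtext("role"))
        for emp in root.iter("employee")
    ]


if __name__ == "__main__":
    for emp_id, name, role in list_employees(DEPARTMENT_XML):
        print(emp_id, name, role)
```

The id value is read from an attribute, while the name and role are read from the content of child tags, which mirrors the three building blocks described above.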
XML Entities
In XML, entities are placeholders for data that allow you to embed text or entire documents into the current document. Syntactically, an entity reference in XML starts with an ampersand (&) and ends with a semicolon (;). In the example below, two entities are defined in the Document Type Definition (DTD) and referenced in the content of the XML file. The difference between the two is that the internal entity is defined and referenced within the current document, while the content of the external entity comes from an external document. After the parser resolves the entities, each reference is replaced with the corresponding data.
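As a small illustration of entity declaration and resolution, the sketch below declares one internal and one external entity in a DTD and lets Python's standard-library parser expand the internal one. The entity names, the text value, and the file name footer.txt are hypothetical. The external entity is only declared here, never referenced, so nothing is fetched.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "author" is an internal entity (value given inline),
# "footer" is an external entity (value lives in another resource).
NOTE_XML = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY author "Security Team">
  <!ENTITY footer SYSTEM "footer.txt">
]>
<note>
  <from>&author;</from>
</note>"""


def read_author(xml_text):
    """Parse the note and return the text of <from> after entity expansion."""
    root = ET.fromstring(xml_text)
    return root.findtext("from")


if __name__ == "__main__":
    # The reference &author; has been replaced by the declared value.
    print(read_author(NOTE_XML))
```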
SVG File Format
SVG (Scalable Vector Graphics) is a versatile file format used extensively across web development, graphic design, and data visualization. Unlike traditional image formats such as JPEG or PNG, SVG uses the XML format to describe two-dimensional vector graphics. Specifically, SVG images are composed of geometric shapes like lines, curves, and polygons, defined by mathematical equations rather than individual pixels. As a result, SVG graphics can be scaled without losing quality, making them ideal for responsive web design and high-resolution displays. Because SVG is built on XML, it also exposes the same attack surface as other XML documents.
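Because an SVG image is an XML document, any XML parser can read it. The following sketch, using an illustrative two-shape image that is not from the original post, parses an SVG with Python's standard library and lists the shapes it contains.

```python
import xml.etree.ElementTree as ET

# Illustrative SVG: a square with a circle drawn on top of it.
SVG_TEXT = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <rect x="10" y="10" width="80" height="80" fill="steelblue"/>
  <circle cx="50" cy="50" r="20" fill="white"/>
</svg>"""


def shape_names(svg_text):
    """Return the tag names of the top-level shapes, without the XML namespace."""
    root = ET.fromstring(svg_text)
    # ElementTree reports tags as "{namespace}name"; keep only the name part.
    return [child.tag.split("}", 1)[-1] for child in root]


if __name__ == "__main__":
    print(shape_names(SVG_TEXT))
```

The same property that makes this parsing trivial is what lets XML features such as DTDs and entities travel inside an image file.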
XXE Injection Technique and Impacts
The XXE injection technique abuses the external entity resolution mechanism. Specifically, when the parser encounters an external entity, it fetches the corresponding content based on the type of the resource.
If the resource is a local file, the parser retrieves the content of the file and replaces the entity with the corresponding data. This allows an attacker to read sensitive data such as server configuration or credentials. To exploit the vulnerability, the attacker declares an external entity that refers to a sensitive file, /etc/passwd for instance.
However, if the resource is remote or an internal service, the parser will try to fetch the data by requesting the defined URL. This could be exploited to perform server-side request forgery (SSRF). In this case, instead of referring to a local file, the attacker points the payload at the URL of the targeted service, so that the server sends the request on the attacker's behalf.
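The two cases above differ only in the target of the external entity. The sketch below shows what such payloads typically look like inside an SVG and uses Python's standard-library expat bindings to list the declared external entities without resolving them. The entity name xxe, the host internal.example, and the classification labels are assumptions made for illustration; the scheme check is deliberately simplistic.

```python
import xml.parsers.expat
from urllib.parse import urlparse

# Illustrative payloads: same structure, different entity target.
FILE_PAYLOAD = b"""<?xml version="1.0"?>
<!DOCTYPE svg [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<svg xmlns="http://www.w3.org/2000/svg"><text x="0" y="15">&xxe;</text></svg>"""

SSRF_PAYLOAD = b"""<?xml version="1.0"?>
<!DOCTYPE svg [
  <!ENTITY xxe SYSTEM "http://internal.example/admin">
]>
<svg xmlns="http://www.w3.org/2000/svg"><text x="0" y="15">&xxe;</text></svg>"""


def external_entities(xml_bytes):
    """Return (name, system_id, kind) for each external entity declared in the DTD."""
    found = []

    def on_entity_decl(name, is_parameter, value, base, system_id, public_id, notation):
        # External entities carry no inline value; their content lives at system_id.
        if value is None and system_id is not None:
            scheme = urlparse(system_id).scheme.lower()
            kind = "local-file" if scheme in ("", "file") else "remote-request"
            found.append((name, system_id, kind))

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    # No external entity handler is installed, so nothing is fetched here.
    parser.Parse(xml_bytes, True)
    return found


if __name__ == "__main__":
    print(external_entities(FILE_PAYLOAD))
    print(external_entities(SSRF_PAYLOAD))
```

A legitimate SVG image rarely needs external entities, so a non-empty result from a check like this is a reasonable signal to reject the upload.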
XXE attack via SVG file mishandling: A case study
We will investigate a real-world XXE case in the svglib library, which is affected up to version 0.9.3. The vulnerability was discovered in 2020 and assigned CVE-2020-10799. We will first examine the flow of the library, analyze the vulnerable code snippet, and finally demonstrate the exploit against an SVG-to-PNG conversion service. The targets are self-implemented web applications that use vulnerable svglib versions for the conversion.
svglib Package
svglib is a pure Python library designed to convert SVG files to other formats such as PNG, JPG, and PDF using the ReportLab open-source toolkit. Since SVG files use the XML format, parsing and handling XML is also a relevant part of the main flow of the library. The 3 main steps in the library are as follows:
Exploitation
The vulnerability lies in the SVG file parsing process: if the parser is misconfigured, it can leak sensitive data from the server and potentially enable SSRF. Further examination of the svglib source code shows that the XXE vulnerability is caused by using the default configuration for parsing and handling XML when loading the SVG file. The package used the lxml package, in which the default value of the resolve_entities attribute of the XMLParser class is True.
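One way to avoid relying on the parser defaults is to pass an explicitly hardened parser to lxml. The sketch below is an illustration under stated assumptions, not necessarily the exact patch applied in svglib: the helper names and the secret file are hypothetical, and lxml must be installed separately. It builds an XXE payload that points at a temporary file and shows that the file content does not reach the parsed tree.

```python
import tempfile
from pathlib import Path

from lxml import etree  # third-party dependency: pip install lxml


def parse_svg_hardened(svg_bytes):
    """Parse SVG bytes without expanding entities or touching the network."""
    parser = etree.XMLParser(resolve_entities=False, no_network=True)
    return etree.fromstring(svg_bytes, parser=parser)


def demo():
    """Parse an XXE payload aimed at a throwaway file and return the serialized tree."""
    with tempfile.TemporaryDirectory() as tmp:
        secret = Path(tmp) / "secret.txt"  # hypothetical sensitive file
        secret.write_text("TOP-SECRET")
        payload = (
            '<?xml version="1.0"?>\n'
            '<!DOCTYPE svg [<!ENTITY xxe SYSTEM "%s">]>\n'
            '<svg xmlns="http://www.w3.org/2000/svg"><text>&xxe;</text></svg>'
            % secret.as_uri()
        ).encode()
        root = parse_svg_hardened(payload)
        # The entity reference stays unexpanded in the serialized output.
        return etree.tostring(root)


if __name__ == "__main__":
    print(demo())
```

Setting the options explicitly also protects against differences in defaults between lxml versions.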
Remediation
The root cause of the problem is insecure XML parsing, which is implemented incorrectly in the svglib library on top of its lxml dependency. As a result, using a vulnerable version of this library may result in disclosure of sensitive information, server-side request forgery, or even potential remote code execution, depending on the deployed environment and the functionality of the application. Vulnerabilities introduced by third-party libraries are a severe problem for the security of large applications, as their dependency trees can be complex and opaque.
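As a first manual check, a project can compare its installed svglib version against the affected range of CVE-2020-10799 (up to 0.9.3). The sketch below uses only the Python standard library. The helper names are hypothetical, and the version parsing handles only the leading numeric part of a version string, so it is a rough aid rather than a replacement for a dependency scanner.

```python
import re
from importlib import metadata

# CVE-2020-10799 affects svglib up to and including this release.
LAST_VULNERABLE = (0, 9, 3)


def parse_release(version_text):
    """Turn the leading numeric part of a version string into a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version_text)
    if match is None:
        raise ValueError("unrecognized version: %r" % version_text)
    return tuple(int(part) for part in match.group(0).split("."))


def is_vulnerable(version_text):
    """Return True if the given svglib version falls in the affected range."""
    return parse_release(version_text) <= LAST_VULNERABLE


def installed_svglib_is_vulnerable():
    """Check the svglib installed in this environment; None if it is not installed."""
    try:
        return is_vulnerable(metadata.version("svglib"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_svglib_is_vulnerable())
```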
MetaDefender Software Supply Chain
OPSWAT MetaDefender Software Supply Chain provides expanded visibility and a robust defense against supply chain risks with a combination of multiple technologies. OPSWAT Software Bill of Materials (SBOM) helps gain visibility into open-source third-party software packages and identify software dependencies, vulnerabilities, or other potential risks existing under every layer of a container image. By combining more than 30 antivirus engines, the Multiscanning technology reaches a malware detection rate of more than 99.99%. Furthermore, the Proactive DLP (Data Loss Prevention) technology identifies credentials such as passwords, secrets, tokens, API keys, or other sensitive information left in source code. With our zero-trust threat detection and prevention technologies, your software development lifecycle (SDLC) is secured from malware and vulnerabilities, strengthening application security and compliance adherence.
MetaDefender Software Supply Chain (MDSSC) detects the CVE found in svglib. It also flags the CVE severity level in the SBOM report and identifies the vulnerable software version.
MetaDefender Core - Deep CDR
Another reason the exploit is possible is that the application processes an SVG file injected with a malicious payload. If the image file is sanitized before it is fed into the conversion service, the payload is eliminated, which prevents the attack. The Deep Content Disarm and Reconstruction (CDR) technology in MetaDefender Core protects from known and unknown file-borne threats by sanitizing and reconstructing files. With support for over 160 common file types and hundreds of file reconstruction options, OPSWAT’s Deep CDR neutralizes any potential embedded threats, ensuring the sanitized file maintains full usability with safe content.

Luan Pham participated in OPSWAT's Critical Infrastructure Cybersecurity Graduate Fellowship Program, and he is currently an Associate Penetration Tester at OPSWAT. He is passionate about safeguarding OPSWAT's products against potential threats and sharing his knowledge.