SVG Unveiled: Understanding XXE Vulnerabilities and Defending Your Codebase

by Luan Pham, Associate Penetration Tester

Image files are commonly assumed to be safe and are often handled without regard for potential security issues. Yet image files, and SVG files in particular, are common vectors for dangerous attacks such as Cross-Site Scripting (XSS) and XML External Entity (XXE) injection. Unlike XSS, XXE injection does not affect the client side; it targets the server side, where its impact can be severe. In this blog post, we explore the nature of XXE attacks delivered through Scalable Vector Graphics (SVG) files, discuss a real-world example, and provide mitigation strategies with OPSWAT products.

Background Information

Before delving into the vulnerability, we will first examine the XML and SVG file formats and the root cause behind the XXE injection technique.

XML File Format

XML (eXtensible Markup Language) is a platform-independent file format for storing and exchanging structured data. XML supports hierarchical structure, making it well suited to representing complex data relationships. Data in XML is organized into tags, attributes, and content, in the same way as HTML. However, XML is highly customizable and extensible, allowing users to define their own tags and attributes to suit their requirements. The figure below shows the data of an IT department in XML format.

XML code snippet displaying employee details in an IT department structure
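
For readers without the figure, the sketch below shows what such a document might look like and how it can be read with Python's standard library. The employee names, IDs, and roles are invented for illustration and are not taken from the original figure.

```python
import xml.etree.ElementTree as ET

# Hypothetical department data; every name and field here is illustrative.
XML_DOC = """<department name="IT">
  <employee id="1">
    <name>Alice</name>
    <role>System Administrator</role>
  </employee>
  <employee id="2">
    <name>Bob</name>
    <role>Developer</role>
  </employee>
</department>"""

root = ET.fromstring(XML_DOC)

# Tags form the hierarchy, attributes carry metadata, text holds the content.
employees = [
    (emp.get("id"), emp.findtext("name"), emp.findtext("role"))
    for emp in root.findall("employee")
]

print(root.tag, root.get("name"))
for employee in employees:
    print(employee)
```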

XML Entities

In XML, entities are placeholders for data that allow you to embed text or entire documents into the current document. Syntactically, an entity reference begins with an ampersand (&) and ends with a semicolon (;). In the example below, two entities are defined in the Document Type Definition (DTD) and referenced in the content of the XML file. The difference between the two is that an internal entity is defined and referenced within the current document, while the content of an external entity comes from an external document. When the parser resolves the entities, each reference is replaced with the corresponding data.

Code example of an XML document with internal and external entities including a greeting
Simplified XML code example highlighting the structure and use of entities and content
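
As a minimal sketch of the two declaration styles, the snippet below declares one internal and one external entity and parses the document with Python's standard-library parser. The entity names and the footer.txt file are assumptions made for illustration. Only the internal entity is referenced; the external one is declared purely to show the SYSTEM syntax.

```python
import xml.etree.ElementTree as ET

# "greeting" is an internal entity: its value lives in the document itself.
# "footer" is an external entity: its value would come from another resource.
# Both names and the footer.txt path are hypothetical.
DOC = """<!DOCTYPE note [
  <!ENTITY greeting "Hello, world!">
  <!ENTITY footer SYSTEM "footer.txt">
]>
<note>&greeting;</note>"""

note = ET.fromstring(DOC)

# The parser has replaced the &greeting; reference with the declared text.
print(note.text)
```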

SVG File Format

SVG (Scalable Vector Graphics) is a versatile file format used extensively across web development, graphic design, and data visualization. Unlike traditional image formats such as JPEG or PNG, SVG uses XML to describe two-dimensional vector graphics. Specifically, SVG images are composed of geometric shapes like lines, curves, and polygons, defined by mathematical equations rather than individual pixels. As a result, SVG graphics can be scaled without losing quality, making them ideal for responsive web design and high-resolution displays. Because SVG is XML-based, it is also exposed to XML-related exploits.

Sample image showing a basic SVG file structure with text content
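
Because an SVG file is ordinary XML, any XML parser can read it. The sketch below builds a small SVG with a rectangle and a text element (the shapes and values are illustrative, not the ones in the figure) and inspects it with the standard library.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Illustrative SVG: one rectangle and one line of text.
SVG_DOC = f"""<svg xmlns="{SVG_NS}" width="200" height="100">
  <rect x="10" y="10" width="180" height="80" fill="lightblue"/>
  <text x="30" y="60" font-size="20">Hello SVG</text>
</svg>"""

root = ET.fromstring(SVG_DOC)

# ElementTree reports namespaced tags as "{namespace}localname".
shapes = [child.tag.split("}", 1)[1] for child in root]
text_element = root.find(f"{{{SVG_NS}}}text")

print(shapes)
print(text_element.text)
```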

XXE Injection Technique and Impacts

The XXE injection technique abuses the external entity resolution mechanism. Specifically, when the parser encounters an external entity, it fetches the corresponding content based on the type of the resource.

If the resource is a local file, the parser retrieves the file's content and replaces the entity with it. This allows an attacker to expose sensitive data such as server configuration or credentials. To exploit the vulnerability, the attacker declares an external entity that refers to a sensitive file, /etc/passwd for instance.

However, if the resource is remote or an internal service, the parser tries to fetch the data by requesting the defined URL. This can be exploited to perform server-side request forgery (SSRF). In this case, instead of referring to a local file, the attacker points the payload at the URL of the targeted service, so that the request is made on behalf of the server.
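
The two variants differ only in the resource the external entity points to. The sketch below builds both payloads as plain strings so the structure is visible; nothing is parsed or fetched. The helper name build_xxe_svg and the internal-service URL are hypothetical.

```python
def build_xxe_svg(target: str) -> str:
    """Return an SVG whose text content references an external entity at target."""
    return (
        f'<!DOCTYPE svg [ <!ENTITY xxe SYSTEM "{target}"> ]>\n'
        '<svg xmlns="http://www.w3.org/2000/svg" width="500" height="100">\n'
        '  <text x="0" y="20">&xxe;</text>\n'
        '</svg>\n'
    )


# Local file disclosure: the entity points at a file on the server.
file_read_payload = build_xxe_svg("file:///etc/passwd")

# SSRF: the entity points at a URL that only the server can reach.
# The host name below is a placeholder, not a real service.
ssrf_payload = build_xxe_svg("http://internal-service.example/admin")

print(file_read_payload)
print(ssrf_payload)
```

A vulnerable parser that resolves the entity would place the fetched content inside the text element, which is why the result can show up in the rendered image.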

XXE attack via SVG file mishandling: A case study 

We will investigate a real-world XXE case: the svglib library up to version 0.9.3. The vulnerability was discovered in 2020 and assigned CVE-2020-10799. We will first examine the flow of the library, then analyze the vulnerable code snippet, and finally demonstrate the exploit against an SVG-to-PNG conversion service. The targets are self-implemented web applications that use a vulnerable svglib version for the conversion.

svglib Package

svglib is a pure Python library designed to convert SVG files to other formats such as PNG, JPG, and PDF using the ReportLab open-source toolkit. Since SVG files use the XML format, parsing and handling XML is a key part of the library's main flow. The three main steps in the library are as follows:

Process flow diagram illustrating steps from parsing an SVG file to converting it to another format

Exploitation 

The vulnerability lies in the SVG parsing step: a misconfigured parser can leak sensitive data from the server and potentially enable SSRF. Examining the source code of the svglib package shows that the XXE vulnerability is caused by using the default configuration for parsing XML when loading the SVG file. The package used the lxml package, in which the resolve_entities option of the XMLParser class defaulted to True.

Code difference showing changes in a Python function to load an SVG file with entity resolution options
Flow diagram of an XXE attack process using SVG files in a security context
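
The sketch below illustrates the general shape of a hardened parser configuration with lxml. It is not the svglib patch itself: the function names load_svg_hardened and demo are hypothetical, and the import is guarded only so the snippet still loads where the third-party lxml package is not installed. The demo points an external entity at a temporary file containing the string SECRET and checks that the content never reaches the parsed document.

```python
import os
import pathlib
import tempfile

try:
    from lxml import etree  # third-party package: pip install lxml
except ImportError:  # keeps the sketch loadable where lxml is absent
    etree = None


def load_svg_hardened(svg_bytes: bytes):
    """Parse SVG bytes without resolving entities or touching the network."""
    if etree is None:
        raise RuntimeError("lxml is required for this sketch")
    parser = etree.XMLParser(resolve_entities=False, no_network=True)
    return etree.fromstring(svg_bytes, parser)


def demo() -> bytes:
    """Parse an SVG that points an external entity at a temporary file."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write("SECRET")
        secret_path = pathlib.Path(handle.name)
    try:
        svg = (
            f'<!DOCTYPE svg [<!ENTITY xxe SYSTEM "{secret_path.as_uri()}">]>'
            '<svg xmlns="http://www.w3.org/2000/svg"><text>&xxe;</text></svg>'
        ).encode("utf-8")
        return etree.tostring(load_svg_hardened(svg))
    finally:
        os.unlink(secret_path)


if etree is not None:
    # The serialized SVG should not contain the file content.
    print(demo())
```

With resolve_entities=False, lxml leaves the entity reference unexpanded instead of replacing it with the file content, and no_network=True prevents the parser from fetching remote resources.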

Remediation

The root cause of the problem is insecure XML parsing, which is implemented incorrectly in the svglib library and inherited from the defaults of its lxml dependency. As a result, using a vulnerable version of this library may result in disclosure of sensitive information, server-side request forgery, or even potential remote code execution, depending on the deployed environment and the functionality of the application. Vulnerabilities introduced by third-party libraries are a severe problem for the security of large applications, as their dependencies can be complex and opaque.

MetaDefender Software Supply Chain

OPSWAT MetaDefender Software Supply Chain provides expanded visibility and a robust defense against supply chain risks by combining multiple technologies. OPSWAT Software Bill of Materials (SBOM) helps gain visibility into open-source third-party software packages and identify software dependencies, vulnerabilities, or other potential risks existing under every layer of a container image. By combining more than 30 antivirus engines, the Multiscanning technology reaches a malware detection rate of more than 99.99%. Furthermore, the Proactive DLP (Data Loss Prevention) technology identifies credentials such as passwords, secrets, tokens, API keys, or other sensitive information left in source code. With our zero-trust threat detection and prevention technologies, your software development lifecycle (SDLC) is secured from malware and vulnerabilities, strengthening application security and compliance adherence.

OPSWAT MetaDefender Software Supply Chain dashboard showing repository scan results for vulnerabilities and threats
OPSWAT MetaDefender Software Supply Chain security report for a requirements.txt file showing no threats or secrets detected

MDSSC detects the CVE found in svglib. It also flags the CVE severity level in the SBOM report and identifies the vulnerable software version.
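
To make the idea of a dependency check concrete, the sketch below scans the text of a requirements.txt file for an svglib pin at or below 0.9.3, the range named in CVE-2020-10799. This is a simplified illustration and not how MDSSC works internally; the function names are hypothetical, and only exact numeric pins of the form svglib==X.Y.Z are handled.

```python
import re

# CVE-2020-10799 affects svglib up to and including 0.9.3.
VULNERABLE_MAX = (0, 9, 3)

# Matches exact pins such as "svglib==0.9.3", optionally followed by a comment.
PIN = re.compile(r"^\s*svglib\s*==\s*(\d+(?:\.\d+)*)\s*(?:#.*)?$", re.IGNORECASE)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '0.9.3' into (0, 9, 3), dropping trailing zeros so 1.0 == 1.0.0."""
    parts = [int(piece) for piece in text.split(".")]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def find_vulnerable_svglib(requirements: str) -> list[str]:
    """Return the pinned svglib versions that fall in the vulnerable range."""
    hits = []
    for line in requirements.splitlines():
        match = PIN.match(line)
        if match and parse_version(match.group(1)) <= VULNERABLE_MAX:
            hits.append(match.group(1))
    return hits


sample = "reportlab==3.6.0\nsvglib==0.9.3  # conversion service\n"
print(find_vulnerable_svglib(sample))
```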

MetaDefender Core - Deep CDR

Another reason the exploit is possible is that the application processes an SVG file injected with a malicious payload. If the image file is sanitized before being fed into the conversion service, the payload is eliminated, preventing the attack. The Deep Content Disarm and Reconstruction (CDR) technology in MetaDefender Core protects from known and unknown file-borne threats by sanitizing and reconstructing files. With support for over 160 common file types and hundreds of file reconstruction options, OPSWAT’s Deep CDR neutralizes any potential embedded threats, ensuring the sanitized file maintains full usability with safe content.

Editable XML code in SVG format with XXE vulnerability outlined in DOCTYPE element
The Malicious SVG File
Clean version of XML code in SVG format without XXE vulnerability
The SVG file after being sanitized with MetaDefender Core – Deep CDR 
Security report showing SVG file marked as allowed and sanitized after vulnerability assessment
MetaDefender Core – Deep CDR sanitized result
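
As a far simpler illustration of screening a file before conversion, the sketch below rejects any SVG that carries a DOCTYPE declaration, since that is where entities are defined. It is not how Deep CDR works and does not reconstruct the file; the names ForbiddenDTD, reject_doctype, and is_safe are hypothetical. The policy is deliberately strict, so it also rejects legitimate SVG files that include a standard DOCTYPE.

```python
import xml.parsers.expat


class ForbiddenDTD(ValueError):
    """Raised when an uploaded SVG contains a DOCTYPE declaration."""


def reject_doctype(svg_bytes: bytes) -> None:
    """Parse the bytes with expat and abort as soon as a DOCTYPE is seen."""
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise ForbiddenDTD(f"DOCTYPE declaration not allowed: {name}")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(svg_bytes, True)


def is_safe(svg_bytes: bytes) -> bool:
    """Return True only for well-formed XML without a DOCTYPE."""
    try:
        reject_doctype(svg_bytes)
    except (ForbiddenDTD, xml.parsers.expat.ExpatError):
        return False
    return True


MALICIOUS = (
    b'<!DOCTYPE svg [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
    b'<svg xmlns="http://www.w3.org/2000/svg"><text>&xxe;</text></svg>'
)
CLEAN = b'<svg xmlns="http://www.w3.org/2000/svg"><text>hello</text></svg>'

print(is_safe(MALICIOUS), is_safe(CLEAN))
```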

Author Bio

Luan Pham participated in OPSWAT's Critical Infrastructure Cybersecurity Graduate Fellowship Program, and he is currently an Associate Penetration Tester at OPSWAT. He is passionate about safeguarding OPSWAT's products against potential threats and sharing his knowledge. 
