Posted by Taeil Goh / August 14, 2017
"The flexibility of XML has resulted in its widespread usage, including within Microsoft Office documents and SOAP messages. However, XML documents have many security vulnerabilities that can be targeted for different types of attacks, such as file retrieval, server side request forgery, port scanning, or brute force attacks."
This blog post is for the technical reader who would like to see more details about XML-based attacks with some examples.
We will cover the following threats in this article. OPSWAT Metadefender data sanitization (CDR) addresses all of these threats.
- XML injection
- XSS/CDATA Injection
- Oversized payloads or XML bombs
- Recursive payloads
- VBA macros
XML injection can be exploited to deliver attacks targeting XML applications that do not escape reserved characters.
In an XML document, "<" and ">" are reserved characters used to specify the beginning or the end of an XML tag. If one wants to use these reserved characters, one "escapes" their predefined meaning by using XML entities.
For example, suppose we want an element with the following content: "If the size of a stream < 4096, this sector is considered as a mini stream." We then have to escape "<" as "&lt;". "&lt;" is an XML entity; an XML parser automatically converts it back to "<".
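As a rough illustration of the escaping step, the sketch below uses `escape` from Python's standard library on the sentence above and then parses the result back. The `<note>` element name is hypothetical and chosen only for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

text = "If the size of a stream < 4096, this sector is considered as a mini stream."

# escape() replaces the reserved characters &, < and > with XML entities.
escaped = escape(text)

# <note> is a hypothetical element name used only for this example.
element = "<note>" + escaped + "</note>"

# The parser converts the entity back to the literal character.
parsed_text = ET.fromstring(element).text

print(escaped)
print(parsed_text == text)
```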
XML injection attacks typically occur in this way:
- The XML document is then parsed by an XML application. In this step, the attacker targets XML applications that do not properly escape reserved characters when they serialize data into XML.
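The effect of unescaped reserved characters can be sketched as follows. This is an illustration only: the `<user>`, `<name>` and `<role>` elements, the two builder functions, and the payload are all hypothetical and are not taken from the sample files discussed later in this post.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_unsafe(name):
    # Hypothetical serializer that inserts user input without escaping.
    return "<user><name>" + name + "</name><role>guest</role></user>"


def build_safe(name):
    # Same serializer, but reserved characters are escaped first.
    return "<user><name>" + escape(name) + "</name><role>guest</role></user>"


# Attacker-supplied value that closes <name> and smuggles in a <role> element.
payload = "eve</name><role>admin</role><name>x"

unsafe_doc = ET.fromstring(build_unsafe(payload))
safe_doc = ET.fromstring(build_safe(payload))

unsafe_roles = [r.text for r in unsafe_doc.findall("role")]
safe_roles = [r.text for r in safe_doc.findall("role")]

print(unsafe_roles)  # the injected role appears as a real element
print(safe_roles)    # only the role written by the application
```

In the unsafe variant the injected markup becomes part of the document structure; in the escaped variant it stays inside the text of `<name>`.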
CDATA injection occurs in a scenario similar to XML injection.
- Later, the XML document is parsed by an XML application. XML applications that do not properly escape reserved characters when they serialize data leave those characters unescaped in the document.
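A minimal sketch of the CDATA case, under the assumption that the application wraps user input in a CDATA section: the sequence "]]>" closes the section early, so anything after it is parsed as markup. The `<comment>` element, the helper names, and the payload are hypothetical. The safe variant uses the common workaround of splitting "]]>" across two CDATA sections.

```python
import xml.etree.ElementTree as ET


def wrap_cdata_unsafe(text):
    # Hypothetical serializer: trusts that text never contains "]]>".
    return "<comment><![CDATA[" + text + "]]></comment>"


def wrap_cdata_safe(text):
    # Split every "]]>" so that "]]" ends one section and ">" starts the next.
    return "<comment><![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]></comment>"


# Payload that closes the CDATA section, injects an element, and reopens it.
payload = "hi]]><script>alert(1)</script><![CDATA["

unsafe_doc = ET.fromstring(wrap_cdata_unsafe(payload))
safe_doc = ET.fromstring(wrap_cdata_safe(payload))

injected = [child.tag for child in unsafe_doc]
print(injected)                  # the injected element is now real markup
print(safe_doc.text == payload)  # the payload stays plain text
```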
XML bomb attacks are designed to exhaust the resources of a web server. When processing an XML document injected with an XML bomb, the XML parser consumes very large amounts of memory and CPU time trying to parse the document. XML bombs are also known as "billion laughs" attacks.
An XML bomb attack is made possible by exploiting XML entities. There are both predefined XML entities and user-defined XML entities. A user can define an XML entity with a declaration of the form <!ENTITY name "replacement text"> inside the document type definition.
In the above entity definition, "name" is the entity name and "replacement text" is its value. The entity value is then inserted into an element content or attribute as "&name;".
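To make the mechanism concrete, here is a small sketch that declares one user-defined entity and lets Python's standard `xml.etree.ElementTree` parser expand it in both element content and an attribute. The `<note>` element and its `title` attribute are hypothetical names used only for this example.

```python
import xml.etree.ElementTree as ET

# The internal DTD subset declares the entity; "&name;" references it.
doc = (
    '<!DOCTYPE note [\n'
    '  <!ENTITY name "replacement text">\n'
    ']>\n'
    '<note title="&name;">&name;</note>'
)

root = ET.fromstring(doc)

print(root.text)          # entity expanded in element content
print(root.get("title"))  # entity expanded in the attribute value
```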
Another entity can be used as an entity value, and this opens the door for an attacker to mount an XML bomb attack.
Let's take a look at the following XML sample.
When completely parsed, the content of the "Example" element would contain 2^127 copies of the word "ha," which is about 3.4 x 10^26 terabytes. It is impossible to parse this document. As a result, the XML parser will crash.
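The arithmetic behind this kind of document can be sketched as follows. The structure is an assumption: each entity references the previous one twice, so every level doubles the output, and 128 levels are chosen so that the last entity expands to 2^127 copies of "ha". The entity names `ha1` to `ha128` and the helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_bomb(levels, word="ha"):
    # Assumed structure: entity N is two references to entity N-1.
    lines = ["<!DOCTYPE Example [", f'  <!ENTITY ha1 "{word}">']
    for i in range(2, levels + 1):
        lines.append(f'  <!ENTITY ha{i} "&ha{i - 1};&ha{i - 1};">')
    lines.append("]>")
    lines.append(f"<Example>&ha{levels};</Example>")
    return "\n".join(lines)


def expanded_words(levels):
    # Level 1 is one word; each further level doubles the count.
    return 2 ** (levels - 1)


# A tiny instance is safe to parse: 4 levels expand to 8 words.
small = ET.fromstring(build_bomb(4)).text

# The full-size document is only built as text here, never parsed.
source_size = len(build_bomb(128))
words = expanded_words(128)
terabytes = words * len("ha") / 1e12

print(small)
print(source_size, "characters of source")
print(f"{terabytes:.1e} terabytes after expansion")
```

The source document stays a few kilobytes long, while the expanded size grows exponentially with the number of levels.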
Solution: To deal with XML bomb attacks, the XML parser is configured to limit expansion of user-defined XML entities. When the expansion exceeds a certain depth level, the parser will raise an exception.
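One way such a limit could be approximated in Python is sketched below. It is not the Metadefender implementation: it uses the standard `xml.parsers.expat` module, inspects each entity declaration as the DTD is read, and raises before any content is expanded. The function name, the exception class, the default limit of 3, and the conservative choice to reject external entities and forward references are all assumptions made for this sketch.

```python
import re
from xml.parsers import expat

# Matches general entity references such as "&name;" (not "&#38;").
ENTITY_REF = re.compile(r"&([^#;&\s][^;&\s]*);")
PREDEFINED = {"lt", "gt", "amp", "apos", "quot"}


class EntityLimitError(ValueError):
    """Raised when an entity declaration exceeds the allowed nesting."""


def parse_with_entity_limit(xml_text, max_depth=3):
    depths = {}  # entity name -> nesting depth of its replacement text

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        if value is None:
            # External or unparsed entity: rejected in this sketch.
            raise EntityLimitError(f"external entity not allowed: {name}")
        depth = 1
        for ref in ENTITY_REF.findall(value):
            if ref in PREDEFINED:
                continue
            if ref not in depths:
                # Conservative: references to undeclared entities are refused.
                raise EntityLimitError(f"unknown entity reference: {ref}")
            depth = max(depth, depths[ref] + 1)
        if depth > max_depth:
            raise EntityLimitError(f"entity {name} nests {depth} levels deep")
        depths[name] = depth

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_text, True)
    return depths
```

A document with a single plain entity passes, while a chain of entities that reference each other more than `max_depth` levels deep raises `EntityLimitError` during DTD processing.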
Visual Basic Macro
XML is a well-known format not only for saving text but also for use by Microsoft Office applications. Attackers can utilize Microsoft Office XML files to hide malicious macros. This method gives an attack a greater chance of success because many users will expect XML files to be harmless text files.
When a Microsoft Word document is converted to XML format, Visual Basic for Applications (VBA) macros are compressed and encoded in base64. On a Windows machine with Microsoft Office installed, Word documents saved in XML format are recognized as Word documents.
When the XML file is double-clicked, Microsoft Word opens automatically and may run embedded VBA macros.
Attackers carry out VBA macro attacks with XML in the following way:
- Create a Microsoft Word document and add a malicious VBA macro
- Convert the document to XML format
- Send the XML document to victims, for example by email
- The victim clicks on the XML file, then Microsoft Word opens the XML file and runs the VBA macro
- The VBA macro downloads another harmful program and executes it
How to Detect and Remove VBA Macros
The following elements and attributes in Microsoft Word XML files help identify VBA macros.
- <?mso-application progid="Word.Document"?> tells us that the XML document is a Microsoft Word document
- <w:wordDocument ... w:macrosPresent="Yes" ...> shows that the document contains a VBA macro
- <w:binData w:name="editdata.mso"></w:binData> is where VBA macro content is stored in the document
Solution: By checking the above elements and attributes, we can remove VBA macros from an XML document.
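The checks above could be sketched in Python roughly as follows. This is an illustration, not the Metadefender implementation. It assumes the Word 2003 XML namespace for the `w:` prefix, compares the `w:macrosPresent` value case-insensitively, and uses a made-up sample document; the function names and the `w:docSuppData` wrapper in the sample are assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Assumed namespace for the "w:" prefix in Word 2003 XML documents.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)

MSO_PI = re.compile(r'<\?mso-application\s+progid="Word\.Document"\s*\?>')
FLAG = f"{{{W_NS}}}macrosPresent"


def _macro_parts(root):
    # (parent, child) pairs for every w:binData named "editdata.mso".
    return [(parent, child)
            for parent in root.iter() for child in list(parent)
            if child.tag == f"{{{W_NS}}}binData"
            and child.get(f"{{{W_NS}}}name") == "editdata.mso"]


def find_macro_indicators(xml_text):
    root = ET.fromstring(xml_text)
    return {
        "is_word_document": bool(MSO_PI.search(xml_text)),
        "macros_present_flag": root.get(FLAG, "").lower() == "yes",
        "macro_parts": len(_macro_parts(root)),
    }


def strip_macros(xml_text):
    root = ET.fromstring(xml_text)
    for parent, child in _macro_parts(root):
        parent.remove(child)
    if root.get(FLAG) is not None:
        root.set(FLAG, "no")
    # ElementTree does not keep the processing instruction that precedes
    # the root element, so it is absent from the returned text.
    return ET.tostring(root, encoding="unicode")


# Made-up sample; the base64 payload is a harmless placeholder.
sample = (
    '<?mso-application progid="Word.Document"?>'
    f'<w:wordDocument xmlns:w="{W_NS}" w:macrosPresent="yes">'
    '<w:docSuppData><w:binData w:name="editdata.mso">QUJD</w:binData>'
    '</w:docSuppData><w:body/></w:wordDocument>'
)

print(find_macro_indicators(sample))
print(find_macro_indicators(strip_macros(sample)))
```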
XML Data Sanitization Demos
Below are links to Metadefender.com scanning results for the example files that we created for each of these attacks, along with scanning results for the sanitized versions of those files. (Since our sample XML documents contain examples of the exploits but do not actually perform any malicious actions, Metadefender.com engines do not detect them as malicious.)
A ZIP archive containing the original and sanitized files is also available for download.
| 1 | VBA macro sample | Sanitized VBA macro sample | XML containing VBA macro and sanitized result |
| 2 | XML bomb | Sanitized XML bomb | XML bomb and sanitized result |
| 4 | XML injection sample | Sanitized XML injection sample | XML injection and sanitized result |
| 5 | CDATA injection sample | Sanitized CDATA injection sample | CDATA injection and sanitized result |
How to Utilize Metadefender Data Sanitization with XML Documents
- OWASP, "XML Security Cheat Sheet"
- Costello, Roger L. The MITRE Corporation, "XML Risks and Mitigations"
Research and content development assistance provided by OPSWAT data sanitization team.