Data Sanitization (CDR), Part 2: File Structure Alterations

As mentioned in the introductory post in this series, the topic for this second installment is content disarm and reconstruction (CDR), or data sanitization, via file structure alteration. To take advantage of the software vulnerabilities we discussed in our last post, malware embedded in documents is designed to be completely invisible to the end user. A user could open a PDF, for instance, and be completely unaware that a script is running in the background, leveraging a vulnerable Flash installation to infect their computer.

File Structure Alteration Process

To understand this method of data sanitization, it might help to picture an adult driving a car with an infant passenger. If the adult switches places with the infant, the car still looks the same from the outside, but it can no longer operate as intended. Similarly, file structure alterations make small changes to the scripts included in a file, such as changing the order of lines, which renders the scripts inert without changing the visible content of the document. Similar small changes can be applied to the metadata and objects within a file, with the same effect. Another technique is to validate the structure of a document: for instance, looking for elements that aren’t typically included in a Word file and removing any part of the file that falls outside the expected parameters.

Image: "broken structure" by Scott Swigart, used by OPSWAT under a CC license (quote added to original).
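As a rough illustration of the structure-validation technique, the sketch below treats a .docx file as what it is on disk, a ZIP archive of parts, and copies only allowlisted parts into a new archive. The allowlist (`EXPECTED_PARTS`, `EXPECTED_DIRS`) and the function name `validate_structure` are hypothetical and deliberately incomplete; a real implementation would also need to rewrite `[Content_Types].xml` and the relationship files so they no longer reference removed parts.

```python
import io
import zipfile

# Hypothetical, incomplete allowlist for a plain .docx (illustration only).
EXPECTED_PARTS = {
    "[Content_Types].xml",
    "word/document.xml",
    "word/styles.xml",
    "word/settings.xml",
    "word/fontTable.xml",
    "word/webSettings.xml",
}
EXPECTED_DIRS = ("_rels/", "docProps/", "word/_rels/", "word/theme/", "word/media/")


def is_expected(name: str) -> bool:
    """True if a part name matches the expected structure."""
    return name in EXPECTED_PARTS or name.startswith(EXPECTED_DIRS)


def validate_structure(docx_bytes: bytes) -> tuple[bytes, list[str]]:
    """Copy only expected parts into a new archive; return it plus dropped names."""
    dropped = []
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if is_expected(info.filename):
                dst.writestr(info, src.read(info))
            else:
                # Anything outside the expected structure (macros, ActiveX,
                # OLE embeddings, unknown parts) is removed, never analyzed.
                dropped.append(info.filename)
    return out.getvalue(), dropped


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("[Content_Types].xml", "<Types/>")
        z.writestr("word/document.xml", "<w:document/>")
        z.writestr("word/vbaProject.bin", b"\x00macro")
    cleaned, dropped = validate_structure(buf.getvalue())
    print(dropped)  # ['word/vbaProject.bin']
```

Note that the decision is made purely on where a part sits in the file's structure, not on what the part contains.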

File Structure Alteration Strengths

One strength of this type of sanitization is that it doesn’t spend time analyzing scripts and other document features to determine whether they are malicious or safe, an analysis that falls apart in the face of zero-day attacks and unknown threats. By treating every document with embedded objects as suspect, file structure alterations can effectively render document-borne threats inert. Depending on the implementation, this approach also generally preserves the formatting of a given file, so the original style of the document remains while potential threats are disarmed.
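The "treat everything as suspect" policy can be pictured with a deliberately simplified document model. In this sketch, which is an assumption for illustration rather than how any particular CDR product represents files, each hypothetical `Part` carries a `kind`, and `disarm` empties the payload of every active part without ever reading that payload. Text, style and image parts pass through untouched, which is how the formatting survives.

```python
from dataclasses import dataclass, replace

# Hypothetical set of part kinds treated as active content.
ACTIVE_KINDS = {"script", "macro", "embedded_object"}


@dataclass(frozen=True)
class Part:
    """One part of a simplified, hypothetical document model."""
    name: str
    kind: str       # e.g. "text", "style", "image", "script", "embedded_object"
    payload: bytes


def disarm(parts: list[Part]) -> list[Part]:
    """Neutralize every active part without inspecting its payload."""
    result = []
    for part in parts:
        if part.kind in ACTIVE_KINDS:
            # Keep the part's name and position so the layout still resolves,
            # but discard the payload and mark the part as inert.
            result.append(replace(part, kind="inert", payload=b""))
        else:
            # Text, styles and images pass through unchanged.
            result.append(part)
    return result


if __name__ == "__main__":
    doc = [
        Part("body", "text", b"Quarterly report"),
        Part("helper", "script", b"updateTotals()"),
    ]
    for part in disarm(doc):
        print(part.name, part.kind, part.payload)
    # body text b'Quarterly report'
    # helper inert b''
```

Because `disarm` never inspects `payload`, a benign helper script and a zero-day exploit receive identical treatment, which is the usability trade-off discussed in the weaknesses section.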

File Structure Alteration Weaknesses

That said, this method can hurt the usability of a sanitized file. Embedded objects and scripts aren’t always threats, and treating them as such can render documents unusable for their recipients. There’s also a risk that the small structural changes aren’t enough to disable the exploit. To return to our trusty car analogy: if there are two adult passengers and one infant, swapping the two adults will not keep the car from being started! Similarly, if a script isn’t changed enough, or the wrong section of the document is validated, the exploit could remain active.
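The two-adults-and-an-infant problem can be reproduced with a toy interpreter. In the hypothetical sketch below, a script is a list of operations, and a `call` only fires if its target was defined earlier. Swapping two independent `set` statements (the two adults) leaves the payload active, while swapping a statement with one that depends on it (putting the infant in the driver’s seat) disarms it. Real script languages and CDR engines are far more complex; this only shows why a reordering on its own is no guarantee.

```python
def run(script):
    """Execute a toy script and return the list of payloads that fired."""
    env = {}
    effects = []
    for op, *args in script:
        if op == "set":
            name, value = args
            env[name] = value
        elif op == "call":
            (name,) = args
            if name in env:  # the call only fires if its target exists
                effects.append(env[name])
        else:
            raise ValueError(f"unknown op: {op}")
    return effects


def swap(script, i, j):
    """Return a copy of the script with statements i and j exchanged."""
    altered = list(script)
    altered[i], altered[j] = altered[j], altered[i]
    return altered


# Hypothetical script: a decoy assignment, the payload, then the trigger.
original = [
    ("set", "a", "decoy"),
    ("set", "b", "payload"),
    ("call", "b"),
]

if __name__ == "__main__":
    print(run(swap(original, 0, 1)))  # ['payload']: swapping the "adults" changes nothing
    print(run(swap(original, 1, 2)))  # []: the call now runs before its target exists
```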

Most, if not all, of these weaknesses can be addressed by multi-phase file structure alteration, in which these techniques are combined to ensure that no embedded threats sneak through.
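A multi-phase pipeline can be sketched as a sequence of functions, each applying one technique and handing its output to the next. Everything here is hypothetical: the part names, the three phases, and especially `live_threats`, a marker-based check used only to show which planted payloads survive each phase (file structure alteration itself does not rely on detecting known payloads). Commenting out script lines stands in for the script-alteration step.

```python
from functools import reduce

# Hypothetical toy document: part name -> text content.
EXPECTED_PARTS = {"body", "styles", "script", "metadata"}


def drop_unexpected_parts(doc: dict[str, str]) -> dict[str, str]:
    """Phase 1, structural validation: keep only expected part names."""
    return {name: text for name, text in doc.items() if name in EXPECTED_PARTS}


def comment_out_script(doc: dict[str, str]) -> dict[str, str]:
    """Phase 2, script alteration: prefix every line with '//' so none executes."""
    if "script" not in doc:
        return doc
    lines = doc["script"].splitlines()
    return {**doc, "script": "\n".join("// " + line for line in lines)}


def clear_metadata(doc: dict[str, str]) -> dict[str, str]:
    """Phase 3, metadata alteration: discard free-form metadata values."""
    return {**doc, "metadata": ""} if "metadata" in doc else doc


PHASES = (drop_unexpected_parts, comment_out_script, clear_metadata)


def sanitize(doc: dict[str, str], phases=PHASES) -> dict[str, str]:
    """Apply each phase in order, feeding one phase's output into the next."""
    return reduce(lambda current, phase: phase(current), phases, doc)


def live_threats(doc: dict[str, str], marker: str = "EXPLOIT") -> list[str]:
    """Toy residual check: parts whose non-commented lines contain the marker."""
    hits = []
    for name, text in doc.items():
        active = [line for line in text.splitlines() if not line.startswith("//")]
        if any(marker in line for line in active):
            hits.append(name)
    return sorted(hits)


if __name__ == "__main__":
    sample = {
        "body": "Quarterly report",
        "script": "var x = 1;\nrun('EXPLOIT');",
        "metadata": "author=EXPLOIT",
        "ole_object": "EXPLOIT blob",
    }
    print(live_threats(sample))            # ['metadata', 'ole_object', 'script']
    print(live_threats(sanitize(sample)))  # []
```

Applied one at a time, each phase leaves at least one planted payload active; chained by `sanitize`, they leave none, while the body part comes through unchanged.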

The third part of this series, which discusses the strengths and weaknesses of active content curing, is now available.

Szilard Stange
Vice President of Product Management

Szilard Stange joined OPSWAT in 2014 to lead Product Management for our next generation Metadefender product and now manages the entire Metadefender product family. Prior to joining OPSWAT, Szilard held many engineering and product management positions in the IT security industry and helped create many anti-malware products, next generation firewalls and security monitoring products at BalaBit and VirusBuster. Szilard holds a Master's degree from the University of Pannonia.
