Data Sanitization (CDR), Part 3: Active Content Curing

After discussing content disarm and reconstruction (CDR) in general in part one of this series and file structure alterations in part two, I am now ready to show you yet another technology that helps provide protection against unknown malware.

This technology is referred to as content curing. In general, it is not considered a new technology. In the late 1990s and early 2000s, when macro viruses first started to plague the security industry, many antivirus vendors built a kind of 'remove all macros from a document even if it is not infected' feature into their products as an attempt at the ultimate solution for tackling macro viruses. This clean-up effort was one of the earliest applications of the content curing method.

Active Content Curing Process

Content curing works by treating all active content as a possible attack vector and removing it without any further consideration. Active content can include macros in Microsoft Word, formulas or macros in Excel, JavaScript in Adobe Acrobat (PDF) documents, or any script or similar content located within a document. Many document formats can also store embedded files, either separately from or together with scripts. These embedded files can include fonts or even other documents, such as images or executable files.
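To illustrate what "active content" looks like inside a real container, here is a minimal Python sketch for Office Open XML files (.docx, .docm, .xlsm and so on), which are ZIP archives. It only lists parts whose names suggest macros, ActiveX controls, embedded objects, or embedded fonts. The marker list and the `find_active_parts` name are assumptions for illustration; production tools use much more complete, format-specific rules.

```python
import io
import zipfile

# Hypothetical part-name markers for active or embedded content in OOXML packages.
ACTIVE_PART_MARKERS = (
    "vbaProject.bin",  # VBA macro project, e.g. word/vbaProject.bin in a .docm
    "/activeX/",       # ActiveX controls
    "/embeddings/",    # OLE objects and other embedded files
    "/fonts/",         # embedded fonts
)

def find_active_parts(ooxml_bytes: bytes) -> list[str]:
    """Return ZIP part names that look like active or embedded content."""
    with zipfile.ZipFile(io.BytesIO(ooxml_bytes)) as zf:
        # Prefix "/" so directory markers also match top-level folders.
        return [
            name for name in zf.namelist()
            if any(marker in "/" + name for marker in ACTIVE_PART_MARKERS)
        ]
```

Note that this step only locates candidate parts; it does not analyze whether they are malicious, which matches the curing philosophy of treating all of them as suspect.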

Content curing can be performed in two different ways: one involves physically removing the unwanted content from the file while the other involves deactivating the active content and converting it into meaningless data inside the sanitized file.
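The two approaches can be sketched in Python under a few stated assumptions. `remove_parts` physically drops named parts from an OOXML (ZIP) container, while `deactivate_pdf` neutralizes some PDF action keys (`/JavaScript`, `/JS`, `/OpenAction`, `/AA`, `/Launch`) by overwriting each with a meaningless name of the same length, so the byte offsets the PDF cross-reference table relies on stay valid. Both function names and the replacement names such as `/Disarmed00` are hypothetical. The sketch ignores names inside compressed streams and `#`-escaped names, and a complete OOXML implementation would also clean up the matching relationship and content-type entries.

```python
import io
import re
import zipfile

def remove_parts(ooxml_bytes: bytes, unwanted: set[str]) -> bytes:
    """Physical removal: copy every ZIP part except the unwanted ones.
    Assumption: relationship (.rels) and [Content_Types].xml cleanup is omitted."""
    src = zipfile.ZipFile(io.BytesIO(ooxml_bytes))
    out = io.BytesIO()
    with src, zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            if item.filename not in unwanted:
                dst.writestr(item, src.read(item.filename))
    return out.getvalue()

# Hypothetical same-length replacements, so the file size and offsets do not change.
PDF_NEUTRALIZE = {
    b"/JavaScript": b"/Disarmed00",
    b"/OpenAction": b"/Disarmed01",
    b"/Launch": b"/Disarm",
    b"/JS": b"/X0",
    b"/AA": b"/X1",
}

# Match a whole PDF name: it must be followed by a delimiter, whitespace, or end of data.
_PDF_ACTIVE_NAME = re.compile(rb"/(JavaScript|OpenAction|Launch|JS|AA)(?=[\s()<>\[\]{}/%]|$)")

def deactivate_pdf(pdf_bytes: bytes) -> bytes:
    """Deactivation: rename active-content keys to meaningless names in place."""
    return _PDF_ACTIVE_NAME.sub(lambda m: PDF_NEUTRALIZE[m.group(0)], pdf_bytes)
```

Keeping every replacement the same length is the design choice that makes deactivation cheap: the rest of the file does not need to be rebuilt.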

In order to process different file formats and to effectively remove active content from files, the data sanitization solution needs to be able to interpret the internal format of the files. There is no generic way to cure active content from every type of file.
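Because curing is format-specific, a pipeline typically starts by identifying the format and handing the file to a dedicated handler. The Python sketch below shows one assumed way to do that using well-known leading "magic" bytes. `identify_format`, `cure_file`, and the handler dictionary are hypothetical names, and real identification is more involved; for example, a ZIP signature alone does not distinguish a .docx from any other ZIP archive.

```python
from typing import Callable

# Leading "magic" bytes of a few common document containers.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),  # OOXML (.docx/.xlsx/.pptx), ODF, and plain ZIP archives
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy .doc/.xls/.ppt
    (b"{\\rtf", "rtf"),
]

def identify_format(data: bytes) -> str:
    """Return a coarse format label based on the file's leading bytes."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"

def cure_file(data: bytes, curers: dict[str, Callable[[bytes], bytes]]) -> bytes:
    """Dispatch to a format-specific curing handler."""
    fmt = identify_format(data)
    handler = curers.get(fmt)
    if handler is None:
        # There is no generic fallback: refuse formats the solution cannot parse.
        raise ValueError(f"no curing handler for format: {fmt}")
    return handler(data)
```

Raising an error for unknown formats, rather than passing the file through untouched, reflects the point that no generic curing method exists.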

Active Content Curing Strengths

The advantages of this method are very similar to those of file structure alteration: neither spends any time analyzing the active content to check for potential threats. This method is usually considered quite fast because neither physically removing a specific part nor deactivating it requires much processing power. Many file types can be processed easily, while others require a deeper round of processing, including parsing the entire file.

While this method removes active content, including content that was not put into the document intentionally, users of the document will often not notice that any changes were made. Most of the time, the average user is not familiar with macros and other advanced features that are available in different types of documents. In general, this method leaves the usability of the documents intact.

Active Content Curing Weaknesses

There are two big issues with the content curing method. First, by default, this method removes all active content from the file, including useful macros and embedded objects. Content curing can also be configured to identify useful objects and leave them in the document, but it can be hard to tell which objects are safe and which are not. Second, this method will not remove zero-day exploits that target weaknesses in the application designed to load the file. These exploits usually rely on modifications to the file format that make the file invalid and force the application to behave improperly.
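One possible way to keep useful macros, shown here only as an assumption and not as the method described in this series, is to keep a macro project only when its exact SHA-256 hash appears on an approved list. The `cure_with_allowlist` name and the allowlist policy are hypothetical, and as in the earlier sketches, OOXML relationship and content-type cleanup is omitted. The sketch also shows why identifying safe objects is hard: any legitimate edit to an approved macro changes its hash and causes it to be removed.

```python
import hashlib
import io
import zipfile

def cure_with_allowlist(ooxml_bytes: bytes, approved_sha256: set[str]) -> tuple[bytes, list[str]]:
    """Remove every vbaProject.bin part whose SHA-256 is not on the approved list.
    Returns the rebuilt file and the names of the removed parts.
    Assumption: relationship and [Content_Types].xml entries are not updated."""
    removed: list[str] = []
    src = zipfile.ZipFile(io.BytesIO(ooxml_bytes))
    out = io.BytesIO()
    with src, zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename.endswith("vbaProject.bin"):
                digest = hashlib.sha256(data).hexdigest()
                if digest not in approved_sha256:
                    removed.append(item.filename)
                    continue  # drop the unapproved macro project
            dst.writestr(item, data)
    return out.getvalue(), removed
```

This addresses only the first weakness; an allowlist does nothing against exploits hidden in malformed file structures.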

The second of these weaknesses cannot be mitigated by content curing alone, so it is advisable to combine this method with other data sanitization techniques in order to remove not only harmful objects but also zero-day exploits from files.

Stay tuned for part four of our series, in which we'll take a closer look at file type conversion!
