Using Data Sanitization to Eliminate Malware Embedded in Documents

We have previously explored how Metascan can be used to eliminate threats embedded in image files. Another potential threat vector for embedded malware is document formats that allow objects to be embedded within their content. As document editors have gained features that make documents more dynamic, they have also exposed more potentially vulnerable areas for malware developers to target. Document editor vendors release patches when vulnerabilities are identified. However, many end users apply them less promptly than updates for products that seem like more 'obvious' security risks, such as operating systems and web browsers. This makes documents a dangerous attack vector, even when the embedded malware exploits vulnerabilities that were identified and fixed long ago.
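To make "embedded objects" concrete: a modern Word document (.docx) is a ZIP package of XML parts, and macros, ActiveX controls, and embedded objects are stored in specific parts of that package. The Python sketch below lists those parts using only the standard library. The prefix list is an illustrative assumption rather than a complete inventory. Legacy binary .doc files use a different container (OLE2) that this sketch does not handle.

```python
import io
import zipfile

# Part-name prefixes inside an Office Open XML (.docx/.docm) package that
# commonly hold active or embedded content. Illustrative, not exhaustive.
SUSPICIOUS_PREFIXES = (
    "word/vbaProject.bin",  # VBA macro project (macro-enabled documents)
    "word/embeddings/",     # embedded OLE objects and packaged files
    "word/activeX/",        # ActiveX control definitions
)


def find_embedded_parts(docx_bytes: bytes) -> list[str]:
    """Return the sorted names of package parts that may carry embedded content."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        return sorted(
            name
            for name in package.namelist()
            if name.startswith(SUSPICIOUS_PREFIXES)  # str.startswith accepts a tuple
        )


if __name__ == "__main__":
    # Build a tiny in-memory package with one embedded object for demonstration.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as z:
        z.writestr("word/document.xml", "<w:document/>")
        z.writestr("word/embeddings/oleObject1.bin", b"\x00")
    print(find_embedded_parts(buffer.getvalue()))  # ['word/embeddings/oleObject1.bin']
```

Detecting these parts only shows where risk can hide. Sanitization goes further by producing a new file in which such parts no longer exist.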

As we discussed in our post on malware in image files, document files can be converted to other formats that retain all the information valuable to a human reader while eliminating any embedded malware. Because the content is preserved for human consumption, conversion is a good final precaution against zero-day attacks. Even viruses that antivirus engines cannot yet identify are stripped out of the document without sacrificing its content.
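The conversion step can be sketched with any tool that re-renders a document into a flatter format. The example below is not the Metascan or MetaDefender conversion feature. It swaps in LibreOffice's headless mode (`soffice --headless --convert-to pdf`) and assumes `soffice` is installed and on the PATH. PDF output from this route generally does not carry over VBA macros. Still, how a given converter treats each kind of embedded object should be verified before relying on it.

```python
import subprocess
from pathlib import Path


def build_convert_command(source: Path, outdir: Path) -> list[str]:
    """Build a LibreOffice headless command that re-renders a document as PDF."""
    return [
        "soffice",
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(outdir),
        str(source),
    ]


def convert_to_pdf(source: Path, outdir: Path, timeout: int = 120) -> Path:
    """Convert `source` to PDF and return the path of the converted file.

    Raises subprocess.CalledProcessError if soffice exits with an error,
    subprocess.TimeoutExpired if it hangs, and FileNotFoundError if no PDF
    was written.
    """
    outdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        build_convert_command(source, outdir),
        check=True,
        timeout=timeout,
        capture_output=True,
    )
    # LibreOffice names the output after the input file's stem.
    pdf_path = outdir / (source.stem + ".pdf")
    if not pdf_path.exists():
        raise FileNotFoundError(f"conversion produced no output for {source}")
    return pdf_path
```

The existence check is deliberate. In some setups (for example, when another LibreOffice instance is already running), `soffice` can exit with status 0 without writing any output.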

We randomly selected 500 Word documents submitted through Metascan Online that had been detected as threats. We converted each of them to PDF using the file type conversion functionality available in both MetaDefender and Metascan. The conversion removed the embedded malware from 100% of the files tested. If you would like to view the scan results before and after sanitization, three samples from our test are included below:

Threat detected | Number of engines detected | After sanitization

Using data sanitization, whether through file type conversion or another method, to strip embedded objects from document files is a good final step in an organization's data security workflow. It is applied after all files identified as malware have already been blocked. This step provides an extra layer of insurance against zero-day attacks without making the converted files any less useful to the people who need them.
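The ordering described above, blocking known threats first and then sanitizing whatever remains, can be sketched as a small pipeline. `is_malicious` and `sanitize` are hypothetical callables standing in for a multi-engine scan and a converter such as DOCX to PDF. They are not Metascan or MetaDefender APIs.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical hooks: a scanner verdict and a content-preserving converter.
Scanner = Callable[[bytes], bool]
Sanitizer = Callable[[bytes], bytes]


@dataclass
class Outcome:
    name: str
    action: str              # "blocked" or "sanitized"
    output: Optional[bytes]  # sanitized bytes, or None when the file was blocked


def process(name: str, data: bytes, is_malicious: Scanner, sanitize: Sanitizer) -> Outcome:
    # Step 1: block anything the scanners already flag as malware.
    if is_malicious(data):
        return Outcome(name, "blocked", None)
    # Step 2: sanitize everything that passed, as a final precaution against
    # threats no engine can detect yet.
    return Outcome(name, "sanitized", sanitize(data))
```

Keeping the scanner and sanitizer as plain callables makes the ordering explicit and lets either stage be replaced without touching the other.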
