Posted by Curtis Cade / July 13, 2015
Most people who work in the anti-malware industry are familiar with signature-based detection: once a file is determined to be malicious, a signature is written so that anti-malware programs can detect that file or component in the future. The current threat landscape is challenging for signature-based detection, because the number of threats keeps growing and each signature variation stays effective for a shorter time.
Because of these difficulties, complements to signature-based detection, such as heuristic-based scanning, sandboxing and/or multi-scanning (scanning for threats with multiple anti-malware engines) are needed to more effectively address modern risks. In this post, we look at the pros and cons of both heuristic-based scanning, which is used alongside signature-based detection in multi-scanning solutions to increase detection rates, and sandboxing.
Introduction to Heuristic-based Scanning
As opposed to signature-based scanning, which tries to match signatures found in files against a database of known malware, heuristic scanning uses rules and/or algorithms to look for commands that may indicate malicious intent. Because of this, some heuristic scanning methods are able to detect malware without needing a signature. This is why most antivirus programs use signature and heuristic-based methods in combination, in order to catch malware that may try to evade detection.
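The difference between the two approaches can be sketched in a few lines of Python. This is a simplified illustration, not how any particular engine works: the signature is reduced to a whole-file SHA-256 hash, and the rule patterns, weights, and threshold are hypothetical values chosen for the example.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of files already known to be bad.
KNOWN_BAD_SAMPLE = b"known malicious payload (placeholder bytes)"
SIGNATURE_DB = {hashlib.sha256(KNOWN_BAD_SAMPLE).hexdigest()}

# Hypothetical heuristic rules: (byte pattern to look for, weight if found).
HEURISTIC_RULES = [
    (b"CreateRemoteThread", 40),
    (b"VirtualAllocEx", 30),
    (b"WriteProcessMemory", 30),
]
HEURISTIC_THRESHOLD = 60  # assumed cut-off for this sketch


def signature_match(data: bytes) -> bool:
    """Exact match against the database of known malware."""
    return hashlib.sha256(data).hexdigest() in SIGNATURE_DB


def heuristic_score(data: bytes) -> int:
    """Sum the weights of every rule whose pattern appears in the data."""
    return sum(weight for pattern, weight in HEURISTIC_RULES if pattern in data)


def scan(data: bytes) -> str:
    """Check the signature first, then fall back to the heuristic rules."""
    if signature_match(data):
        return "malicious (signature)"
    if heuristic_score(data) >= HEURISTIC_THRESHOLD:
        return "suspicious (heuristic)"
    return "no detection"


# A new variant has no signature yet, but still trips two heuristic rules.
new_variant = b"... VirtualAllocEx ... WriteProcessMemory ..."
print(scan(KNOWN_BAD_SAMPLE))
print(scan(new_variant))
print(scan(b"an ordinary text file"))
```

The new variant is missed by the signature check because its hash is not in the database, yet the heuristic rules still flag it. That is the reason the two methods are used in combination.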
Benefits of Heuristic Scanning
- Heuristic scanning is usually much faster than sandboxing because it does not execute the file and then wait to record its behavior, with the exception of some emulation-based techniques.
- Vendors can change the rules in their heuristic engines with their daily update packages based on new threat vectors without the details being known to malicious actors.
- Heuristic scanning does not give away details on how malware is flagged (unlike sandboxing), so malware authors will not be aware of what they need to change in order to evade detection.
- Heuristic scanning is able to detect malware that can evade sandbox detection through blind spots targeted by malware authors.
Limitations of Heuristic Scanning
- When scanning a sample, the information found is generally limited to the threat name.
- Because the engines look for specific pieces of code that indicate a malicious action, there are two possible limitations:
  - If the vendor has not built detection for a particular action, then the malware will evade detection.
  - If the malicious action is obfuscated successfully (e.g. within an encrypted file), it will evade detection.
- Some of the older methods of heuristic-based scanning have a higher propensity for reporting false positives because they are looking for a wide range of actions that could indicate a potentially malicious file. However, newer methods of heuristic scanning such as generic detection produce false positives less frequently. Generic detection works by looking for features or behaviors that are commonly seen for known threats.
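The two evasion cases listed above can be shown with a small sketch. The pattern list is hypothetical, and a single-byte XOR stands in for whatever obfuscation or encryption a real sample might use.

```python
# Hypothetical patterns a heuristic engine might look for.
SUSPICIOUS_PATTERNS = [b"WriteProcessMemory", b"CreateRemoteThread"]


def flags_pattern(data: bytes) -> bool:
    """Return True if any known suspicious pattern appears in the data."""
    return any(pattern in data for pattern in SUSPICIOUS_PATTERNS)


def xor_bytes(data: bytes, key: int) -> bytes:
    """Toy obfuscation: XOR every byte with a single-byte key."""
    return bytes(b ^ key for b in data)


plain = b"call WriteProcessMemory; call CreateRemoteThread"
obfuscated = xor_bytes(plain, 0x5A)
uncovered = b"call SomeActionWithoutARule"  # no rule exists for this action

print(flags_pattern(plain))       # True: the patterns are visible
print(flags_pattern(obfuscated))  # False: same content, but hidden from the rules
print(flags_pattern(uncovered))   # False: the vendor has no detection for it
print(xor_bytes(obfuscated, 0x5A) == plain)  # True: the payload is recoverable at run time
```

The obfuscated sample carries exactly the same instructions as the plain one, but a scanner that only inspects the bytes on disk never sees the patterns it is looking for.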
Introduction to Sandboxing
Sandboxes consist of some sort of purpose-built environment, usually virtualized (in some cases physical), where the potentially malicious files are executed and their behavior is recorded. The recorded behavior is then analyzed automatically through a weights system in the sandbox and/or manually by a malware analyst. The goal of this analysis is to determine whether the file is malicious and if it is, what exactly the file does.
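As a rough illustration of the weights system described above, the sketch below scores a list of recorded behaviors. The behavior names, weights, and threshold are all invented for this example and do not come from any specific sandbox product.

```python
from dataclasses import dataclass

# Hypothetical weights for behaviors a sandbox might record.
BEHAVIOR_WEIGHTS = {
    "modifies_autorun_registry_key": 30,
    "injects_into_other_process": 40,
    "deletes_shadow_copies": 50,
    "contacts_unknown_host": 15,
    "reads_user_documents": 5,
}
MALICIOUS_THRESHOLD = 70  # assumed cut-off for an automatic verdict


@dataclass
class SandboxReport:
    score: int
    verdict: str
    matched: list[str]


def analyze(recorded_behaviors: list[str]) -> SandboxReport:
    """Score the recorded behaviors; each distinct behavior counts once."""
    matched = sorted(set(recorded_behaviors) & set(BEHAVIOR_WEIGHTS))
    score = sum(BEHAVIOR_WEIGHTS[name] for name in matched)
    if score >= MALICIOUS_THRESHOLD:
        verdict = "malicious"
    elif score > 0:
        verdict = "needs analyst review"
    else:
        verdict = "no suspicious behavior recorded"
    return SandboxReport(score=score, verdict=verdict, matched=matched)


report = analyze(["injects_into_other_process", "deletes_shadow_copies"])
print(report.verdict, report.score, report.matched)
```

Unlike a bare threat name, the report keeps the list of matched behaviors, which is the kind of detail an analyst can use to understand what the file actually did.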
Benefits of Sandboxing
- Because sandboxing actually opens the file being analyzed, it is able to see in detail exactly what that file will do in that particular environment.
- Instead of a binary yes/no and threat name, most sandboxes offer reporting with details on the behavior recorded. In addition to providing more information on how to classify the file, this method can be particularly useful in an incident response environment in order to identify exactly what the intention of the file was, in order to understand what the effects are.
- Though it varies by product, many offer the ability to create a highly customized environment. For example, a piece of malware that is designed to only fully execute on a particular user’s machine can be replicated.
Limitations of Sandboxing
- Because the methodology of commercial sandboxes is visible and their environments are customizable, malware creators can build specific behaviors to get around detection. These fall into two key categories:
  - “Sandbox aware” malware, which is able to tell it is being executed in a sandbox and will act differently in order to not be flagged as malicious. This may be as simple as not running on any virtual machine, or something more advanced that looks for signs specific to a sandbox.
  - Malware that exploits blind spots. These vary by product, but in some cases malware creators have found ways to act maliciously that the sensors of a particular sandbox cannot detect.
- Sandboxing needs an environment in which to execute the sample and enough time to collect full reports. Particularly when trying to accommodate stalled code execution, processing a given sample takes a large amount of both time and hardware resources, which results in relatively low throughput.
- While the industry trend is towards automated sandboxes, many still only provide the raw data on behavior of the malware and it is necessary to either build a custom application to interpret the information, or have a malware analyst manually review the information.
- Due to the overhead of running them, many sandboxes are optionally or completely cloud-based, which makes them unusable for sensitive files.
As detailed above, sandboxing does have its limitations. We recommend using sandboxing in combination with other methods, like multi-scanning, to increase malware detection rates.
Both heuristic-based scanning and sandboxing present unique strengths and weaknesses, and for different situations one scanning method may be more appropriate than the other. The best security comes from using both methods simultaneously in order to minimize the number of samples that may be able to evade detection. Multi-scanning (scanning with multiple anti-malware engines) takes advantage of the differing heuristic algorithms of many scan engines.
As proof of the benefits of using many scan engines for a layered approach, we took a look at our cloud-based multi-scanning solution, Metadefender Cloud, which is powered by anti-malware engines that use both heuristic and signature-based methods to detect threats.
Top searched for threats on Metadefender Cloud's statistics page
By looking at the statistics page results above, you can see that the percentage of threats detected increases as more anti-malware engines are added. Metadefender Core 4, which includes 4 anti-malware engines, detected 89.41% of the top 10,000 threats compared to Metadefender Core 20 plus 13 custom engines, which detected 99.88% of the same threats. The statistics page highlights the value of multi-scanning; every engine has different strengths and weaknesses, so the more engines you have, the greater the chance of detecting threats.
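The reason detection can only improve as engines are added can be sketched as a set union. The engine names and the samples each engine flags below are made up for illustration; they are not Metadefender data.

```python
# Hypothetical results: the sample IDs (out of ten threats) that each engine flags.
ENGINE_DETECTIONS = {
    "engine_a": {1, 2, 3, 4, 5, 6},
    "engine_b": {2, 3, 5, 7, 8},
    "engine_c": {1, 4, 8, 9},
    "engine_d": {3, 6, 9},
}
ALL_THREATS = set(range(1, 11))


def detection_rate(engine_names):
    """Fraction of threats flagged by at least one of the given engines."""
    detected = set()
    for name in engine_names:
        detected |= ENGINE_DETECTIONS[name]
    return len(detected & ALL_THREATS) / len(ALL_THREATS)


names = list(ENGINE_DETECTIONS)
for count in range(1, len(names) + 1):
    print(count, "engine(s):", detection_rate(names[:count]))
```

In this toy data set the rate goes from 0.6 to 0.8 to 0.9 and then stays at 0.9: an extra engine never lowers the combined rate, but it only raises it when it covers something the others miss, and one sample is still missed by all four.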
If you are interested in learning more about different scanning methods, you can check out our blog post on how multi-scanning compares to online sandboxes.