Posted by Tony Berning / March 13, 2014
We have talked many times on this blog about the benefits of multi-scanning and how using multiple anti-malware engines can increase detection rates of viruses, especially new virus outbreaks that are just starting to spread around the world. For an overview of the general benefits of multi-scanning, please review the Metadefender Core product page.
The greatest benefit of using multiple antivirus engines for malware detection comes when the correlation between the engines' detections is low. Engines whose detections are highly correlated will detect very similar sets of malware. The most extreme case is two engines with a correlation coefficient of 1: both detect exactly the same set of malware, and neither detects anything that the other misses. The opposite case is two engines with a correlation coefficient of 0, where knowing whether one engine detects a piece of malware has zero predictive power for whether the other engine detects it.
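As a rough illustration of what this coefficient measures, the sketch below computes the Pearson correlation between two engines' per-sample results, recorded as 1 for "detected" and 0 for "missed". The function name and the sample data are made up for this example; they are not taken from Metadefender or from any real engine.

```python
from math import sqrt


def detection_correlation(a, b):
    """Pearson correlation between two engines' results (1 = detected, 0 = missed)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty result lists of equal length")
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((y - mean_b) ** 2 for y in b) / n
    if var_a == 0 or var_b == 0:
        # An engine that detects every sample (or none) has no variation to correlate.
        raise ValueError("undefined when an engine detects all samples or none")
    return cov / sqrt(var_a * var_b)


# Hypothetical results for four malware samples.
engine_x = [1, 1, 0, 0]
engine_y = [1, 1, 0, 0]  # detects exactly what engine_x detects
engine_z = [1, 0, 1, 0]  # agrees with engine_x on only half of the samples

print(detection_correlation(engine_x, engine_y))  # 1.0
print(detection_correlation(engine_x, engine_z))  # 0.0
```

In practice the result lists would come from scanning a large, shared sample set with each engine; four samples are used here only to keep the arithmetic easy to follow.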
An example illustration of how engines can be strongly and weakly correlated is below.
In this illustration, you can see that engines A and B are weakly correlated while engines B and C are strongly correlated. This means that the combination of engines A and B will have a higher detection rate for malware than the combination of engines B and C.
Each engine vendor has its own methods of determining whether a sample file is a potential threat. Vendors make this determination by gathering samples and examining existing threats so that they can better predict whether a new sample is a threat or not. Most vendors have both automated ways of analyzing files as well as teams of analysts who perform a more manual analysis.
A vendor’s geographic location has an impact on the samples that they are able to collect and analyze in their research. Analysts working in labs may have more access to samples from their immediate geographic area, and companies may have stronger relationships with sample providers who are physically closer to them. Analysts may also have personal relationships with individuals in the physical world who help them to better identify potential threats or who may alert them to new malware. Finally, malware that targets companies or systems in a certain geographic area, or that operates in a particular language, may be first identified by labs in that same area, where the analysts speak the language.
New outbreaks of viruses, especially new types of viruses, are often first identified by an individual anti-malware engine before the wider community is able to detect them. Stuxnet, for example, was first detected by a small Belarusian antivirus provider, VirusBlokAda, before it was detected by any of the major antivirus engines. It is during new outbreaks that the correlation between engines' detections is lowest, and where you get the most benefit from having weakly correlated engines.
In a multi-scanning solution, having engines from a diverse set of geographic areas helps to keep the correlation between the engines' detections as low as possible, which improves overall malware detection rates. The more diverse the selection of engines, the higher the probability that threats, especially new outbreaks, will be detected by at least one of the engines.
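The effect of correlation on the combined detection rate can be sketched with a small made-up data set. In the example below, each of three hypothetical engines detects half of eight samples on its own; the engine names and results are illustrative assumptions, not measurements of real products.

```python
def combined_detection_rate(*engines):
    """Fraction of samples detected by at least one of the given engines."""
    if not engines or not engines[0]:
        raise ValueError("need at least one non-empty result list")
    if any(len(e) != len(engines[0]) for e in engines):
        raise ValueError("all result lists must cover the same samples")
    hits = sum(1 for results in zip(*engines) if any(results))
    return hits / len(engines[0])


# Hypothetical results for eight samples (1 = detected, 0 = missed).
# Each engine detects 4 of 8 samples (50%) on its own.
engine_1 = [1, 1, 1, 1, 0, 0, 0, 0]
engine_2 = [1, 1, 0, 0, 1, 1, 0, 0]  # uncorrelated with engine_1 (coefficient 0)
engine_3 = [1, 1, 0, 0, 1, 0, 0, 1]  # overlaps heavily with engine_2 (coefficient 0.5)

print(combined_detection_rate(engine_1, engine_2))  # 0.75
print(combined_detection_rate(engine_2, engine_3))  # 0.625
```

Although all three engines have the same individual detection rate, the weakly correlated pair covers more of the sample set than the strongly correlated pair, because its misses overlap less.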
Below is a map of the headquarters for all of the engines in Metadefender Cloud as of March 2014. Many vendors also have additional research labs in multiple locations, which are not indicated on this map.