Using naming conventions to track the detection of viruses can be difficult because vendors often report the same virus with completely different names, even if they otherwise agree on the format. Although this is not always true, it applies to most of the cases we have experienced at OPSWAT. The question is, how similar are the names of viruses used by different vendors and are there any trends in naming patterns used?
As a first step in answering this question, we did a search online to try to find naming convention rules for anti-malware vendors with the largest market share. Surprisingly, we were only able to locate naming convention information from four vendor websites:
For research purposes, we ran an experiment in which we scanned 30 well-known malware samples with our multi-engine anti-malware scanner, Metascan Online, and analyzed the scan results. The table below summarizes the detection details from our research:
Virus Name | # of Engines that Detected the Threat | Naming pattern across different engines (shared by # vendors) |
Win32.Madang.C | 12 | Win32*Madang* (10) |
JS/Exploit-Blacole.jf | 22 | Trojan*Script* (4), JS*Blacole (5) |
Trojan/Win32.SGeneric | 27 | *Domal* (18) |
Trojan.ADRD | 24 | *Andr*Geinimi* (3), *Android*Adrd* (13) |
Trojan.DR.Diple.Gen.4 | 34 | *Gen*Variant*Sireferf* (4), *Win32*Vobfus* (10), *W32*VBinject*Gen* (3) |
Trojan-PWS.Win32.Kykymber | 33 | *PSW*Kykymber* (11), *PWS*Onlinegames* (11) |
Win32/Expiro4.Gen | 30 | *Win32*Expiro* (26) |
Script/Exploit.Kit | 20 | *JS*Blacole* (7), *Troj*Iframe* (5) |
Skodna.Bundle.BD | 15 | *Inst*Core* (11) |
ADWARE/InstallRex.Q | 25 | *Gen*Variant*Inst* (4), *Adware*Inst* (4) |
Trojan/Win32.Bladabindi | 10 | *Gen*Variant*Barys* (4) |
Trojan/Win32.Agent | 34 | *Gen*Variant*Symmi* (4), *Worm*Gamarue* (9) |
Virus/Win32.Nimnul.a | 36 | *Ramnit* (23), *Nimnul* (6) |
Riskware/MyWebSearch | 10 | *MyWebSearch* (8) |
Riskware.Win32.FunWeb.dbxkle | 34 | *FunWeb* (3), *MyWebSearch* (3) |
PUP.Win32.MindSpark.F | 36 | *MyWebSearch* (7) |
Application.Agent.HN | 10 | *Application*Agent*HN* (3), *ClientConnect* (3) |
Troj/Keygen-DX | 7 | *HackTool* (3), *Keygen* (7) |
Adware.SearchProtect.2 | 9 | *SearchProtect* (3) |
Adware.SearchProtect.ky.tmre | 14 | *SearchProtect* (8) |
Adware.WProtManager.Win32.21 | 18 | *WProtManager* (5), *Gen*Variant*Graftor* (4), *Elex* (6) |
Adware.Win32.Agent | 5 | *Adware*Generic* (4), *Trojan*Click* (3), *Adware*Agent* (7) |
ADWARE/Adware.Gen | 12 | *Adware*Generic* (5), *Elex* (4) |
MyPCBackup.E.foha | 22 | *MyPCBackup* (7) |
virus_nameAdWare.Agent | 21 | *Elex* (4), *Trojan*Click* (3), *Adware*Agent* (8) |
Win.Adware.SupTab | 16 | *Adware*SupTab* (4), *Mutabaha* (3) |
Adware.MAC.OSX.Genieo.BU | 10 | *Adware*OSX* (6) |
Hacktool.IdleKMS.C.gfky | 11 | *Hacktool* (3), *KMS* (5) |
Adware/Agent.lmx | 16 | *Adware*Agent* (6), *Elex* (5) |
Adware.Suptab.A | 22 | *Adware*Agent* (8), *Adware*Suptab* (5) |
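The patterns in the third column use "*" as a wildcard for any run of characters. As a rough illustration of how such a pattern can be turned into a regular expression filter and used to group engine results, here is a minimal Python sketch. The vendor labels and all detection names other than "Virus/Win32.Nimnul.a" are made up for the example, the helper names are hypothetical, and this is not necessarily how the analysis above was actually carried out.

```python
import re
from collections import defaultdict

def wildcard_to_regex(pattern):
    """Turn a table-style pattern such as '*Ramnit*' into a compiled regex.

    Each '*' matches any run of characters; everything else is literal.
    Matching is case-insensitive (an assumption made for this sketch).
    """
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(literal_parts), re.IGNORECASE)

def group_by_pattern(detections, patterns):
    """Map each pattern to the vendors whose reported name matches it."""
    compiled = {p: wildcard_to_regex(p) for p in patterns}
    groups = defaultdict(list)
    for vendor, name in detections.items():
        for pattern, regex in compiled.items():
            if regex.fullmatch(name):
                groups[pattern].append(vendor)
    return dict(groups)

# Hypothetical scan result for one sample: vendor label -> reported name.
detections = {
    "vendor_a": "Virus/Win32.Nimnul.a",
    "vendor_b": "Win32.Ramnit.Gen",
    "vendor_c": "W32/Ramnit.a",
    "vendor_d": "Trojan.Generic.12345",
}

groups = group_by_pattern(detections, ["*Ramnit*", "*Nimnul*"])
for pattern, vendors in groups.items():
    print(pattern, len(vendors), vendors)
```

Names that match none of the patterns (vendor_d here) simply fall out of every group, which is why the counts in the third column do not add up to the number of engines in the second.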
Conclusions
From the data above, we can infer that naming conventions lack consistency across anti-malware vendors - there isn't even consistency in the inconsistency! However, if we build some regular expression filters and group the results by pattern, we can still find a few nuggets of useful information.
- BitDefender, Emsisoft, F-Secure and Lavasoft are all comparable in detection rates and virus naming conventions.
- Some malware files were detected by Sophos but not by Preventon. However, when both vendors report a threat, the name they report is usually the same.
- As with Sophos and Preventon, some malware files were detected by CYREN but not by F-Prot. However, when both vendors report a threat, the name they report is usually the same.
- Microsoft consistently follows the naming convention "Type:Platform/MalwareFamily.Variant". For example, "Virus:Win32/Madang.A!dam", "TrojanSpy:AndroidOS/Adrd.A", "Worm:Win32/Vobfus.gen!O" and so on.
- Most anti-malware vendors report a virus's behavior and OS consistently. That said, nearly every virus has 3 to 5 different values for its family, name and variant across anti-malware vendors.
- Trojans and worms are two of the most confusing categories across vendors. Some vendors classify a Trojan as a worm, while others do the opposite and name a worm as a Trojan. This makes it particularly difficult to look up vendor detections for these types of malware. CARO officially states that 'worm' is not a malware type, but many vendors still use it.
- The virus naming convention used by K7 Computing is different from that of any other vendor. K7 Computing uses the following format: "Behavior (9 digit unique id)".
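When a vendor sticks to a fixed format, a post-detection program can split the name into fields. The sketch below does this for the "Type:Platform/MalwareFamily.Variant" layout described for Microsoft above, using the three example names from that bullet. The function and field names are hypothetical, and treating everything after the first dot as the variant (including any "!" suffix) is a simplifying assumption, not an official grammar.

```python
import re
from typing import NamedTuple, Optional

class ParsedName(NamedTuple):
    malware_type: str
    platform: str
    family: str
    variant: str

# Type:Platform/Family.Variant - the variant keeps any trailing suffix as-is.
_TYPE_PLATFORM_FAMILY = re.compile(
    r"(?P<malware_type>[^:/]+):(?P<platform>[^/]+)/"
    r"(?P<family>[^.]+)\.(?P<variant>.+)"
)

def parse_type_platform_family(name: str) -> Optional[ParsedName]:
    """Return the parsed fields, or None if the name does not fit the layout."""
    match = _TYPE_PLATFORM_FAMILY.fullmatch(name)
    return ParsedName(**match.groupdict()) if match else None

for reported in (
    "Virus:Win32/Madang.A!dam",
    "TrojanSpy:AndroidOS/Adrd.A",
    "Worm:Win32/Vobfus.gen!O",
    "Trojan/Win32.Agent",  # a different layout from the table: not parsed
):
    print(reported, "->", parse_type_platform_family(reported))
```

The last name shows the limitation discussed in this post: a parser written for one vendor's format returns nothing useful for another vendor's name for the same kind of threat.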
There are certainly more clues that we could have pulled from the raw data above, but they don't offer a reliable way for users or post-detection programs to parse a detection name and decide on their next action. Sadly, no industry-wide malware naming convention has gained widespread adoption, though several have been attempted. CARO (Computer Antivirus Research Organization), perhaps the best-known organization working on virus naming conventions, has been pushing for a naming standard since the 1990s. Unfortunately, it did not get very far in convincing anti-malware vendors; there are simply too many practical obstacles to maintaining consistency once a convention is adopted. Others have tried after CARO and also failed. One example is CME (Common Malware Enumeration). Hoping to build on the success of CVE (Common Vulnerabilities and Exposures), its backers pushed for a common set of malware identifiers, but the effort stalled as the nature of malware changed. The CME website still provides details on the venture, nearly 10 years after the project ceased:
In late 2006 the malware threat changed away from the pandemic, widespread threats CME was developed to address to more localized, targeted threats, which significantly reduced the need for common malware identifiers to mitigate user confusion in the general public.
Therefore, all CME-related efforts transitioned into support to MITRE's Malware Attribute Enumeration and Characterization (MAEC™) effort.
Interestingly enough, the new effort, MAEC (Malware Attribute Enumeration and Characterization), focuses more on attributes and less on actual malware specimens. They describe the new project in the following way:
International in scope and free for public use, MAEC is a standardized language for encoding and communicating high-fidelity information about malware based upon attributes such as behaviors, artifacts, and attack patterns.
By eliminating the ambiguity and inaccuracy that currently exists in malware descriptions and by reducing reliance on signatures, MAEC aims to improve human-to-human, human-to-tool, tool-to-tool, and tool-to-human communication about malware; reduce potential duplication of malware analysis efforts by researchers; and allow for the faster development of countermeasures by enabling the ability to leverage responses to previously observed malware instances.
This new effort is commendable and gets to the heart of the real issue — how can organizations share information about threats? It's a hot topic in the news that no one has an easy answer to. The US Federal government has created a new entity specifically for this purpose and the CIA is reorganizing to make sharing info on cyber threats easier. Additionally, many vendors in the security industry are creating platforms for sharing collective intelligence on threats. Of course, there is also a language barrier to consider when naming threats; it would be much easier if vendors could agree on a common language to use.
The increasing popularity of threat exchanges may hasten the widespread adoption of a single threat description language. As of now, there are several competing standards in use (thankfully the formats are often interchangeable) though issues such as copyright of the language and patentability of inventions have given some in the industry reason for concern.
Different geography, languages, and focus amongst anti-malware vendors are all challenges that have hampered efforts for a unified malware naming convention standard. That being said, one could argue that they are not blockers. Take another industry, encryption, for example. How was it that RSA encryption became such a successful standard? How did RSA become the standard over others? Are there lessons to be learned from the RSA example and can they be used to create a general malware database naming convention across most of the anti-malware vendors and research organizations? We don't have the answer to this question yet, but this is definitely something that security industry professionals should think about and move towards.
Or maybe a philosophical change is in order. Take the quote from MITRE about MAEC: "MAEC is a standardized language for … malware based upon attributes such as behaviors, artifacts, and attack patterns. By eliminating the ambiguity and inaccuracy that currently exists in malware descriptions and by reducing reliance on signatures…" MITRE isn't trying to correct the problems CARO experienced in implementation; rather, it is proposing an entirely different way of communicating information about malware instead of trying to name it.
English regionalisms are a nice analog for this challenge and its possible solutions. For example, on the West Coast we have sprinkles, while on the East Coast there are jimmies. These are identical products, but someone without this knowledge would have no reasonable chance of associating the two unless they saw them side by side and found no differences. Even if this observer concluded that sprinkles are jimmies, what would they now call them? Presumably they would need to remember both names and use the appropriate one depending on their audience. Alternatively, they could assign a unique number or a new name and get everyone to agree on its use. Or, taking MAEC as an example, they could replace the name with a description. So instead of a sprinkle or a jimmie, they would be "small particles of chocolate, candy, sugar, etc., used as a decorative topping for cookies, cakes, ice cream cones, and the like." [1] The drawback is obvious: a description this long is impractical for human communication, though not necessarily for computers. The 'cost' of being more verbose is more easily absorbed by a computer than by a human, but eventually a human is still involved and may just want to call it a sprinkle.
The technology to handle this problem surely exists. Search engine algorithms are more than capable of creating the associations needed to standardize malware references, but without a monetary incentive, it is unlikely that anyone will invest the time and money to build a solution this way. In addition, the different technologies used by scanning engines, the differences in language and culture behind the naming definitions, and the various marketing strategies of anti-malware vendors make unifying malware names nearly impossible. VGrep is probably the closest attempt; while not perfect, it is extremely useful and shows what is possible. However, at the end of the day we may need to accept that, as with coke, pop and soda, people will simply call the same thing by different names.
References
- Naming convention resource from BitDefender
- Naming convention resource from ESET
- Naming convention resource from Lenny Zeltser
Credit for the content in this post is also attributed to Adam Winn, Senior Product Manager at OPSWAT.