Introduction
Related publications
- Evaluation of Malware Phylogeny Modelling Systems Using Automated Variant Generation, Journal in Computer Virology, (to appear), 2008.
- Simulating Malware Evolution for Evaluating Program Phylogenies, 2008.
- VILO: A Shield in the Malware Variation Battle, Virus Bulletin, pp. 5-10, 2007.
- Phylogenetic Comparisons of Malware, Virus Bulletin 2007, 2007.
- Exploiting Similarity Between Variants to Defeat Malware, Proceedings of BlackHat Briefings DC 2007, 2007.
- Malware Phylogeny Generation Using Permutations of Code, Journal in Computer Virology, 1(1), pp. 13-23, 2005.
- Malware Phylogeny Using Maximal Pi-Patterns, EICAR 2005 Conference: Best Papers Proceedings, pp. 167-174, 2005.
The Vilo project seeks to develop new theories and technologies for
countering threats relating to malware: worms, trojans,
viruses, rootkits, etc. The group is exploring
new ways of integrating multiple research streams to solve problems
in this area, including:
- static program analysis
- search and information retrieval / pattern matching
- visualization
- collaboration interfaces
We are seeking to integrate some techniques from other concurrent
projects, including the Metamorphic Undo project.
The main underlying theme of research within Vilo is the discovery
and leveraging of reuse within malware. Most malware programs one
may find are derived from existing code: from previous versions or
published exploits. While code reuse is an advantage to black hats,
the Vilo project seeks to turn their advantage into a weakness that
can be exploited in defense. It seeks to make fundamental advances
in matching code in executables, and to develop new distributed
collaboration techniques so that the knowledge of past malware analyses
can be leveraged as new variations are found.
This page provides brief overviews of the work in this project.
More details can be found in current and upcoming publications.
Search Portal
A key research focus in Vilo is building a useful collection of
similarity scoring functions for malicious executables. With such
a similarity function one can do various things, including:
- Recognizing new variants of existing malware. Applications
include malware filters and scanners, and search engines
for executables.
- Constructing malware phylogenies: graphs of relationships
analogous to the "tree of life" in biology.
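As a rough illustration of what a similarity scoring function for executables might look like, the sketch below compares two samples by the Jaccard overlap of their opcode n-grams. This is an assumption chosen for illustration, not the scoring function used in Vilo; the function names, the choice of n = 3, and the toy opcode sequences are all hypothetical, and a real system would first need a disassembler to extract the opcodes.

```python
from typing import List, Set, Tuple


def ngrams(ops: List[str], n: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of contiguous n-grams in an opcode sequence."""
    return {tuple(ops[i:i + n]) for i in range(len(ops) - n + 1)}


def jaccard_similarity(a: List[str], b: List[str], n: int = 3) -> float:
    """Score two opcode sequences in [0, 1] by n-gram set overlap."""
    fa, fb = ngrams(a, n), ngrams(b, n)
    if not fa and not fb:
        return 1.0  # two empty feature sets are treated as identical
    return len(fa & fb) / len(fa | fb)


# Hypothetical opcode sequences standing in for disassembled samples.
original = ["push", "mov", "call", "add", "ret"]
variant = ["push", "mov", "call", "sub", "ret"]
unrelated = ["xor", "jmp", "nop", "inc", "dec"]

print(jaccard_similarity(original, variant))
print(jaccard_similarity(original, unrelated))
```

A score like this can serve both uses listed above: ranking stored samples against a query (search and variant recognition) and supplying pairwise distances for phylogeny construction.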
In order to properly evaluate the research we are putting a prototype
tool into the hands of anti-malware researchers. We are trying to do
this via an online portal that provides malware search capabilities.
Currently the portal is available to select anti-malware analysts.
However, we will be ramping up the portal and soliciting feedback from
a wider audience. If you are interested in participating, please
get in touch with us.
Malware Phylogenies
Malware evolves much as any other type of software evolves. New
"features" are added, fixes are made, etc. Perhaps a distinguishing
quality of malware evolution is that many of the "release" versions
are only different by a small amount, typically to throw off the
scanners and filters that look for characteristics of the previous
versions. A phylogenic graph of the possible derivations and
relationships between samples can be helpful in understanding the
evolution. The figure below illustrates part of a phylogeny (from this paper).
In the malware phylogeny project we are investigating methods for
creating useful phylogenic graphs. A key problem is accounting for
the various transformations made that obfuscate the provenance of the
code. These obfuscations need to be accounted for in the comparison
process. For example, new samples may be released in which the
order of the code is permuted through function motion, code
block reordering, statement reordering, and so on. One approach
we have explored (used to generate the figure above) is
feature matching that tolerates such permutations of
ordering.
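One way to make feature matching tolerant of reordering is to treat each window of n opcodes as an unordered collection, so that permuted code still yields the same feature. The sketch below contrasts ordinary n-grams with such order-insensitive windows. It is a minimal illustration under stated assumptions: the function names, the window size, and the toy opcode sequences are hypothetical, and the published method may differ in its details.

```python
from typing import List, Set, Tuple


def ngram_features(ops: List[str], n: int = 3) -> Set[Tuple[str, ...]]:
    """Order-sensitive features: each window of n opcodes, kept as is."""
    return {tuple(ops[i:i + n]) for i in range(len(ops) - n + 1)}


def nperm_features(ops: List[str], n: int = 3) -> Set[Tuple[str, ...]]:
    """Order-insensitive features: each window is sorted, so any
    permutation of the same n opcodes maps to the same feature."""
    return {tuple(sorted(ops[i:i + n])) for i in range(len(ops) - n + 1)}


def overlap(fa: Set[Tuple[str, ...]], fb: Set[Tuple[str, ...]]) -> float:
    """Jaccard overlap of two feature sets."""
    if not fa and not fb:
        return 1.0
    return len(fa & fb) / len(fa | fb)


# Hypothetical sample and a variant with its first two statements swapped.
original = ["push", "mov", "call", "add", "ret"]
reordered = ["mov", "push", "call", "add", "ret"]

ngram_score = overlap(ngram_features(original), ngram_features(reordered))
nperm_score = overlap(nperm_features(original), nperm_features(reordered))
print(ngram_score, nperm_score)
```

On this toy pair the order-insensitive features score the reordered variant as more similar to the original than plain n-grams do, which is the behaviour wanted when statement reordering is used to disguise provenance.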
We are also investigating the problem of evaluating malware phylogeny
generation systems. Without a solid evaluation method it is impossible
to know how to improve the state of the art. Matt Hayes's research
project involves constructing two different models for generating
artificial evolution histories, which can then be used for systematic
testing and comparison of phylogeny models via objective tree distance
metrics. We are also trying out phylogeny model generators on tough
test cases (see our VB presentation).
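To make the idea of an objective tree distance concrete, the sketch below computes a Robinson-Foulds-style distance between a known (simulated) evolution tree and an inferred one: it counts the groups of samples that appear under some internal node in one tree but not in the other. This is one common metric picked for illustration; the text does not say which metrics the project uses. The tree encoding (a dict from node to children, with samples as leaves and internal nodes as hypothetical ancestors) and all names are assumptions.

```python
from typing import Dict, FrozenSet, List, Set

Tree = Dict[str, List[str]]  # maps each internal node to its children


def clusters(tree: Tree, root: str) -> Set[FrozenSet[str]]:
    """Collect, for every internal node, the set of leaves beneath it."""
    found: Set[FrozenSet[str]] = set()

    def leaves_under(node: str) -> FrozenSet[str]:
        children = tree.get(node, [])
        if not children:
            return frozenset([node])  # a leaf, i.e. a malware sample
        leaves = frozenset().union(*(leaves_under(c) for c in children))
        found.add(leaves)
        return leaves

    leaves_under(root)
    return found


def rf_distance(t1: Tree, root1: str, t2: Tree, root2: str) -> int:
    """Number of leaf groupings present in exactly one of the two trees."""
    return len(clusters(t1, root1) ^ clusters(t2, root2))


# Hypothetical ground truth from a simulated history, and a model's guess.
true_tree = {"root": ["ab", "cd"], "ab": ["A", "B"], "cd": ["C", "D"]}
inferred_tree = {"root": ["abc", "D"], "abc": ["ab", "C"], "ab": ["A", "B"]}

print(rf_distance(true_tree, "root", inferred_tree, "root"))
```

A distance of zero means the inferred tree groups the samples exactly as the simulated history does, and larger values indicate more disagreement, which gives a simple number for comparing phylogeny generators on the same artificial history.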