VILO: Malware Search and Analysis Capabilities

Introduction

Related publications

  1. Evaluation of Malware Phylogeny Modelling Systems Using Automated Variant Generation,
    Journal in Computer Virology, (to appear), 2008.
  2. Simulating Malware Evolution for Evaluating Program Phylogenies,
    2008.
  3. VILO: A Shield in the Malware Variation Battle,
    Virus Bulletin, pp.5-10, 2007.
  4. Phylogenetic Comparisons of Malware,
    Virus Bulletin 2007, 2007.
  5. Exploiting Similarity Between Variants to Defeat Malware,
    Proceedings of BlackHat Briefings DC 2007, 2007.
  6. Malware Phylogeny Generation Using Permutations of Code,
    Journal in Computer Virology, 1 (1) , pp.13-23, 2005.
  7. Malware Phylogeny Using Maximal Pi-Patterns,
    EICAR 2005 Conference: Best Papers Proceedings, pp.167-174, 2005.

The Vilo project seeks to develop new theories and technologies for countering threats relating to malware: worms, trojans, viruses, rootkits, etc. The group is exploring new ways of integrating multiple research streams to solve problems in this area, including:

  • static program analysis
  • search and information retrieval / pattern matching
  • visualization
  • collaboration interfaces

We are seeking to integrate some techniques from other concurrent projects, including the Metamorphic Undo project.

The main underlying theme of research within Vilo is the discovery and leveraging of reuse within malware. Most malware programs are derived from existing code: from previous versions or published exploits. While code reuse is an advantage to black hats, the Vilo project seeks to turn that advantage into a weakness that can be exploited in defense. It seeks to make fundamental advances in matching code in executables, and to develop new distributed collaboration techniques so that the knowledge gained from past malware analyses can be leveraged as new variations are found.

This page provides brief overviews of the work in this project. More details can be found in current and upcoming publications.

Search Portal

A key research focus in Vilo is building a useful collection of similarity scoring functions for malicious executables. Such a similarity function supports several applications, including:

  • Recognizing new variants of existing malware. Applications include: malware filters and scanners, and search engines for executables.
  • Malware phylogenies: graphs of relationships analogous to the "tree of life" in biology.
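
To make the idea of a similarity scoring function concrete, here is a minimal sketch. It is not the scoring function used by Vilo: it assumes each executable has already been disassembled into a list of opcode mnemonics, and it uses Jaccard similarity over opcode n-grams as a stand-in. The names `ngrams`, `similarity`, and `rank_matches` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def ngrams(ops: List[str], n: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of contiguous n-grams in an opcode sequence."""
    return {tuple(ops[i:i + n]) for i in range(len(ops) - n + 1)}


def jaccard(a: Set, b: Set) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(ops_a: List[str], ops_b: List[str], n: int = 3) -> float:
    """Score two opcode sequences in [0, 1]; 1.0 means identical n-gram sets."""
    return jaccard(ngrams(ops_a, n), ngrams(ops_b, n))


def rank_matches(query: List[str], corpus: Dict[str, List[str]], n: int = 3):
    """Rank known samples by similarity to the query, best match first."""
    scored = [(name, similarity(query, ops, n)) for name, ops in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: a known sample and a slightly modified variant.
known = ["push", "mov", "call", "add", "ret"]
variant = ["push", "mov", "call", "sub", "ret"]
unrelated = ["nop", "nop", "jmp", "nop", "nop"]
ranking = rank_matches(variant, {"known": known, "unrelated": unrelated}, n=2)
```

With a function of this shape, variant recognition amounts to thresholding the top-ranked score, and a phylogeny can be built from the matrix of pairwise scores.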


To evaluate the research properly, we are putting a prototype tool into the hands of anti-malware researchers through an online portal that provides malware search capabilities. Currently the portal is available to select anti-malware analysts. However, we will be ramping it up and soliciting feedback from a wider audience. If you are interested in participating, please get in touch with us.

Malware Phylogenies

Malware evolves much as any other type of software evolves. New "features" are added, fixes are made, etc. Perhaps a distinguishing quality of malware evolution is that many of the "release" versions differ only by a small amount, typically to throw off the scanners and filters that look for characteristics of the previous versions. A phylogenic graph of the possible derivations and relationships between samples can help in understanding this evolution. The figure below illustrates part of a phylogeny (from this paper).

Part of a malware phylogeny ("family tree")

In the malware phylogeny project we are investigating methods for creating useful phylogenic graphs. A key problem is that the comparison process must account for the various transformations that obfuscate the provenance of the code. For example, new samples may be released in which the order of the code is permuted through function motion, code block reordering, statement reordering, and so on. One approach we have explored (used to generate the figure above) is feature matching that tolerates such permutations of ordering.
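
One way to picture permutation-tolerant feature matching is to make each feature itself order-insensitive. The sketch below is an illustrative assumption and not necessarily the method used to generate the figure: it sorts the opcodes inside each sliding window, so that two windows containing the same opcodes in a different order produce the same feature. The function names are hypothetical.

```python
from collections import Counter
from typing import List


def ordered_ngrams(ops: List[str], n: int = 3) -> Counter:
    """Multiset of ordinary (order-sensitive) n-grams."""
    return Counter(tuple(ops[i:i + n]) for i in range(len(ops) - n + 1))


def permuted_ngrams(ops: List[str], n: int = 3) -> Counter:
    """Multiset of order-insensitive windows: each n-gram is sorted first."""
    return Counter(tuple(sorted(ops[i:i + n])) for i in range(len(ops) - n + 1))


def overlap(a: Counter, b: Counter) -> float:
    """Multiset Jaccard: sum of minimum counts over sum of maximum counts."""
    union = sum((a | b).values())
    if union == 0:
        return 1.0
    return sum((a & b).values()) / union


# Hypothetical toy sequences: the second swaps adjacent instructions.
original = ["mov", "add", "xor", "push", "call"]
reordered = ["add", "mov", "xor", "call", "push"]

strict = overlap(ordered_ngrams(original, 2), ordered_ngrams(reordered, 2))
tolerant = overlap(permuted_ngrams(original, 2), permuted_ngrams(reordered, 2))
```

On this toy input the order-sensitive features share nothing, while the sorted-window features still match where the same instructions were merely swapped. Larger windows tolerate reordering over longer distances at the cost of less specific features.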

We are also investigating the problem of evaluating malware phylogeny generation systems. Without a solid evaluation method it is impossible to know how to improve the state of the art. Matt Hayes's research project involves constructing two different models for generating artificial evolution histories, which can then be used for systematic testing and comparison of phylogeny models via objective tree distance metrics. We are also trying out phylogeny model generators on tough test cases (see our VB presentation).
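
As an illustration of what an objective tree distance metric can look like, the sketch below computes a Robinson-Foulds style distance between a known (for example, artificially generated) history and an inferred tree. The text above does not say which metrics the project uses, so this choice is an assumption. The sketch also simplifies by treating every sample as a leaf of a rooted tree written as nested tuples; `clusters` and `rf_distance` are hypothetical names.

```python
from typing import FrozenSet, Set, Tuple, Union

# A tree is either a leaf label or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the set of leaf labels found under every internal node."""
    found: Set[FrozenSet[str]] = set()

    def leaves(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        under = frozenset().union(*(leaves(child) for child in node))
        found.add(under)
        return under

    leaves(tree)
    return found


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Count the clusters that appear in exactly one of the two trees."""
    return len(clusters(t1) ^ clusters(t2))


# Hypothetical example: a reference history and two inferred trees.
reference = (("A", "B"), ("C", "D"))
close_guess = (("A", "B"), "C", "D")
poor_guess = (("A", "C"), ("B", "D"))
```

A distance of zero means the inferred tree groups the samples exactly as the reference does; larger values mean more groupings disagree, which gives a single number for comparing phylogeny generators on the same generated history.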