Automatic trace generation

Introduction

The Mutrics project aims at developing methods for automatic extraction of IP traffic traces directly from applications. A successful implementation would partially resolve the problem of sharing valuable IP traffic traces annotated with ground truth information.

The basic idea is to replace the application user with a robot, i.e. a computer program that mimics user behaviour. This approach limits what the resulting traffic files can be used for. For example, the statistical features of the traffic flows will reflect the robot rather than a human user, so they should not be used as features for traffic classification.

However, several other kinds of features may still be usable for classification, for example the protocol format and side-channel communications such as DNS.

General architecture

A generic system for automatic trace generation requires two key elements:
  1. a sniffer,
  2. a GUI macro system.
For (1), the Mutrics project developed a specialized single-application sniffer: tracedump. The program attaches to a given Linux process and writes all of its IP packets to disk.

For (2), several solutions exist; see the Wikipedia "List of GUI testing tools" for a quick reference.

Once the target application is chosen, a special robot (implementing a user model) is programmed using the GUI automation system and run repeatedly, so that the application generates IP traffic. At the same time, the sniffer captures all of the traffic data.
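The capture loop described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual tooling: the function name run_capture_session and its parameters are hypothetical, and the command lines for the sniffer and the robot are deliberately left to the caller, because the exact tracedump and GUI automation invocations are not given here.

```python
import subprocess
import time


def run_capture_session(sniffer_cmd, robot_cmd, runs=3, settle_seconds=1.0):
    """Start the sniffer, run the robot `runs` times, then stop the sniffer.

    sniffer_cmd and robot_cmd are argument lists as accepted by subprocess
    (hypothetical placeholders; the real command lines are an assumption
    left to the caller). Returns the robot exit codes, one per run.
    """
    sniffer = subprocess.Popen(sniffer_cmd)
    exit_codes = []
    try:
        # Give the sniffer a moment to attach before traffic starts.
        time.sleep(settle_seconds)
        for _ in range(runs):
            # Each call blocks until one robot run has finished.
            exit_codes.append(subprocess.call(robot_cmd))
    finally:
        # Stop the sniffer even if a robot run raised an exception.
        if sniffer.poll() is None:
            sniffer.terminate()
        sniffer.wait()
    return exit_codes
```

The try/finally block ensures the sniffer is stopped even when a robot run fails, so a broken run does not leave a capture process behind.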

If the resulting PCAP file cannot be published, the script written for the GUI automation system can be published instead, so that a similar IP traffic trace file can be generated again.

Preliminary implementation

A preliminary, straightforward implementation has been made using tracedump and Sikuli script.

Below you can find example applications to Google Maps, Google Docs and Google Spreadsheets. The Google Docs examples are particularly hard, because little side-channel and protocol format information is available.

Google Maps
  • Sikuli source code (ZIP)
  • resultant traffic files: Firefox, Opera, Chrome
  • comment: interestingly, Opera opens many more connections and sends much shorter packets, which makes for an interesting classification feature
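The two quantities mentioned in the comment above (number of connections and packet length) can be measured per trace file with a short script. The following is a minimal sketch using only the Python 3 standard library. It assumes a classic PCAP file with microsecond timestamps, containing Ethernet frames or raw IPv4 packets; the link type actually written by tracedump is not stated here, so this is an assumption. The names TraceSummary and summarize_pcap are hypothetical.

```python
import struct
from collections import namedtuple

TraceSummary = namedtuple("TraceSummary", "connections packets mean_ip_length")


def summarize_pcap(data):
    """Count distinct TCP connections and the mean IPv4 total length.

    `data` holds the raw bytes of a classic PCAP file (assumption:
    link type 1 = Ethernet or 101 = raw IP; other types are rejected).
    """
    if data[:4] == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian host
    elif data[:4] == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written on a big-endian host
    else:
        raise ValueError("not a classic microsecond PCAP file")
    if len(data) < 24:
        raise ValueError("truncated PCAP global header")
    linktype = struct.unpack(endian + "I", data[20:24])[0]
    if linktype not in (1, 101):
        raise ValueError("unsupported link type: %d" % linktype)
    l2_len = 14 if linktype == 1 else 0

    connections = set()
    packets = 0
    total_length = 0
    offset = 24
    while offset + 16 <= len(data):
        # Record header: ts_sec, ts_usec, incl_len, orig_len (4 bytes each).
        incl_len = struct.unpack(endian + "I", data[offset + 8:offset + 12])[0]
        frame = data[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        if linktype == 1 and frame[12:14] != b"\x08\x00":
            continue  # EtherType is not IPv4
        ip = frame[l2_len:]
        if len(ip) < 20 or ip[0] >> 4 != 4 or ip[9] != 6:
            continue  # keep only IPv4 packets carrying TCP
        ihl = (ip[0] & 0x0F) * 4
        if len(ip) < ihl + 4:
            continue  # TCP ports are missing from the captured bytes
        sport, dport = struct.unpack("!HH", ip[ihl:ihl + 4])
        a, b = (ip[12:16], sport), (ip[16:20], dport)
        # Order the endpoints so both directions map to the same key.
        connections.add((a, b) if a <= b else (b, a))
        packets += 1
        total_length += struct.unpack("!H", ip[2:4])[0]
    mean = total_length / packets if packets else 0.0
    return TraceSummary(len(connections), packets, mean)
```

For example, summarize_pcap(open("trace.pcap", "rb").read()) could be run on each browser's trace file (the file name is a placeholder) and the results compared. Here a connection is identified only by its address and port pairs, so a port reused within one trace is counted once.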
Google Docs