Labeling Anomalies

Labeling anomalies in a traffic set is useful at multiple levels of analysis. A traffic set used with the testbed may already be labeled, in which case we support inserting the existing labels into the testbed, or it may be unlabeled, in which case we support creating labels and associating flows with them as you perform anomaly detection and investigate the causes of alarms. Labeling a previously unlabeled traffic set benefits not only your own analysis but also that of others who later use the public traffic set, since they can use your labels to better understand their own results.

All labels have an assigned type, which allows users to easily search for specific attack types in the traffic set, such as finding where inbound bandwidth floods occurred. Each label also has flows associated with it: the flows believed to have belonged to the attack. This makes it possible to measure false positive and false negative rates for a specific attack type rather than only across all attacks. Such per-attack rates can strengthen an argument for a specific method and metric against a specific attack, or expose weaknesses of an anomaly detection method in detecting specific attacks.
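As an illustration of per-attack rates, the following minimal sketch computes a false negative rate for each label type. It is not the testbed's API: the label records, the flow ids, the `flagged` set, and the function name are all hypothetical, and flows are represented only by integer ids.

```python
from collections import defaultdict

# Hypothetical label records: (label type, ids of flows believed to belong to the attack).
labels = [
    ("bandwidth_flood", {1, 2, 3, 4}),
    ("port_scan", {10, 11}),
    ("port_scan", {12, 13}),
]

# Hypothetical detector output: ids of the flows a method flagged as anomalous.
flagged = {1, 2, 3, 10, 20}


def false_negative_rate_by_type(labels, flagged):
    """Return {label type: fraction of labeled attack flows that were not flagged}."""
    total = defaultdict(int)
    missed = defaultdict(int)
    for label_type, flow_ids in labels:
        total[label_type] += len(flow_ids)
        missed[label_type] += len(flow_ids - flagged)
    return {t: missed[t] / total[t] for t in total}


rates = false_negative_rate_by_type(labels, flagged)

# Flagged flows that belong to no label are candidate false positives.
labeled_ids = set().union(*(flow_ids for _, flow_ids in labels))
false_positives = flagged - labeled_ids
```

Grouping by label type is what separates a method that misses most port scan flows from one that misses most flood flows, a difference that a single rate across all attacks would hide.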

In addition to a type, each label has a set of associated flows, which are the flows believed to have been generated by the labeled attack. Labeling the flows allows users to:

  • remove the attack flows before performing a traffic analysis, by retrieving only the flows that are not in the labeled set
  • easily perform analysis on only the attack flows by isolating them
  • use the labeled attack flows to build synthetic attacks
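The first two uses amount to partitioning the traffic by label membership. A minimal sketch follows, assuming flows are plain dictionaries with an `id` field and a label is represented by a set of flow ids; both representations and the function name are hypothetical, not the testbed's own data model.

```python
def split_flows(flows, labeled_ids):
    """Partition flows into (attack, background) by membership in the labeled set."""
    attack = [f for f in flows if f["id"] in labeled_ids]
    background = [f for f in flows if f["id"] not in labeled_ids]
    return attack, background


# Hypothetical flows; only the "id" field is needed for the split.
flows = [
    {"id": 1, "bytes": 400},
    {"id": 2, "bytes": 60},
    {"id": 3, "bytes": 60},
    {"id": 4, "bytes": 1500},
]
port_scan_ids = {2, 3}  # flows associated with a hypothetical port scan label

attack, background = split_flows(flows, port_scan_ids)
```

Here `background` is the traffic with the attack removed and `attack` is the isolated attack traffic.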

Using labeled flows for synthetic attacks is particularly useful. It makes it possible to answer questions such as: My method detected the attack during a light-usage time of day, but would the attack also be detectable during a heavy-usage time? Can I still detect this port scan if I move the attack flows to an interval that includes a bandwidth flood? If I remove half of the attack flows, reducing the magnitude of the attack, is it still detectable? All of these questions can be answered with labeled attack flows and synthetic attack analysis.
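A minimal sketch of the two manipulations mentioned above, moving an attack to another interval and reducing its magnitude, is shown below. It assumes each flow is a dictionary with a `start` timestamp in seconds. The field names, the function name, and the random subsampling strategy are all assumptions made for illustration, not the testbed's implementation.

```python
import random


def synthetic_attack(attack_flows, new_start, keep_fraction=1.0, seed=0):
    """Return copies of the attack flows shifted to begin at new_start.

    keep_fraction < 1.0 keeps only a random subset of the flows, which
    reduces the magnitude of the attack. The input flows are not modified.
    """
    if not attack_flows:
        return []
    rng = random.Random(seed)  # fixed seed so the subsample is repeatable
    keep = max(1, int(len(attack_flows) * keep_fraction))
    kept = rng.sample(attack_flows, keep)
    # Shift by a single offset so relative timing within the attack is preserved.
    offset = new_start - min(f["start"] for f in attack_flows)
    moved = [{**f, "start": f["start"] + offset} for f in kept]
    return sorted(moved, key=lambda f: f["start"])


# Hypothetical port scan: four small flows, one second apart.
scan = [{"id": i, "start": 100.0 + i, "bytes": 60} for i in range(4)]

moved_full = synthetic_attack(scan, new_start=5000.0)
moved_half = synthetic_attack(scan, new_start=5000.0, keep_fraction=0.5)
```

The shifted flows would then be merged with the traffic of the target interval before rerunning the detection method.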

To support labeling anomalies, we provide a framework for inserting your own labels, creating new labels, associating flows with labels, and retrieving the flows associated with a label. Together these provide all of the benefits of labeling anomalies described above. We describe each in a separate section: