Standard Deviation Detection

To perform standard deviation detection on a given traffic feature, an arithmetic mean and standard deviation must first be computed for the traffic. These are then used to generate standard deviation scores: the number of standard deviations by which an observed metric differs from the mean. The scores can then be used to generate alarms based on a threshold.
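
The score computation can be sketched in a few lines of Ruby. This is only an illustration, not the distributed MATLAB code: the helper names mean_and_sdev and deviation_scores are hypothetical, and dividing by N (the population standard deviation) is an assumption, since the actual scripts may divide by N - 1.

```ruby
# Hypothetical helper: returns [mean, standard deviation] for an array of numbers.
# Assumption: population standard deviation (divide by N).
def mean_and_sdev(values)
  mean = values.sum / values.size.to_f
  variance = values.sum { |v| (v - mean)**2 } / values.size
  [mean, Math.sqrt(variance)]
end

# Hypothetical helper: one score per value, measured in standard deviations
# from the mean. A constant series has no spread, so every score is 0.0.
def deviation_scores(values)
  mean, sdev = mean_and_sdev(values)
  return values.map { 0.0 } if sdev.zero?
  values.map { |v| (v - mean) / sdev }
end

# Illustrative values only (mean 5.0, standard deviation 2.0).
scores = deviation_scores([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
puts scores.inspect
# prints [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

A score of 2.0 here means the last value lies two standard deviations above the mean; a threshold of 3 would not flag it.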

When computing the arithmetic mean and standard deviation, it is best to use traffic that contains no anomalies. Computing them on traffic with anomalies can inflate both values, which can shrink the scores of true anomalies and increase the number of false negatives.

To support traffic sets whose ground truth is unknown, we provide an iterative cleaning method for computing the average and standard deviation. The general idea is to remove from the series of values every interval that would be considered anomalous past some threshold, recompute the average and standard deviation, and repeat until the average and standard deviation no longer change. This effectively removes the large anomalies that would otherwise greatly skew the results of the detection method.
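
A minimal Ruby sketch of this iterative cleaning idea follows. It is not the code in dev_scores_cleaning.m: the function names, the default threshold of 3.0, the iteration cap, and the use of the population standard deviation are all assumptions made for illustration.

```ruby
# Hypothetical helper: returns [mean, standard deviation] for an array of numbers.
# Assumption: population standard deviation (divide by N).
def mean_and_sdev(values)
  mean = values.sum / values.size.to_f
  variance = values.sum { |v| (v - mean)**2 } / values.size
  [mean, Math.sqrt(variance)]
end

# Repeatedly drop values whose score magnitude exceeds `threshold`, then
# recompute the statistics, until a pass removes nothing (so the mean and
# standard deviation no longer change). `max_iterations` is a safety cap.
def cleaned_mean_and_sdev(values, threshold: 3.0, max_iterations: 100)
  kept = values
  mean, sdev = mean_and_sdev(kept)
  max_iterations.times do
    break if sdev.zero?
    remaining = kept.select { |v| ((v - mean) / sdev).abs <= threshold }
    break if remaining.size == kept.size || remaining.empty?
    kept = remaining
    mean, sdev = mean_and_sdev(kept)
  end
  [mean, sdev]
end

# Illustrative series: twenty ordinary values plus one large spike.
series = [9.0, 11.0] * 10 + [100.0]
clean_mean, clean_sdev = cleaned_mean_and_sdev(series)
puts clean_mean   # prints 10.0
puts clean_sdev   # prints 1.0

# Scoring the spike against the cleaned statistics makes it stand out clearly.
spike_score = (100.0 - clean_mean) / clean_sdev
puts spike_score  # prints 90.0
```

Without cleaning, the same spike scores only about 4.5 in this series, because it inflates both the mean and the standard deviation it is measured against.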

Generating Standard Deviation Scores

We provide MATLAB scripts for computing the standard deviation scores both with the iterative process and without it; the latter is intended for traffic whose ground truth is known and which contains no anomalies. These scripts need not be run on entropy data files; they can be used on any aggregate metric that represents an interval. The examples in this guide, however, use entropy data files.

Although we present the individual usage of the MATLAB scripts, a Ruby wrapper (gen_deviation_scores.rb) is available that runs on Datapository entropy dumps and automatically generates the standard deviation scores for all metrics.

gen_deviation_scores.rb

  • Usage: ./gen_deviation_scores.rb <location_of_dp_entropy_dump> <sdev/wavelet/all>
  • Description: a wrapper for generating deviation scores for the anomaly detection methods on a Datapository entropy dump.
  • Output Files: sdev-<metric> (for example, sdev-addr_dst)
  • Output Format: <epoch> <dev_score>
  • Example usage:
    $ ./dp_dump_entropy.rb &> /dev/null
    $ ./gen_deviation_scores.rb . sdev
    addr_dst...
    addr_src...
    degree_in...
    degree_out...
    fsd...
    ports_dst...
    ports_src...
    $ ls -l sdev-*
    -rw-r--r--    1 gnychis  users      144161 Mar 18 19:39 sdev-addr_dst
    -rw-r--r--    1 gnychis  users      144125 Mar 18 19:39 sdev-addr_src
    -rw-r--r--    1 gnychis  users      144261 Mar 18 19:39 sdev-degree_in
    -rw-r--r--    1 gnychis  users      144023 Mar 18 19:39 sdev-degree_out
    -rw-r--r--    1 gnychis  users      144183 Mar 18 19:40 sdev-fsd
    -rw-r--r--    1 gnychis  users      144130 Mar 18 19:40 sdev-ports_dst
    -rw-r--r--    1 gnychis  users      144155 Mar 18 19:40 sdev-ports_src
    $ head -n 3 sdev-addr_dst 
    1107234000 -0.563939
    1107234300 0.111286
    1107234600 -0.360646
    

dev_scores_cleaning.m

  • Usage: dev_scores_cleaning in_file out_file
  • Description: writes a time series of standard deviation scores, computed with iterative cleaning, to be used for detection
  • Output Format: <epoch> <dev_score>
  • Example usage:
    $ matlab
    >> dev_scores_cleaning entropy-addr_dst sdev-addr_dst
    >> exit
    $ head -n 3 sdev-addr_dst 
    1107234000 -0.563939 
    1107234300 0.111286 
    1107234600 -0.360646 
    

dev_scores.m

  • Usage: dev_scores in_file out_file
  • Description: writes a time series of standard deviation scores, computed without iterative cleaning, to be used for detection
  • Output Format: <epoch> <dev_score>
  • Example usage: see dev_scores_cleaning example

Caching Deviation Scores

Although computing deviation scores is not as computationally intensive as computing entropy, we still cache deviation scores in the INTERVAL_STATS table for use during the alarm generation phase.

Updating the deviation scores is as easy as generating them with the Ruby script. After gen_deviation_scores.rb is run, dp_update_deviation_scores.rb can read its output and cache the newly computed values in the database.

dp_update_deviation_scores.rb

  • Usage: ./dp_update_deviation_scores.rb <location_of_dp_entropy_dump> <sdev/wavelet/all>
  • Description: a wrapper for reading deviation scores from a gen_deviation_scores.rb dump and caching them in the database.
  • Example usage:
    $ ./dp_dump_entropy.rb &> /dev/null
    $ ./gen_deviation_scores.rb . sdev &> /dev/null
    $ ./dp_update_deviation_scores.rb . sdev
    addr_src...
    degree_in...
    ports_dst...
    degree_out...
    addr_dst...
    volume...
    fsd...
    ports_src...
    

Generating Alarms

Generating alarms is most easily done after inserting the deviation scores into the database. The deviation scores in the INTERVAL_STATS table can then be used to generate the alarms based on a threshold. The first step is to create the alarm type if it does not already exist in the ALARMS table.

Check for the alarm type:

dp=> select name from alarms;
    name     
-------------
 sdev3
 sdev4
 sdev5
 wavelet3
 wavelet4
 wavelet5
 causality
 trw
 multires20
 multires60
 multires100
 multires300
 multires600

The sdev# alarm types represent alarms generated with # as the threshold. For example, any interval flagged with the sdev3 alarm type had a deviation score more than 3 standard deviations from the mean, in either direction, for a given traffic feature.
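
The same thresholding rule can be tried offline on a score file in the <epoch> <dev_score> format before touching the database. The Ruby sketch below is an illustration only: flag_intervals is a hypothetical helper, the two-sided comparison is assumed, and the sample lines are made up apart from the first, which follows the output shown earlier.

```ruby
# Hypothetical helper: given lines in "<epoch> <dev_score>" format, return the
# epochs whose score magnitude exceeds `threshold` (in either direction).
def flag_intervals(lines, threshold)
  flagged = []
  lines.each do |line|
    epoch, score = line.split
    next if epoch.nil? || score.nil?   # skip blank or malformed lines
    flagged << Integer(epoch) if Float(score).abs > threshold
  end
  flagged
end

# Sample input; the second and third scores are invented for illustration.
sample = [
  "1107234000 -0.563939",
  "1107234300 6.4",
  "1107234600 -7.25",
  ""
]

flagged = flag_intervals(sample, 6.0)
puts flagged.inspect
# prints [1107234300, 1107234600]
```

To run it against a real file, the lines could be read with File.readlines, for example flag_intervals(File.readlines("sdev-addr_dst"), 6.0).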

To add an alarm type for flagging intervals as anomalous when their deviation scores exceed 6 in magnitude:

dp=> INSERT INTO alarms VALUES(nextval('alarms_seq'),'sdev6');
INSERT 0 1

Now let's generate the alarms and insert them into the INTERVAL_ALARMS table.

dp=> INSERT INTO interval_alarms
dp-> SELECT interval_stats.interval, interval_stats.metric, alarms.alarm_type
dp-> FROM interval_stats, alarms
dp-> WHERE alarms.name='sdev6' and
dp->       (interval_stats.sdev_score>6 or
dp->        interval_stats.sdev_score<-6);
INSERT 0 43

To view the alarms:

dp=> SELECT interval, metrics.name AS metric, alarms.name AS alarm_type
dp-> FROM interval_alarms, alarms, metrics
dp-> WHERE alarms.name='sdev6' and 
dp->       interval_alarms.alarm_type=alarms.alarm_type and
dp->       interval_alarms.metric=metrics.metric;

      interval       |  metric   | alarm_type 
---------------------+-----------+------------
 2005-02-23 16:25:00 | addr_src  | sdev6
 2005-02-22 11:55:00 | addr_src  | sdev6
 2005-02-22 10:55:00 | addr_dst  | sdev6
 2005-02-22 11:55:00 | ports_src | sdev6
 2005-02-22 10:55:00 | ports_src | sdev6
 2005-02-22 11:55:00 | ports_dst | sdev6
 2005-02-11 11:10:00 | degree_in | sdev6
 2005-02-10 15:25:00 | degree_in | sdev6
 2005-02-22 10:55:00 | fsd       | sdev6
 2005-02-20 08:00:00 | volume    | sdev6
 2005-02-16 16:55:00 | volume    | sdev6
 2005-02-16 16:45:00 | volume    | sdev6
 .....