Standard Deviation Detection
To perform standard deviation detection on a given traffic feature, an arithmetic mean and standard deviation must first be computed for the traffic. These are then used to generate standard deviation scores: the number of standard deviations by which an observed metric differs from the mean. The scores can then be used to generate alarms based on a threshold.
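As an illustration of the score described above, here is a minimal Ruby sketch. It is not the provided MATLAB script: the function names are hypothetical, and it assumes the sample standard deviation (N-1 normalization, the default of MATLAB's std), which may differ from what the actual scripts use.

```ruby
# Arithmetic mean of an array of numbers.
def mean(values)
  values.sum.to_f / values.size
end

# Sample standard deviation (N-1 normalization). This choice is an
# assumption; the source does not state which normalization is used.
def sample_std(values)
  m = mean(values)
  Math.sqrt(values.sum { |v| (v - m)**2 } / (values.size - 1))
end

# Deviation score: how many standard deviations each value lies from the mean.
def deviation_scores(values)
  m = mean(values)
  s = sample_std(values)
  return values.map { 0.0 } if s.zero? # constant series: nothing deviates
  values.map { |v| (v - m) / s }
end

# Hypothetical per-interval metric values (for example, entropy per interval).
scores = deviation_scores([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
scores.each { |score| puts format('%.6f', score) }
```

A value equal to the mean scores 0, and the sign of a score shows whether the interval lies above or below the mean.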
The arithmetic mean and standard deviation should ideally be computed on traffic that contains no anomalies. Computing them on traffic with anomalies can inflate the arithmetic mean and standard deviation, which shrinks the deviation scores of anomalous intervals and can increase the number of false negatives.
To support traffic sets whose ground truth is unknown, we provide an iterative cleaning method for computing the mean and standard deviation. The general idea is to remove from the series of values any intervals that would be considered anomalous past some threshold, recompute the mean and standard deviation, and repeat until the mean and standard deviation no longer change. This effectively removes the large anomalies that would otherwise greatly skew the results of the detection method.
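The iterative cleaning idea can be sketched in Ruby as follows. This is an illustration under stated assumptions, not the dev_scores_cleaning script itself: the function names, the default cleaning threshold of 3, and the use of the sample standard deviation are all hypothetical choices.

```ruby
# Mean and sample standard deviation (N-1 normalization, an assumption).
def mean_and_std(values)
  m = values.sum.to_f / values.size
  s = Math.sqrt(values.sum { |v| (v - m)**2 } / (values.size - 1))
  [m, s]
end

# Repeatedly drop values that lie more than `threshold` standard deviations
# from the mean, recomputing the statistics each round, until a round
# removes nothing (so the mean and standard deviation stop changing).
def cleaned_mean_and_std(values, threshold = 3.0)
  kept = values
  loop do
    m, s = mean_and_std(kept)
    return [m, s] if s.zero?
    remaining = kept.select { |v| ((v - m) / s).abs <= threshold }
    # Converged, or too few values left to compute a standard deviation.
    return [m, s] if remaining.size == kept.size || remaining.size < 2
    kept = remaining
  end
end

# Hypothetical series: twenty ordinary intervals plus one large anomaly.
series = [10.0, 11.0, 9.0, 10.0, 12.0, 8.0, 10.0, 11.0, 9.0, 10.0] * 2 + [100.0]

raw_mean, raw_std = mean_and_std(series)
clean_mean, clean_std = cleaned_mean_and_std(series, 3.0)

# Scores are computed for every interval, including those removed during
# cleaning, using the cleaned statistics.
scores = series.map { |v| (v - clean_mean) / clean_std }
puts "raw score of the anomaly:     #{(100.0 - raw_mean) / raw_std}"
puts "cleaned score of the anomaly: #{scores.last}"
```

In this made-up series the anomaly inflates the uncleaned standard deviation so much that its own score stays small; after cleaning, the same interval stands out far more clearly.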
Generating Standard Deviation Scores
We provide MATLAB scripts for computing the standard deviation scores both with the iterative process and without it; the latter is intended for traffic whose ground truth is known and which contains no anomalies. These scripts are not limited to entropy data files: they can be used on any aggregate metric that represents an interval. Our examples and guide, however, use entropy data files.
Although we present the individual usage of the MATLAB scripts, a Ruby wrapper (gen_deviation_scores.rb) is available that runs on Datapository entropy dumps and automatically generates the standard deviation scores for all metrics.
- Usage: ./gen_deviation_scores.rb <location_of_dp_entropy_dump> <sdev/wavelet/all>
- Description: a wrapper for generating deviation scores for the anomaly detection methods on a Datapository entropy dump.
- Output Files: sdev-metric
- Output Format: <epoch> <dev_score>
- Example usage:
$ ./dp_dump_entropy.rb &> /dev/null
$ ./gen_deviation_scores.rb . sdev
addr_dst...
addr_src...
degree_in...
degree_out...
fsd...
ports_dst...
ports_src...
$ ls -l sdev-*
-rw-r--r-- 1 gnychis users 144161 Mar 18 19:39 sdev-addr_dst
-rw-r--r-- 1 gnychis users 144125 Mar 18 19:39 sdev-addr_src
-rw-r--r-- 1 gnychis users 144261 Mar 18 19:39 sdev-degree_in
-rw-r--r-- 1 gnychis users 144023 Mar 18 19:39 sdev-degree_out
-rw-r--r-- 1 gnychis users 144183 Mar 18 19:40 sdev-fsd
-rw-r--r-- 1 gnychis users 144130 Mar 18 19:40 sdev-ports_dst
-rw-r--r-- 1 gnychis users 144155 Mar 18 19:40 sdev-ports_src
$ head -n 3 sdev-addr_dst
1107234000 -0.563939
1107234300 0.111286
1107234600 -0.360646
- Usage: dev_scores_cleaning in_file out_file
- Description: writes a timeseries of standard deviation scores, computed with iterative cleaning, to be used for detection
- Output Format: <epoch> <dev_score>
- Example usage:
$ matlab
>> dev_scores_cleaning entropy-addr_dst sdev-addr_dst
>> exit
$ head -n 3 sdev-addr_dst
1107234000 -0.563939
1107234300 0.111286
1107234600 -0.360646
- Usage: dev_scores in_file out_file
- Description: writes a timeseries of standard deviation scores, computed without iterative cleaning, to be used for detection
- Output Format: <epoch> <dev_score>
- Example usage: see dev_scores_cleaning example
Caching Deviation Scores
Although computing deviation scores is not as computationally intensive as computing entropy, we still cache deviation scores in the INTERVAL_STATS table for use during the alarm generation phase.
Updating the deviation scores is as easy as generating them with the Ruby script. After gen_deviation_scores.rb has run, dp_update_deviation_scores.rb can read its output and cache the newly computed values in the database.
- Usage: ./dp_update_deviation_scores.rb <location_of_dp_entropy_dump> <sdev/wavelet/all>
- Description: a wrapper for reading deviation scores from a gen_deviation_scores.rb dump and caching them in the database.
- Example usage:
$ ./dp_dump_entropy.rb &> /dev/null
$ ./gen_deviation_scores.rb . sdev &> /dev/null
$ ./dp_update_deviation_scores.rb . sdev
addr_src...
degree_in...
ports_dst...
degree_out...
addr_dst...
volume...
fsd...
ports_src...
Generating Alarms
Generating alarms is most easily done after inserting the deviation scores into the database. The deviation scores in the INTERVAL_STATS table can then be used to generate alarms based on a threshold. The first step is to create an alarm type in the ALARMS table if one does not already exist.
Check for the alarm type:
dp=> select name from alarms;
name
-------------
sdev3
sdev4
sdev5
wavelet3
wavelet4
wavelet5
causality
trw
multires20
multires60
multires100
multires300
multires600
The sdev# alarm types represent alarms generated with # as the threshold. For example, any interval flagged with the sdev3 alarm type had a deviation score more than 3 standard deviations from the mean (above 3 or below -3) for a given traffic feature.
To add an alarm type for flagging intervals as anomalous when their deviation scores exceed 6 in magnitude:
dp=> INSERT INTO alarms VALUES(nextval('alarms_seq'),'sdev6');
INSERT 0 1
Now let's generate the alarms and insert them into the INTERVAL_ALARMS table.
dp=> INSERT INTO interval_alarms
dp-> SELECT interval_stats.interval, interval_stats.metric, alarms.alarm_type
dp-> FROM interval_stats, alarms
dp-> WHERE alarms.name='sdev6' and
dp-> (interval_stats.sdev_score>6 or
dp-> interval_stats.sdev_score<-6);
INSERT 0 43
A way to view the alarms:
dp=> SELECT interval, metrics.name AS metric, alarms.name AS alarm_type
dp-> FROM interval_alarms, alarms, metrics
dp-> WHERE alarms.name='sdev6' and
dp-> interval_alarms.alarm_type=alarms.alarm_type and
dp-> interval_alarms.metric=metrics.metric;
interval | metric | alarm_type
---------------------+-----------+------------
2005-02-23 16:25:00 | addr_src | sdev6
2005-02-22 11:55:00 | addr_src | sdev6
2005-02-22 10:55:00 | addr_dst | sdev6
2005-02-22 11:55:00 | ports_src | sdev6
2005-02-22 10:55:00 | ports_src | sdev6
2005-02-22 11:55:00 | ports_dst | sdev6
2005-02-11 11:10:00 | degree_in | sdev6
2005-02-10 15:25:00 | degree_in | sdev6
2005-02-22 10:55:00 | fsd | sdev6
2005-02-20 08:00:00 | volume | sdev6
2005-02-16 16:55:00 | volume | sdev6
2005-02-16 16:45:00 | volume | sdev6
.....
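The same two-sided threshold test can also be applied directly to an sdev-metric file, without the database. The Ruby sketch below is only an illustration: the function names are hypothetical, the first three sample lines come from the example output shown earlier, and the last two lines are made-up values added so that something crosses the threshold.

```ruby
# Parse "<epoch> <dev_score>" lines into [epoch, score] pairs,
# skipping blank lines.
def parse_scores(text)
  text.each_line
      .map(&:split)
      .reject(&:empty?)
      .map { |epoch, score| [Integer(epoch), Float(score)] }
end

# Return the epochs whose score lies more than `threshold` standard
# deviations from the mean in either direction, mirroring the
# "sdev_score>6 or sdev_score<-6" condition in the SQL.
def flag_intervals(pairs, threshold)
  pairs.select { |_epoch, score| score.abs > threshold }.map(&:first)
end

sample = <<~DATA
  1107234000 -0.563939
  1107234300 0.111286
  1107234600 -0.360646
  1107234900 6.5
  1107235200 -7.25
DATA

flagged = flag_intervals(parse_scores(sample), 6)
flagged.each { |epoch| puts Time.at(epoch).utc }
```

To run it against a real file, the heredoc could be replaced with File.read('sdev-addr_dst').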
