= Standard Deviation Detection =

To perform standard deviation detection on a given [wiki:TrafficFeatures traffic feature], an [http://en.wikipedia.org/wiki/Average arithmetic mean] and [http://en.wikipedia.org/wiki/Standard_deviation standard deviation] must be computed for the traffic. These are then used to generate standard deviation scores: the number of standard deviations an observed metric lies from the mean. The scores can then be used to generate alarms based on a threshold.

Ideally, the arithmetic mean and standard deviation are computed on traffic that contains no anomalies. Computing them on traffic with anomalies can inflate both values, which lowers the resulting scores and can increase the number of false negatives.

To support traffic sets whose ground truth is unknown, we provide an ''iterative cleaning'' method for computing the average and standard deviation. The general idea is to remove from the series of values every interval that would be considered anomalous past some threshold, recompute the average and standard deviation, and repeat until the average and standard deviation no longer change. This effectively removes the large anomalies that would otherwise greatly skew the results of the detection method.

== Generating Standard Deviation Scores ==

We provide [http://www.mathworks.com/ MATLAB] [source:scripts/matlab scripts] for computing the standard deviation scores both with the iterative process and without it; the latter is intended for traffic whose ground truth is known and which contains no anomalies. These scripts are not limited to [wiki:GeneratingEntropy#ComputingEntropy entropy data files]; they can be used on any aggregate metric that represents an interval. Our examples and guide, however, use [wiki:GeneratingEntropy#ComputingEntropy entropy data files].
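
The iterative cleaning idea can be illustrated with a minimal Ruby sketch. It is not a port of the MATLAB scripts: the helper names (`mean`, `std_dev`, `cleaned_stats`, `deviation_scores`), the default threshold of 3, and the use of the population standard deviation (dividing by N) are all assumptions made for illustration.

```ruby
# Hypothetical helpers for illustration; names and defaults are assumptions,
# not taken from dev_scores_cleaning.m.

# Arithmetic mean of an array of numbers.
def mean(values)
  values.sum.to_f / values.size
end

# Population standard deviation (divides by N). This is an assumption;
# the MATLAB scripts may use the sample form (N - 1).
def std_dev(values)
  m = mean(values)
  Math.sqrt(values.sum { |v| (v - m)**2 } / values.size)
end

# Repeatedly drop values lying more than `threshold` deviations from the
# mean until a pass removes nothing, then return the final [mean, std_dev].
def cleaned_stats(values, threshold: 3.0)
  kept = values
  loop do
    m = mean(kept)
    s = std_dev(kept)
    break [m, s] if s.zero?
    remaining = kept.select { |v| ((v - m) / s).abs <= threshold }
    break [m, s] if remaining.size == kept.size
    kept = remaining
  end
end

# Score every original value, including the removed ones, against the
# cleaned mean and standard deviation.
def deviation_scores(values, threshold: 3.0)
  m, s = cleaned_stats(values, threshold: threshold)
  values.map { |v| s.zero? ? 0.0 : (v - m) / s }
end
```

Scoring the full series against the cleaned statistics means a removed interval still receives a score, typically a large one, so it can later be flagged by a threshold.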

Although we present the individual usage of the [http://www.mathworks.com/ MATLAB] scripts, a Ruby wrapper ([source:scripts/ruby/gen_deviation_scores.rb gen_deviation_scores.rb]) is available that runs on [wiki:GeneratingEntropy#EntropyScripts Datapository entropy dumps] and automatically generates the standard deviation scores for all metrics.

'''[source:scripts/ruby/gen_deviation_scores.rb gen_deviation_scores.rb]'''
 * ''Usage'': ./gen_deviation_scores.rb
 * ''Description'': a wrapper for generating deviation scores for the anomaly detection methods on a [wiki:GeneratingEntropy#EntropyScripts Datapository entropy dump].
 * ''Output Files'': sdev-''metric''
 * ''Output Format'':
 * ''Example usage'':
{{{
$ ./dp_dump_entropy.rb &> /dev/null
$ ./gen_deviation_scores.rb . sdev
addr_dst...
addr_src...
degree_in...
degree_out...
fsd...
ports_dst...
ports_src...
$ ls -l sdev-*
-rw-r--r-- 1 gnychis users 144161 Mar 18 19:39 sdev-addr_dst
-rw-r--r-- 1 gnychis users 144125 Mar 18 19:39 sdev-addr_src
-rw-r--r-- 1 gnychis users 144261 Mar 18 19:39 sdev-degree_in
-rw-r--r-- 1 gnychis users 144023 Mar 18 19:39 sdev-degree_out
-rw-r--r-- 1 gnychis users 144183 Mar 18 19:40 sdev-fsd
-rw-r--r-- 1 gnychis users 144130 Mar 18 19:40 sdev-ports_dst
-rw-r--r-- 1 gnychis users 144155 Mar 18 19:40 sdev-ports_src
$ head -n 3 sdev-addr_dst
1107234000 -0.563939
1107234300 0.111286
1107234600 -0.360646
}}}

'''[source:scripts/matlab/dev_scores_cleaning.m dev_scores_cleaning.m]'''
 * ''Usage'': dev_scores_cleaning in_file out_file
 * ''Description'': writes a time series of standard deviation scores, computed with iterative cleaning, to be used for detection
 * ''Output Format'':
 * ''Example usage'':
{{{
$ matlab
>> dev_scores_cleaning entropy-addr_dst sdev-addr_dst
>> exit
$ head -n 3 sdev-addr_dst
1107234000 -0.563939
1107234300 0.111286
1107234600 -0.360646
}}}

'''[source:scripts/matlab/dev_scores.m dev_scores.m]'''
 * ''Usage'': dev_scores in_file out_file
 * ''Description'': writes a time series of standard deviation scores, computed without iterative cleaning, to be used for detection
 * ''Output Format'':
 * ''Example usage'': ''see dev_scores_cleaning example''

== Caching Deviation Scores ==

Although computing deviation scores is not as computationally intensive as computing entropy, we still cache deviation scores in the ''[wiki:EntityDictionary#INTERVAL_STATS INTERVAL_STATS]'' table for use during the alarm generation phase. Updating the cached scores takes one extra step after generating them with the Ruby script: once [source:scripts/ruby/gen_deviation_scores.rb gen_deviation_scores.rb] has run, [source:scripts/ruby/dp_update_deviation_scores.rb dp_update_deviation_scores.rb] can read its output and cache the newly computed values in the database.

'''[source:scripts/ruby/dp_update_deviation_scores.rb dp_update_deviation_scores.rb]'''
 * ''Usage'': ./dp_update_deviation_scores.rb
 * ''Description'': a wrapper for reading deviation scores from a [source:scripts/ruby/gen_deviation_scores.rb gen_deviation_scores.rb dump] and caching them in the database.
 * ''Example usage'':
{{{
$ ./dp_dump_entropy.rb &> /dev/null
$ ./gen_deviation_scores.rb . sdev &> /dev/null
$ ./dp_update_deviation_scores.rb . sdev
addr_src...
degree_in...
ports_dst...
degree_out...
addr_dst...
volume...
fsd...
ports_src...
}}}

== Generating Alarms ==

Generating alarms is most easily done after inserting the deviation scores into the database. The deviation scores in the ''[wiki:EntityDictionary#INTERVAL_STATS INTERVAL_STATS]'' table can then be used to generate the alarms based on a threshold.

The first step is to create an alarm type if one does not already exist in the ''[wiki:EntityDictionary#ALARMS ALARMS]'' table.
Check for the alarm type:
{{{
dp=> select name from alarms;
    name
-------------
 sdev3
 sdev4
 sdev5
 wavelet3
 wavelet4
 wavelet5
 causality
 trw
 multires20
 multires60
 multires100
 multires300
 multires600
}}}

The sdev''#'' alarm types represent alarms generated using ''#'' as the threshold. For example, any interval flagged with the ''sdev3'' alarm type had a deviation score more than 3 standard deviations from the mean, in either direction, for a given [wiki:TrafficFeatures traffic feature].

To add an alarm type for flagging intervals as anomalous when the magnitude of the deviation score exceeds 6:
{{{
dp=> INSERT INTO alarms VALUES(nextval('alarms_seq'),'sdev6');
INSERT 0 1
}}}

Now generate the alarms and insert them into the ''interval_alarms'' table.
{{{
dp=> INSERT INTO interval_alarms
dp->   SELECT interval_stats.interval, interval_stats.metric, alarms.alarm_type
dp->   FROM interval_stats, alarms
dp->   WHERE alarms.name='sdev6' and
dp->   (interval_stats.sdev_score>6 or
dp->   interval_stats.sdev_score<-6);
INSERT 0 43
}}}

One way to view the alarms:
{{{
dp=> SELECT interval, metrics.name AS metric, alarms.name AS alarm_type
dp->   FROM interval_alarms, alarms, metrics
dp->   WHERE alarms.name='sdev6' and
dp->   interval_alarms.alarm_type=alarms.alarm_type and
dp->   interval_alarms.metric=metrics.metric;
      interval       |  metric   | alarm_type
---------------------+-----------+------------
 2005-02-23 16:25:00 | addr_src  | sdev6
 2005-02-22 11:55:00 | addr_src  | sdev6
 2005-02-22 10:55:00 | addr_dst  | sdev6
 2005-02-22 11:55:00 | ports_src | sdev6
 2005-02-22 10:55:00 | ports_src | sdev6
 2005-02-22 11:55:00 | ports_dst | sdev6
 2005-02-11 11:10:00 | degree_in | sdev6
 2005-02-10 15:25:00 | degree_in | sdev6
 2005-02-22 10:55:00 | fsd       | sdev6
 2005-02-20 08:00:00 | volume    | sdev6
 2005-02-16 16:55:00 | volume    | sdev6
 2005-02-16 16:45:00 | volume    | sdev6
 .....
}}}
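
The same two-sided threshold test can also be applied directly to an sdev-''metric'' file, without the database. The following Ruby sketch assumes one whitespace-separated `timestamp score` pair per line, as in the `head` output shown earlier; `flag_intervals` is a hypothetical helper and not part of the provided scripts. The sample lines are the three scores from that output, and the threshold of 0.5 is chosen only so that one of them is flagged.

```ruby
# Hypothetical helper for illustration; not part of the provided scripts.
# Returns the timestamps whose deviation score magnitude exceeds the
# threshold, mirroring the (sdev_score > t OR sdev_score < -t) test.
def flag_intervals(lines, threshold)
  flagged = []
  lines.each do |line|
    timestamp, score = line.split
    next if timestamp.nil? || score.nil? # skip blank or malformed lines
    flagged << Integer(timestamp) if Float(score).abs > threshold
  end
  flagged
end

sample = [
  "1107234000 -0.563939",
  "1107234300 0.111286",
  "1107234600 -0.360646",
]

# A threshold of 0.5 is used only so this small sample flags something.
puts flag_intervals(sample, 0.5).inspect # prints [1107234000]
```

To run it on a real file, the lines could be read with `File.readlines("sdev-addr_dst")` and a threshold such as 6 passed in.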