= Wavelet Detection =

Signal analysis of network traffic for anomaly detection was first introduced by Barford et al. in ''[http://www.cs.wisc.edu/~pb/paper_imw_02.pdf A Signal Analysis of Network Traffic Anomalies]''. The measurement data is treated as a generic signal on which wavelet analysis is performed. This is independent of the instrument used to capture it (such as our Argus tool), the quantity being measured (such as packets or bytes), and the network in which the data was captured. Once a signal is decomposed, unwanted effects can be filtered out by applying a threshold to the wavelet coefficients before reconstruction. For example, diurnal effects can be removed so that only the fine-grained detail of the original signal is reconstructed.
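
The filtering idea can be illustrated with a short Ruby sketch. It is not the project's code. For brevity it uses the Haar wavelet instead of DB4, and `haar_step`, `haar_inverse_step` and `remove_slow_component` are hypothetical names. The sketch decomposes the signal a chosen number of levels and thresholds away the coarsest approximation coefficients by setting them to zero. That is the slowest component, where diurnal effects would sit. The signal is then rebuilt from the detail coefficients alone.

{{{
#!ruby
# Hypothetical sketch: Haar wavelet used instead of DB4 for brevity, only to
# illustrate filtering by thresholding coefficients before reconstruction.
SQRT2 = Math.sqrt(2)

# One Haar analysis step: scaled pairwise sums (approximation) and
# differences (detail). The input length must be even.
def haar_step(x)
  approx = []
  detail = []
  x.each_slice(2) do |p, q|
    approx << (p + q) / SQRT2
    detail << (p - q) / SQRT2
  end
  [approx, detail]
end

# Inverse of haar_step: rebuilds each pair from approximation and detail.
def haar_inverse_step(approx, detail)
  approx.zip(detail).flat_map { |s, t| [(s + t) / SQRT2, (s - t) / SQRT2] }
end

# Decomposes x `levels` times, zeroes the coarsest approximation (this removes
# the mean of every block of 2**levels samples, i.e. the slow component) and
# reconstructs from the detail coefficients alone.
def remove_slow_component(x, levels)
  block = 2**levels
  unless levels >= 1 && !x.empty? && (x.size % block).zero?
    raise ArgumentError, "length must be a positive multiple of #{block}"
  end

  approx = x.map(&:to_f)
  details = []
  levels.times do
    approx, detail = haar_step(approx)
    details << detail
  end

  signal = Array.new(approx.size, 0.0)
  details.reverse_each { |detail| signal = haar_inverse_step(signal, detail) }
  signal
end

# Toy input: a level shift between the two halves plus a spike at index 6.
p remove_slow_component([10, 10, 10, 10, 50, 50, 58, 50], 2)
}}}

With two levels, each block of four samples loses its mean. The level shift in the toy input therefore disappears, while the spike survives, giving approximately `[0, 0, 0, 0, -2, -2, 6, -2]`. Removing a real diurnal cycle needs more levels (and DB4's smoother filters), but the mechanism is the same.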

Time is the wavelet's independent variable. The data captured for a metric is decomposed into strata, consisting of a low-frequency signal and one or more high-frequency signals. The low-frequency signal ''L(x)'' captures the smoothing or averaging effect, and how much detail it retains depends on the number of low-pass filtering iterations. Given enough of these iterations relative to the measurement interval, the low-frequency signal clearly captures diurnal effects. Depending on how the data was decomposed, these effects can then be filtered out. The high-frequency signals ''H,,i,,(x)'' capture finer-grained effects, such as sudden variations in the data. Events such as port scans or DDoS attacks would appear here.
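
A sketch of such a decomposition in Ruby follows. It assumes two levels: one low-frequency band and two high-frequency bands, matching the ''L'', ''H,,1,,'' and ''H,,2,,'' filters mentioned later on this page. It uses the 8-tap Daubechies db4 filter coefficients. The periodic boundary handling and the method names `analyze`, `synthesize` and `decompose` are assumptions of this sketch, not taken from the MATLAB scripts. Each band is reconstructed back to the input length so that the bands stay aligned in time with the original intervals.

{{{
#!ruby
# Hypothetical sketch: two-level DB4 decomposition into time-aligned bands.
# ASSUMPTION: periodic (circular) boundary handling; the MATLAB scripts may
# treat the ends of the series differently.

# Daubechies db4 (8-tap) low-pass filter, rounded to 10 decimal places.
LOW = [0.2303778133, 0.7148465706, 0.6308807679, -0.0279837694,
       -0.1870348117, 0.0308413818, 0.0328830117, -0.0105974018].freeze
# Matching high-pass filter: high[k] = (-1)**k * low[7 - k].
HIGH = LOW.reverse.each_with_index.map { |c, k| k.even? ? c : -c }.freeze

# Circular filtering followed by downsampling by two.
def circular_filter(x, filter)
  n = x.size
  Array.new(n / 2) do |i|
    (0...filter.size).sum { |k| filter[k] * x[(2 * i + k) % n] }
  end
end

# One analysis step: [approximation, detail], each half the input length.
def analyze(x)
  [circular_filter(x, LOW), circular_filter(x, HIGH)]
end

# Exact transpose of analyze, which inverts it because the filters are
# orthonormal: upsample and accumulate both filtered contributions.
def synthesize(approx, detail)
  n = 2 * approx.size
  x = Array.new(n, 0.0)
  approx.each_index do |i|
    LOW.each_index do |k|
      x[(2 * i + k) % n] += LOW[k] * approx[i] + HIGH[k] * detail[i]
    end
  end
  x
end

# Splits x into bands of the original length that add back up to x:
# :h1 (finest detail), :h2 (next coarser detail), :low (smooth remainder).
def decompose(x)
  unless x.size >= 8 && (x.size % 4).zero?
    raise ArgumentError, "length must be a multiple of 4 and at least 8"
  end

  a1, d1 = analyze(x.map(&:to_f))
  a2, d2 = analyze(a1)
  z1 = Array.new(d1.size, 0.0)
  z2 = Array.new(d2.size, 0.0)
  {
    low: synthesize(synthesize(a2, z2), z1),
    h2: synthesize(synthesize(z2, d2), z1),
    h1: synthesize(z1, d1)
  }
end
}}}

Because the transform is orthonormal, the three bands add back up to the input, up to the rounding in the coefficients. A constant signal ends up entirely in `:low`.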

The first step is to normalize the high-frequency and mid-frequency parts to a variance of one. A moving window is then used to compute the local variability of each part. The size of this window depends on the maximum duration of the anomalies the system should capture. Because the window has a fixed length, it determines which anomaly durations can be detected. A window that is too small loses or blurs anomalies, while one that is too large produces more anomalies of little interest.
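
The first step can be sketched in Ruby as follows, applied separately to the high-frequency and mid-frequency bands. Two details are assumptions rather than documented behaviour. First, the series is centred as well as scaled; centring does not change the variance. Second, the local variability is taken to be the population variance over a trailing window of `window` samples, which leaves the first `window - 1` positions empty. `normalize` and `local_variance` are hypothetical names.

{{{
#!ruby
# Hypothetical sketch of the first detection step.

# Scales a series to unit variance. It is also centred, which leaves the
# variance unchanged.
def normalize(series)
  n = series.size.to_f
  mean = series.sum / n
  sd = Math.sqrt(series.sum { |v| (v - mean)**2 } / n)
  raise ArgumentError, "series has zero variance" if sd.zero?

  series.map { |v| (v - mean) / sd }
end

# Local variability: population variance over a trailing window of `window`
# samples (an assumption). Positions without a full window are nil.
def local_variance(series, window)
  raise ArgumentError, "window must be positive" unless window.positive?

  series.each_index.map do |i|
    next nil if i < window - 1

    slice = series[(i - window + 1)..i]
    mean = slice.sum / window.to_f
    slice.sum { |v| (v - mean)**2 } / window
  end
end
}}}

With 5-minute intervals, a 30-minute window covers 6 samples, so the first 5 positions have no value. That would be consistent with the blank leading lines in the example output further down, although the MATLAB scripts may handle the start of the series differently.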

The second step combines the local variability of the high-frequency and mid-frequency parts using a weighted sum, which produces the V-signal. The third step applies a threshold to the V-signal for anomaly detection, measuring the height and width of its peaks. Peaks above the threshold identify the anomalies. A peak's width gives the length of the anomaly, and its height gives the anomaly's relative intensity.
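
Below is a Ruby sketch of the second and third steps, taking two local-variability series like those described above. The equal weights of 0.5 are placeholders rather than the values used by the MATLAB scripts, and `v_signal` and `find_peaks` are hypothetical names. A peak is taken to be a run of consecutive samples above the threshold, reported with its start index, its width in samples and its maximum height.

{{{
#!ruby
# Hypothetical sketch of the second and third steps; the default weights are
# placeholders, not values from the MATLAB scripts.

# Weighted sum of the two local-variability series (the V-signal). Positions
# where either input is nil (no full window yet) stay nil.
def v_signal(var_high, var_mid, w_high: 0.5, w_mid: 0.5)
  var_high.zip(var_mid).map do |h, m|
    h.nil? || m.nil? ? nil : w_high * h + w_mid * m
  end
end

# Runs of consecutive samples above the threshold. Each peak records its
# start index, width in samples (anomaly length) and maximum height
# (relative intensity).
def find_peaks(v, threshold)
  peaks = []
  current = nil
  v.each_with_index do |value, i|
    if value && value > threshold
      if current
        current[:width] += 1
        current[:height] = [current[:height], value].max
      else
        current = { start: i, width: 1, height: value }
      end
    elsif current
      peaks << current
      current = nil
    end
  end
  peaks << current if current
  peaks
end
}}}

`find_peaks(v_signal(var_high, var_mid), threshold)` then lists candidate anomalies together with their length and relative intensity.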

To perform the signal analysis we use DB4 as our mother wavelet, with a single low-pass filter ''L'' and two high-pass filters ''H,,1,,'' and ''H,,2,,''. The default window size is 30 minutes. We also provide 1-minute and 15-minute windows out of the box, and the scripts can be modified if other window sizes are needed.
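
The window slides over sampled data, so its length in samples is the window duration divided by the measurement interval. The hypothetical helper below captures that arithmetic. The 300-second interval in the usage lines is inferred from the consecutive epochs (1107235500 and 1107235800) in the example output further down, not from the scripts themselves.

{{{
#!ruby
# Hypothetical helper: converts a window duration into a number of samples.
def window_samples(window_minutes, interval_seconds)
  samples, remainder = (window_minutes * 60).divmod(interval_seconds)
  unless samples.positive? && remainder.zero?
    raise ArgumentError, "window must be a whole, non-zero number of intervals"
  end

  samples
end

window_samples(30, 300) # => 6, so a trailing window leaves 5 leading gaps
window_samples(15, 300) # => 3
}}}

A 1-minute window therefore needs data binned at 1 minute or finer. With 5-minute bins, the helper raises an error.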

== Generating Wavelet Deviation Scores ==

We provide [http://www.mathworks.com/ MATLAB] [source:scripts/matlab scripts] for computing deviation scores. The scores can be computed through the iterative process, or without it for traffic whose ground truth is known and which contains no anomalies. These scripts are not limited to [wiki:GeneratingEntropy#ComputingEntropy entropy data files]; they can be used with any aggregate metric that represents an interval. Our examples and guide, however, use [wiki:GeneratingEntropy#ComputingEntropy entropy data files].

We present the individual usage of the [http://www.mathworks.com/ MATLAB] scripts below. A Ruby wrapper ([source:scripts/ruby/gen_deviation_scores.rb gen_deviation_scores.rb]) is also available. It runs on [wiki:GeneratingEntropy#EntropyScripts Datapository entropy dumps] and automatically generates the deviation scores for all metrics.

'''[source:scripts/ruby/gen_deviation_scores.rb gen_deviation_scores.rb]'''
 * ''Usage'': ./gen_deviation_scores.rb <location_of_dp_entropy_dump> <sdev/wavelet/all>
 * ''Description'': a wrapper for generating deviation scores for the anomaly detection methods on a [wiki:GeneratingEntropy#EntropyScripts Datapository entropy dump].
 * ''Output Files'': ''method''-''metric'' (for example sdev-''metric'' or wavelet-''metric'')
 * ''Output Format'': <epoch> <dev_score>
 * ''Example usage'':
{{{
$ ./dp_dump_entropy.rb &> /dev/null
$ ./gen_deviation_scores.rb . wavelet
addr_dst...
addr_src...
degree_in...
degree_out...
fsd...
ports_dst...
ports_src...

$ ls -l wavelet-*
-rw-r--r-- 1 gnychis users 112541 Mar 18 10:23 wavelet-addr_dst
-rw-r--r-- 1 gnychis users 112533 Mar 18 10:23 wavelet-addr_src
-rw-r--r-- 1 gnychis users 112528 Mar 18 10:23 wavelet-degree_in
-rw-r--r-- 1 gnychis users 112533 Mar 18 10:23 wavelet-degree_out
-rw-r--r-- 1 gnychis users 112535 Mar 18 10:23 wavelet-fsd
-rw-r--r-- 1 gnychis users 112546 Mar 18 10:23 wavelet-ports_dst
-rw-r--r-- 1 gnychis users 112539 Mar 18 10:23 wavelet-ports_src
-rw-r--r-- 1 gnychis users 112528 Mar 18 10:23 wavelet-volume

$ head -n 7 wavelet-addr_dst
... (first 5 are blank from 30 minute window)
1107235500 0.56
1107235800 0.46
}}}
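
The output files are plain text, so they are easy to post-process. Below is a minimal Ruby sketch that reads the `<epoch> <dev_score>` format and picks out intervals above a threshold. `parse_scores` and `anomalous_intervals` are hypothetical names. The exact content of the blank leading lines is not documented, so the sketch skips any line that lacks both an integer epoch and a numeric score.

{{{
#!ruby
require "stringio"

# Hypothetical reader: parses "<epoch> <dev_score>" lines into
# [epoch, score] pairs, skipping lines without both numeric fields.
def parse_scores(io)
  io.each_line.map do |line|
    epoch, score = line.split
    epoch = Integer(epoch, exception: false) if epoch
    score = Float(score, exception: false) if score
    [epoch, score] if epoch && score
  end.compact
end

# Intervals whose deviation score exceeds the threshold, as UTC times.
def anomalous_intervals(scores, threshold)
  scores.select { |_, s| s > threshold }.map { |epoch, _| Time.at(epoch).utc }
end

# The two scored lines from the example output above, after a blank line.
sample = StringIO.new("\n1107235500 0.56\n1107235800 0.46\n")
p parse_scores(sample)
}}}

On a real file, `File.open("wavelet-addr_dst") { |f| anomalous_intervals(parse_scores(f), 3) }` would list the addr_dst intervals scoring above 3.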

'''[source:scripts/matlab/wavelet_anomaly_detection.m wavelet_anomaly_detection.m]'''
 * ''Usage'': wavelet_anomaly_detection in_file out_file
 * ''Description'': produces a time series of wavelet deviation scores to be used for detection.
 * ''Output Format'': <epoch> <dev_score>
 * ''Example usage'':
{{{
$ matlab
>> wavelet_anomaly_detection entropy-addr_dst wavelet-addr_dst
>> exit

$ head -n 7 wavelet-addr_dst
... (first 5 are blank from 30 minute window)
1107235500 0.56
1107235800 0.46
}}}

== Caching Deviation Scores ==

Computing deviation scores is less computationally intensive than computing entropy. Even so, we cache the deviation scores in the ''[wiki:EntityDictionary#INTERVAL_STATS INTERVAL_STATS]'' table for use during the alarm generation phase.

Updating the cached deviation scores is as easy as generating them with the Ruby scripts. After [source:scripts/ruby/gen_deviation_scores.rb gen_deviation_scores.rb] is run, [source:scripts/ruby/dp_update_deviation_scores.rb dp_update_deviation_scores.rb] reads its output and caches the newly computed values in the database.

'''[source:scripts/ruby/dp_update_deviation_scores.rb dp_update_deviation_scores.rb]'''
 * ''Usage'': ./dp_update_deviation_scores.rb <location_of_dp_entropy_dump> <sdev/wavelet/all>
 * ''Description'': a wrapper for reading deviation scores from a [source:scripts/ruby/gen_deviation_scores.rb gen_deviation_scores.rb dump] and caching them in the database.
 * ''Example usage'':
{{{
$ ./dp_dump_entropy.rb &> /dev/null
$ ./gen_deviation_scores.rb . wavelet &> /dev/null
$ ./dp_update_deviation_scores.rb . wavelet
addr_src...
degree_in...
ports_dst...
degree_out...
addr_dst...
volume...
fsd...
ports_src...
}}}
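
The three commands above can be chained, for instance to refresh the cached scores from a cron job. The Ruby driver below is a hypothetical sketch built only from the commands and arguments shown in the examples. Like the examples, it assumes it is run from the directory that receives the entropy dump, so `dump_dir` is normally `.`. It stops at the first step that exits unsuccessfully.

{{{
#!ruby
# Hypothetical driver for the documented steps; it is not part of the
# project's scripts. The method argument accepts the values from the usage lines.
SCORE_METHODS = %w[sdev wavelet all].freeze

def pipeline_steps(dump_dir, method)
  unless SCORE_METHODS.include?(method)
    raise ArgumentError, "method must be one of #{SCORE_METHODS.join('/')}"
  end

  [
    ["./dp_dump_entropy.rb"],
    ["./gen_deviation_scores.rb", dump_dir, method],
    ["./dp_update_deviation_scores.rb", dump_dir, method]
  ]
end

# `runner` executes one command (an argv array) and returns true on success.
# Kernel#system is the default; a stub can be passed in for testing.
def run_pipeline(dump_dir, method, runner: ->(cmd) { system(*cmd) })
  pipeline_steps(dump_dir, method).each do |cmd|
    raise "step failed: #{cmd.join(' ')}" unless runner.call(cmd)
  end
end
}}}

Calling `run_pipeline(".", "wavelet")` runs the same sequence as the example above. The `runner:` parameter exists only so that the sequence can be checked without touching the database.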

== Generating Alarms ==

Generating alarms is easiest once the deviation scores have been inserted into the database. The deviation scores in the ''[wiki:EntityDictionary#INTERVAL_STATS INTERVAL_STATS]'' table can then be compared against a threshold to generate the alarms. The first step is to create an alarm type in the ''[wiki:EntityDictionary#ALARMS ALARMS]'' table if one does not already exist.

Check for the alarm type:
{{{
dp=> select name from alarms;
    name
-------------
 sdev3
 sdev4
 sdev5
 wavelet3
 wavelet4
 wavelet5
 causality
 trw
 multires20
 multires60
 multires100
 multires300
 multires600
}}}

Each wavelet''#'' alarm type represents alarms generated with ''#'' as the threshold. For example, any interval flagged with the ''wavelet3'' alarm type had a wavelet deviation score greater than 3 for the given [wiki:TrafficFeatures traffic feature].
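
If each alarm type is populated with a query like the one further down, the types overlap. An interval with a score of 4.2, for example, is flagged as both ''wavelet3'' and ''wavelet4''. The hypothetical Ruby helper below spells out that rule, including the strict greater-than comparison used in the SQL.

{{{
#!ruby
# Hypothetical helper mirroring the wavelet<N> naming: an interval receives
# every alarm type whose threshold its deviation score strictly exceeds.
def alarm_types_for(score, thresholds = [3, 4, 5])
  thresholds.select { |t| score > t }.map { |t| "wavelet#{t}" }
end

alarm_types_for(4.2)               # => ["wavelet3", "wavelet4"]
alarm_types_for(3.0)               # => [] (not strictly greater than 3)
alarm_types_for(6.5, [3, 4, 5, 6]) # => ["wavelet3", "wavelet4", "wavelet5", "wavelet6"]
}}}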

To add an alarm type that flags intervals with deviation scores greater than 6 as anomalous:
{{{
dp=> INSERT INTO alarms VALUES(nextval('alarms_seq'),'wavelet6');
INSERT 0 1
}}}

Now let's generate the alarms and insert them into the ''INTERVAL_ALARMS'' table:
{{{
dp=> INSERT INTO interval_alarms
dp-> SELECT interval_stats.interval, interval_stats.metric, alarms.alarm_type
dp-> FROM interval_stats, alarms
dp-> WHERE alarms.name='wavelet6' and
dp-> interval_stats.wavelet_score>6;
INSERT 0 222
}}}
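
When experimenting with several thresholds, it can help to generate both statements from a single number. The Ruby helper below is a hypothetical sketch that reproduces the two statements above for an arbitrary integer threshold. It only accepts integers because the value is interpolated straight into the SQL text.

{{{
#!ruby
# Hypothetical helper: builds the alarm-type INSERT and the alarm INSERT shown
# above for another threshold. Only Integers are accepted because the value
# is interpolated directly into the SQL text.
def alarm_sql(threshold)
  raise ArgumentError, "threshold must be an Integer" unless threshold.is_a?(Integer)

  name = "wavelet#{threshold}"
  [
    "INSERT INTO alarms VALUES(nextval('alarms_seq'),'#{name}');",
    "INSERT INTO interval_alarms " \
    "SELECT interval_stats.interval, interval_stats.metric, alarms.alarm_type " \
    "FROM interval_stats, alarms " \
    "WHERE alarms.name='#{name}' AND interval_stats.wavelet_score>#{threshold};"
  ]
end

puts alarm_sql(7)
}}}

The printed statements can then be pasted into the `dp=>` session.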

A way to view the alarms:
{{{
dp=> SELECT interval, metrics.name AS metric, alarms.name AS alarm_type
dp-> FROM interval_alarms, alarms, metrics
dp-> WHERE alarms.name='wavelet6' and
dp-> interval_alarms.alarm_type=alarms.alarm_type and
dp-> interval_alarms.metric=metrics.metric;

      interval       |   metric   | alarm_type
---------------------+------------+------------
 2005-02-20 08:25:00 | addr_src   | wavelet6
 2005-02-20 08:20:00 | addr_src   | wavelet6
 2005-02-20 08:15:00 | addr_src   | wavelet6
 2005-02-20 08:10:00 | addr_src   | wavelet6
 2005-02-22 12:10:00 | ports_src  | wavelet6
 2005-02-24 12:30:00 | degree_in  | wavelet6
 ...
}}}
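
Consecutive alarms for the same metric often belong to one event. The first four rows above, for instance, form a single run of four consecutive 5-minute addr_src intervals. The Ruby sketch below is a hypothetical post-processing step that merges such runs. It assumes the 5-minute (300-second) spacing seen in these intervals.

{{{
#!ruby
require "time"

# Hypothetical post-processing step: merges alarms for the same metric whose
# intervals are exactly `step` seconds apart (300 s, the 5-minute spacing
# assumed here) into one event with a first and last interval and a count.
def group_events(rows, step: 300)
  rows.group_by { |r| r[:metric] }.flat_map do |metric, alarms|
    events = []
    alarms.map { |r| Time.parse("#{r[:interval]} UTC") }.sort.each do |t|
      last = events.last
      if last && t - last[:last] == step
        last[:last] = t
        last[:count] += 1
      else
        events << { metric: metric, first: t, last: t, count: 1 }
      end
    end
    events
  end
end

# The rows listed in the psql output above.
rows = [
  { interval: "2005-02-20 08:25:00", metric: "addr_src" },
  { interval: "2005-02-20 08:20:00", metric: "addr_src" },
  { interval: "2005-02-20 08:15:00", metric: "addr_src" },
  { interval: "2005-02-20 08:10:00", metric: "addr_src" },
  { interval: "2005-02-22 12:10:00", metric: "ports_src" },
  { interval: "2005-02-24 12:30:00", metric: "degree_in" }
]

group_events(rows).each do |e|
  puts "#{e[:metric]}: #{e[:first]} .. #{e[:last]} (#{e[:count]} intervals)"
end
}}}

For the six rows shown above, this yields three events. One is the four merged addr_src intervals from 08:10 to 08:25 on 2005-02-20. The other two are single-interval events for ports_src and degree_in.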