Changes from Version 1 of WaveletDetection

Author:
trac (IP: 127.0.0.1)
Timestamp:
06/14/07 15:50:46 (3 years ago)

= Wavelet Detection =

Signal analysis of network traffic for anomaly detection was first introduced by Barford et al. in ''[http://www.cs.wisc.edu/~pb/paper_imw_02.pdf A Signal Analysis of Network Traffic Anomalies]''.  The measurement data is treated as a generic signal on which wavelet analysis is performed, independent of the instrument used to capture it (such as our Argus tool), the quantity being measured (such as packets or bytes), and the network in which the data was captured.  Once the signal is decomposed, wavelet-based algorithms can filter out unwanted effects by applying a threshold to the coefficients before reconstruction.  For example, this can remove diurnal effects so that the reconstructed signal contains only the fine-grained details.

Using time as the wavelet's independent variable, the data captured for the metric is organized into strata, which consist of a low-frequency signal and high-frequency signals.  The low-frequency signal ''L(x)'' captures the smoothing or averaging effect; it contains sparse filtered information whose coarseness depends on the number of low-pass filtering iterations.  Given enough low-pass filter iterations relative to the measurement interval, the low-frequency signal captures diurnal effects, which can then be filtered out using the wavelet-based algorithms, depending on the decomposition of the data.  The high-frequency signals ''H,,i,,(x)'' capture the more fine-grained effects, such as spontaneous variations in the data.  These signals would include effects such as port scans or DDoS attacks.
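As a rough illustration of the decomposition and filtering described above, the following sketch splits a signal into a low-frequency part and several detail levels, then reconstructs it from the details only. It uses the Haar wavelet rather than DB4, purely because the Haar filters fit in a few lines. All function names are hypothetical, the input length is assumed to be a multiple of 2^levels, and none of this is taken from the MATLAB scripts.

```python
import math

S = 1 / math.sqrt(2)  # Haar filter coefficient


def haar_step(signal):
    """One Haar level: pairwise sums (low-pass) and differences (high-pass)."""
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * S for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * S for i in pairs]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Undo one Haar level."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * S)
        out.append((a - d) * S)
    return out


def decompose(signal, levels):
    """Return (low, details); details[0] is the finest level."""
    low, details = list(signal), []
    for _ in range(levels):
        low, d = haar_step(low)
        details.append(d)
    return low, details


def reconstruct(low, details):
    """Rebuild a signal from the low-frequency part and the detail levels."""
    out = list(low)
    for d in reversed(details):  # coarsest detail level first
        out = haar_inverse_step(out, d)
    return out


def fine_detail_only(signal, levels, threshold=0.0):
    """Drop the low-frequency trend and small detail coefficients."""
    low, details = decompose(signal, levels)
    low = [0.0] * len(low)  # removes slow effects such as a daily cycle
    details = [[c if abs(c) > threshold else 0.0 for c in d] for d in details]
    return reconstruct(low, details)
```

For instance, `fine_detail_only([10, 10, 15, 10, 20, 20, 20, 20], 2)` removes the step from 10 to 20 and leaves the short spike at index 2 as the largest value.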
Anomaly detection on the decomposed signal proceeds in three steps:
   1. Normalize the high-frequency and mid-frequency parts to have a variance of one, then use a moving window to compute the local variability of each part.  The window is of fixed length, chosen according to the maximum duration of the anomalies the system should capture: a small window loses or blurs anomalies, while a large window produces a greater number of anomalies of little interest.
   2. Combine the local variability of the high-frequency and mid-frequency parts using a weighted sum; the result is the V-signal.
   3. Apply thresholding to the V-signal to detect anomalies by measuring the peak height and peak width of the signal.  These features identify the anomalies, their duration, and their relative intensity.

To perform signal analysis we use DB4 as our mother wavelet, with a single low-pass filter ''L'' and two high-pass filters ''H,,1,,'' and ''H,,2,,''.  The default window size is 30 minutes; we also provide 1 minute and 15 minute windows out of the box, and the scripts can be modified for other window sizes if needed.
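The detection steps described above (normalization, local variability, weighted sum, thresholding) can be sketched as follows. This is a minimal illustration under stated assumptions, not the logic of the MATLAB scripts: the `mid` and `high` inputs are assumed to be already extracted from the decomposition, the window is assumed to be a trailing one, and the equal weights, the threshold, and all function names are hypothetical.

```python
import statistics


def normalize(part):
    """Scale a signal part so that its variance is one."""
    sd = statistics.pstdev(part)
    return [x / sd for x in part] if sd > 0 else list(part)


def local_variability(part, window):
    """Variance over a trailing window; None until the window has filled."""
    out = []
    for i in range(len(part)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(statistics.pvariance(part[i + 1 - window:i + 1]))
    return out


def v_signal(mid, high, window, w_mid=0.5, w_high=0.5):
    """Weighted sum of the local variability of both parts (weights assumed)."""
    lv_mid = local_variability(normalize(mid), window)
    lv_high = local_variability(normalize(high), window)
    return [None if m is None else w_mid * m + w_high * h
            for m, h in zip(lv_mid, lv_high)]


def find_anomalies(v, threshold):
    """Return (start_index, width, peak_height) for each run above threshold."""
    anomalies, start, peak = [], None, 0.0
    for i, x in enumerate(v + [None]):  # trailing None closes an open run
        if x is not None and x > threshold:
            if start is None:
                start, peak = i, x
            else:
                peak = max(peak, x)
        elif start is not None:
            anomalies.append((start, i - start, peak))
            start = None
    return anomalies
```

With a trailing window of 6 samples, the first 5 entries of the V-signal are undefined. If the scripts work the same way on 5 minute intervals, that would be consistent with the five blank entries that the example output on this page attributes to the 30 minute window.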
== Generating Wavelet Deviation Scores ==

We provide [http://www.mathworks.com/ MATLAB] [source:scripts/matlab scripts] for computing the standard deviation scores, both with and without the iterative process, on traffic for which the ground truth is known and which contains no anomalies.  These scripts need not be run on [wiki:GeneratingEntropy#ComputingEntropy entropy data files]; they can be used on any aggregate metric that represents the interval.  We do, however, use [wiki:GeneratingEntropy#ComputingEntropy entropy data files] in our examples and guide.

Although we present the individual usage of the [http://www.mathworks.com/ MATLAB] scripts, a Ruby wrapper ([source:scripts/ruby/gen_deviation_scores.rb gen_deviation_scores.rb]) is available which runs on [wiki:GeneratingEntropy#EntropyScripts Datapository entropy dumps] to automatically generate the deviation scores for all metrics.

'''[source:scripts/ruby/gen_deviation_scores.rb gen_deviation_scores.rb]'''
   * ''Usage'': ./gen_deviation_scores.rb <location_of_dp_entropy_dump> <sdev/wavelet/all>
   * ''Description'': a wrapper for generating deviation scores for the anomaly detection methods on a [wiki:GeneratingEntropy#EntropyScripts Datapository entropy dump].
   * ''Output Files'': sdev-''metric'' and/or wavelet-''metric'', depending on the selected method
   * ''Output Format'': <epoch> <dev_score>
   * ''Example usage'':
{{{
$ ./dp_dump_entropy.rb &> /dev/null
$ ./gen_deviation_scores.rb . wavelet
addr_dst...
addr_src...
degree_in...
degree_out...
fsd...
ports_dst...
ports_src...

$ ls -l wavelet-*
-rw-r--r--    1 gnychis  users      112541 Mar 18 10:23 wavelet-addr_dst
-rw-r--r--    1 gnychis  users      112533 Mar 18 10:23 wavelet-addr_src
-rw-r--r--    1 gnychis  users      112528 Mar 18 10:23 wavelet-degree_in
-rw-r--r--    1 gnychis  users      112533 Mar 18 10:23 wavelet-degree_out
-rw-r--r--    1 gnychis  users      112535 Mar 18 10:23 wavelet-fsd
-rw-r--r--    1 gnychis  users      112546 Mar 18 10:23 wavelet-ports_dst
-rw-r--r--    1 gnychis  users      112539 Mar 18 10:23 wavelet-ports_src
-rw-r--r--    1 gnychis  users      112528 Mar 18 10:23 wavelet-volume

$ head -n 7 wavelet-addr_dst
... (first 5 are blank from 30 minute window)
1107235500 0.56
1107235800 0.46
}}}
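To consume these output files from another program, a small reader along the following lines may be enough. It is only a sketch: it assumes the blank warm-up entries appear as lines without a score field (the exact on-disk form of those entries is not shown above), and `parse_scores` is a hypothetical helper, not part of the provided scripts.

```python
def parse_scores(lines):
    """Map epoch -> deviation score from '<epoch> <dev_score>' lines."""
    scores = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # assumed form of the blank entries: no score field
        epoch, score = fields
        scores[int(epoch)] = float(score)
    return scores
```

A typical call would be `parse_scores(open("wavelet-addr_dst"))`, which returns a dictionary keyed by epoch.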
'''[source:scripts/matlab/wavelet_anomaly_detection.m wavelet_anomaly_detection.m]'''
   * ''Usage'': wavelet_anomaly_detection in_file out_file
   * ''Description'': writes a time series of wavelet deviation scores to be used for detection
   * ''Output Format'': <epoch> <dev_score>
   * ''Example usage'':
{{{
$ matlab
>> wavelet_anomaly_detection entropy-addr_dst wavelet-addr_dst
>> exit

$ head -n 7 wavelet-addr_dst
... (first 5 are blank from 30 minute window)
1107235500 0.56
1107235800 0.46
}}}

== Caching Deviation Scores ==

Although computing deviation scores is not as computationally intensive as computing entropy, we still cache deviation scores in the ''[wiki:EntityDictionary#INTERVAL_STATS INTERVAL_STATS]'' table for use during the alarm generation phase.

Updating the deviation scores is as easy as generating them with the Ruby script.  After [source:scripts/ruby/gen_deviation_scores.rb gen_deviation_scores.rb] is run, [source:scripts/ruby/dp_update_deviation_scores.rb dp_update_deviation_scores.rb] can read its output and cache the newly computed values in the database.

'''[source:scripts/ruby/dp_update_deviation_scores.rb dp_update_deviation_scores.rb]'''
   * ''Usage'': ./dp_update_deviation_scores.rb <location_of_dp_entropy_dump> <sdev/wavelet/all>
   * ''Description'': a wrapper for reading deviation scores from a [source:scripts/ruby/gen_deviation_scores.rb gen_deviation_scores.rb] dump and caching them in the database.
   * ''Example usage'':
{{{
$ ./dp_dump_entropy.rb &> /dev/null
$ ./gen_deviation_scores.rb . wavelet &> /dev/null
$ ./dp_update_deviation_scores.rb . wavelet
addr_src...
degree_in...
ports_dst...
degree_out...
addr_dst...
volume...
fsd...
ports_src...
}}}

== Generating Alarms ==

Generating alarms is most easily done after inserting the deviation scores into the database.  The deviation scores in the ''[wiki:EntityDictionary#INTERVAL_STATS INTERVAL_STATS]'' table can then be used to generate alarms based on a threshold.  The first step is to create an alarm type in the ''[wiki:EntityDictionary#ALARMS ALARMS]'' table if one does not already exist.

Check for the alarm type:
{{{
dp=> select name from alarms;
    name
-------------
 sdev3
 sdev4
 sdev5
 wavelet3
 wavelet4
 wavelet5
 causality
 trw
 multires20
 multires60
 multires100
 multires300
 multires600
}}}

The wavelet''#'' alarm types represent alarms generated using ''#'' as the threshold.  For example, any interval flagged with the ''wavelet3'' alarm type had a deviation score greater than 3 for a given [wiki:TrafficFeatures traffic feature].

To add an alarm type for flagging intervals as anomalous when their deviation score is greater than 6:
{{{
dp=> INSERT INTO alarms VALUES(nextval('alarms_seq'),'wavelet6');
INSERT 0 1
}}}

Now let's generate the alarms and insert them into the ''interval_alarms'' table.
{{{
dp=> INSERT INTO interval_alarms
dp-> SELECT interval_stats.interval, interval_stats.metric, alarms.alarm_type
dp-> FROM interval_stats, alarms
dp-> WHERE alarms.name='wavelet6' AND
dp->       interval_stats.wavelet_score>6;
INSERT 0 222
}}}

To view the alarms:
{{{
dp=> SELECT interval, metrics.name AS metric, alarms.name AS alarm_type
dp-> FROM interval_alarms, alarms, metrics
dp-> WHERE alarms.name='wavelet6' AND
dp->       interval_alarms.alarm_type=alarms.alarm_type AND
dp->       interval_alarms.metric=metrics.metric;

      interval       |   metric   | alarm_type
---------------------+------------+------------
 2005-02-20 08:25:00 | addr_src   | wavelet6
 2005-02-20 08:20:00 | addr_src   | wavelet6
 2005-02-20 08:15:00 | addr_src   | wavelet6
 2005-02-20 08:10:00 | addr_src   | wavelet6
 2005-02-22 12:10:00 | ports_src  | wavelet6
 2005-02-24 12:30:00 | degree_in  | wavelet6
 ...
}}}
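The thresholding performed by the SQL above can also be sketched outside the database, for example when experimenting with thresholds before creating a new alarm type. The following is an illustrative sketch only: the row layout `(interval, metric, wavelet_score)` mirrors the columns used in the queries above, while the function name and the handling of missing scores are assumptions.

```python
def generate_alarms(interval_stats, threshold):
    """Flag rows whose wavelet score exceeds the threshold.

    interval_stats: iterable of (interval, metric, wavelet_score) tuples.
    Returns (interval, metric, alarm_name) tuples, where alarm_name follows
    the wavelet# naming used for alarm types (e.g. wavelet6).
    """
    alarm_name = f"wavelet{threshold}"
    return [
        (interval, metric, alarm_name)
        for interval, metric, score in interval_stats
        if score is not None and score > threshold  # skip missing scores
    ]
```

Calling `generate_alarms(rows, 6)` keeps only the rows with a score strictly greater than 6 and labels them `wavelet6`, matching the `wavelet_score>6` condition in the query.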