Wavelet Detection
Signal analysis of network traffic for anomaly detection was first introduced by Barford et al. in A Signal Analysis of Network Traffic Anomalies. The measurement data is treated as a generic signal for wavelet analysis, independent of the instrument used to capture it (such as our Argus tool), the quantity being measured (such as packets or bytes), and the network in which the data was captured. Once the signal is decomposed, wavelet-based algorithms can filter out unwanted effects by applying a threshold to the entries of the decomposition before reconstruction. For example, diurnal effects can be filtered out so that the reconstructed signal retains only the fine-grained details.
Using time as the wavelet's independent variable, the data captured for a metric is organized into strata, which consist of a low-frequency signal and high-frequency signals. The low-frequency signal, L(x), captures the smoothing or averaging effect; how sparse its filtered information is depends on the number of low-pass filtering iterations. Given enough low-pass iterations relative to the measurement interval, the low-frequency signal captures diurnal effects. These diurnal effects can then be filtered out using the wavelet-based algorithms, depending on the decomposition of the data. The high-frequency signals, Hi(x), capture finer-grained effects such as spontaneous variations in the data. These signals would include effects such as port scans or DDoS attacks.
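The split into one low-frequency signal and several high-frequency signals can be illustrated with a minimal Python sketch. It uses the Haar wavelet rather than DB4, purely because the Haar filters fit in two lines, and the function names haar_step and decompose are hypothetical; they are not part of the MATLAB scripts described in this guide.

```python
import math

def haar_step(signal):
    """One level of Haar analysis: return (low, high), each half the input length."""
    signal = list(signal)
    if len(signal) % 2:
        signal.append(signal[-1])  # assumption: pad odd-length input by repeating the last sample
    s = math.sqrt(2.0)
    # Low-pass: scaled pairwise sums (smoothing / averaging effect).
    low = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    # High-pass: scaled pairwise differences (fine-grained variation).
    high = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return low, high

def decompose(signal, levels):
    """Return (L, [H1, ..., Hn]); H1 is the finest (highest-frequency) band."""
    low = list(signal)
    highs = []
    for _ in range(levels):
        # Each iteration low-pass filters the previous low-frequency signal again.
        low, high = haar_step(low)
        highs.append(high)
    return low, highs
```

For a series such as [10, 10, 10, 10, 10, 50, 10, 10], the only non-zero entry in the finest band falls on the pair containing the spike, while the low-frequency signal keeps the overall level.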
The first step is to normalize the high-frequency and mid-frequency parts to have a variance of one. A moving window is then used to compute the local variability of each part. The window is of fixed length, and that length should be chosen according to the maximum duration of the anomalies the system is meant to capture: a window much longer than an anomaly blurs or loses it, while a window that is too short produces a large number of anomalies of little interest.
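The normalization and moving-window step can be sketched in Python as follows. This is an illustration under stated assumptions, not the code in the MATLAB scripts: the function names are hypothetical, variance is the population variance, and the window is taken to be a trailing window, so the first window - 1 positions have no score.

```python
import math
from statistics import pvariance

def normalize_unit_variance(band):
    """Scale a band so that its population variance is one."""
    var = pvariance(band)
    if var == 0:
        # A constant band has no variability to rescale; return it unchanged.
        return [float(x) for x in band]
    scale = math.sqrt(var)
    return [x / scale for x in band]

def local_variance(band, window):
    """Variance over a trailing window of `window` samples.

    Assumption: positions without a full window yet are reported as None.
    """
    out = []
    for i in range(len(band)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(pvariance(band[i + 1 - window:i + 1]))
    return out
```

With this trailing-window convention, a window of six samples leaves the first five positions empty.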
The second and third steps combine the local variability of the high-frequency and mid-frequency parts and apply thresholding to the result for anomaly detection. The local variabilities are combined using a weighted sum, which yields the V-signal. Thresholding is then applied to the V-signal by measuring the peak height and peak width of the signal. These features identify the anomalies, their duration, and their relative intensity.
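A rough Python sketch of the weighted sum and thresholding is shown below. The equal weights, the function names, and the assumption that both local-variability series are aligned on the same time axis are illustrative choices, not values taken from the MATLAB scripts.

```python
def v_signal(var_high, var_mid, w_high=0.5, w_mid=0.5):
    """Weighted sum of two local-variability series.

    Assumption: both series have the same length and time alignment;
    positions where either input is None stay None.
    """
    return [None if h is None or m is None else w_high * h + w_mid * m
            for h, m in zip(var_high, var_mid)]

def find_anomalies(v, threshold):
    """Return (start_index, width, peak_height) for each run above the threshold."""
    runs = []
    start = None
    # The trailing None acts as a sentinel that closes a run reaching the end.
    for i, x in enumerate(list(v) + [None]):
        above = x is not None and x > threshold
        if above and start is None:
            start = i
        elif not above and start is not None:
            runs.append((start, i - start, max(v[start:i])))
            start = None
    return runs
```

Here the width of a run stands in for the length of the anomaly and the peak height for its relative intensity.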
To perform signal analysis we use DB4 as our mother wavelet, a single low-pass filter L, and two high-pass filters H1 and H2. The default window size is 30 minutes; we also provide 1-minute and 15-minute windows out of the box, and other window sizes can be supported with modifications.
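The relation between a window size in minutes and the number of measurement intervals it covers can be sketched as follows. The 300-second default interval is an assumption inferred from the spacing of the epochs in the example output in this guide, and window_samples is a hypothetical helper.

```python
def window_samples(window_minutes, interval_seconds=300):
    """Number of measurement intervals covered by a window.

    Assumption: the window must be a positive whole multiple of the interval.
    """
    samples, remainder = divmod(window_minutes * 60, interval_seconds)
    if remainder or samples < 1:
        raise ValueError("window must be a positive whole multiple of the interval")
    return samples
```

Under these assumptions a 30-minute window spans six 5-minute intervals, and a 1-minute window only makes sense with a shorter measurement interval, for example window_samples(1, 60).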
Generating Wavelet Deviation Scores
We provide MATLAB scripts for computing the standard deviation scores, both with and without the iterative process, on traffic for which ground truth is known and which contains no anomalies. These scripts need not be run on entropy data files; they can be used on any aggregate metric that represents an interval. However, the examples in this guide use entropy data files.
Although we present the individual usage of the MATLAB scripts, a Ruby wrapper (gen_deviation_scores.rb) is available that runs on Datapository entropy dumps and automatically generates the deviation scores for all metrics.
- Usage: ./gen_deviation_scores.rb <location_of_dp_entropy_dump> <sdev/wavelet/all>
- Description: a wrapper for generating deviation scores for the anomaly detection methods on a Datapository entropy dump.
- Output Files: sdev-metric and/or wavelet-metric, depending on the selected mode
- Output Format: <epoch> <dev_score>
- Example usage:
$ ./dp_dump_entropy.rb &> /dev/null
$ ./gen_deviation_scores.rb . wavelet
addr_dst... addr_src... degree_in... degree_out... fsd... ports_dst... ports_src...
$ ls -l wavelet-*
-rw-r--r-- 1 gnychis users 112541 Mar 18 10:23 wavelet-addr_dst
-rw-r--r-- 1 gnychis users 112533 Mar 18 10:23 wavelet-addr_src
-rw-r--r-- 1 gnychis users 112528 Mar 18 10:23 wavelet-degree_in
-rw-r--r-- 1 gnychis users 112533 Mar 18 10:23 wavelet-degree_out
-rw-r--r-- 1 gnychis users 112535 Mar 18 10:23 wavelet-fsd
-rw-r--r-- 1 gnychis users 112546 Mar 18 10:23 wavelet-ports_dst
-rw-r--r-- 1 gnychis users 112539 Mar 18 10:23 wavelet-ports_src
-rw-r--r-- 1 gnychis users 112528 Mar 18 10:23 wavelet-volume
$ head -n 7 wavelet-addr_dst
... (first 5 are blank from 30 minute window)
1107235500 0.56
1107235800 0.46
- Usage: wavelet_anomaly_detection in_file out_file
- Description: creates an output as a timeseries with wavelet deviation scores to be used for detection
- Output Format: <epoch> <dev_score>
- Example usage:
$ matlab
>> wavelet_anomaly_detection entropy-addr_dst wavelet-addr_dst
>> exit
$ head -n 7 wavelet-addr_dst
... (first 5 are blank from 30 minute window)
1107235500 0.56
1107235800 0.46
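Because the output is a plain <epoch> <dev_score> time series, it is easy to post-process outside MATLAB. The Python sketch below reads such a file and lists the intervals whose score exceeds a threshold. The function names are hypothetical, and treating any line with fewer than two fields as a blank warm-up entry is an assumption about how the leading blank entries are written.

```python
def read_scores(lines):
    """Parse '<epoch> <dev_score>' lines into (epoch, score) tuples.

    Assumption: lines with fewer than two fields are the blank entries
    produced while the moving window fills, and are skipped.
    """
    scores = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        scores.append((int(fields[0]), float(fields[1])))
    return scores

def flag_intervals(scores, threshold):
    """Return the epochs whose deviation score is strictly above the threshold."""
    return [epoch for epoch, score in scores if score > threshold]
```

A typical use would be to open wavelet-addr_dst, pass the file object to read_scores, and call flag_intervals on the result with the chosen threshold.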
Caching Deviation Scores
Although computing deviation scores is not as computationally intensive as computing entropy, we still provide caching of deviation scores in the INTERVAL_STATS table for usage during the alarm generation phase.
Updating the deviation scores is as easy as generating them with the Ruby script. After gen_deviation_scores.rb is run, dp_update_deviation_scores.rb can read its output and cache the newly computed values in the database.
- Usage: ./dp_update_deviation_scores.rb <location_of_dp_entropy_dump> <sdev/wavelet/all>
- Description: a wrapper for reading deviation scores from a gen_deviation_scores.rb dump and caching them in the database.
- Example usage:
$ ./dp_dump_entropy.rb &> /dev/null
$ ./gen_deviation_scores.rb . wavelet &> /dev/null
$ ./dp_update_deviation_scores.rb . wavelet
addr_src... degree_in... ports_dst... degree_out... addr_dst... volume... fsd... ports_src...
Generating Alarms
Generating alarms is most easily done after inserting the deviation scores into the database. The deviation scores in the INTERVAL_STATS table can then be used to generate alarms based on a threshold. The first thing to do is create an alarm type in the ALARMS table if one does not already exist.
Check for the alarm type:
dp=> select name from alarms;
name
-------------
sdev3
sdev4
sdev5
wavelet3
wavelet4
wavelet5
causality
trw
multires20
multires60
multires100
multires300
multires600
The wavelet# alarm types represent alarms generated using # as the threshold: any interval flagged with the wavelet3 alarm type had a deviation score greater than 3 for a given traffic feature.
To add an alarm type for flagging intervals as anomalous with deviation scores greater than 6:
dp=> INSERT INTO alarms VALUES(nextval('alarms_seq'),'wavelet6');
INSERT 0 1
Now let's generate the alarms and insert them into the INTERVAL_ALARMS table.
dp=> INSERT INTO interval_alarms
dp-> SELECT interval_stats.interval, interval_stats.metric, alarms.alarm_type
dp-> FROM interval_stats, alarms
dp-> WHERE alarms.name='wavelet6' and
dp-> interval_stats.wavelet_score>6;
INSERT 0 222
A way to view the alarms:
dp=> SELECT interval, metrics.name AS metric, alarms.name AS alarm_type
dp-> FROM interval_alarms, alarms, metrics
dp-> WHERE alarms.name='wavelet6' and
dp-> interval_alarms.alarm_type=alarms.alarm_type and
dp-> interval_alarms.metric=metrics.metric;
interval | metric | alarm_type
---------------------+------------+------------
2005-02-20 08:25:00 | addr_src | wavelet6
2005-02-20 08:20:00 | addr_src | wavelet6
2005-02-20 08:15:00 | addr_src | wavelet6
2005-02-20 08:10:00 | addr_src | wavelet6
2005-02-22 12:10:00 | ports_src | wavelet6
2005-02-24 12:30:00 | degree_in | wavelet6
...
