= Synthetic Attacks =

When you are developing a new anomaly detection method, studying existing ones, or analyzing the [wiki:TrafficFeatures traffic features] to understand them better, the traffic set you are given may not include all of the attacks you want to test your detection method against or observe the metrics during. With unlabeled data, it is not even clear whether these attacks exist, let alone what the false positive and false negative rates of detecting them are. Even if the traffic set does include the attacks, you may want to analyze two attacks occurring at the same time, or to increase or reduce the magnitude of an attack.

Our framework supports these types of analysis through synthetic attack generation. By generating synthetic attacks, you have complete control over the behavior of the attackers, the magnitude of the attack, and the directionality of the attack (inbound/outbound). Furthermore, any number of attacks can be generated simultaneously to better understand intrusion detection in the multidimensional space.

Varying the magnitude, which is the number of attack participants, across a single interval or across multiple intervals averaged together is useful for understanding at what magnitudes an anomaly detection method can detect an attack with a specific [wiki:TrafficFeatures traffic feature], or which [wiki:TrafficFeatures traffic feature] detects the attack best.

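As a rough illustration of a magnitude sweep, the sketch below computes the Shannon entropy of source addresses for one toy in-memory interval while the number of synthetic attackers grows. The names `shannon_entropy` and `src_entropy_under_attack` and the sample data are assumptions made for this example only; the framework itself computes its traffic features in the database.

```ruby
# Shannon entropy (in bits) of the empirical distribution of `values`.
def shannon_entropy(values)
  total = values.length.to_f
  return 0.0 if total.zero?

  counts = values.each_with_object(Hash.new(0)) { |v, h| h[v] += 1 }
  counts.values.sum do |count|
    prob = count / total
    -prob * Math.log2(prob)
  end
end

# Source-address entropy of one interval after adding `magnitude` synthetic
# attackers, each contributing `flows_per_attacker` flows (hypothetical helper).
def src_entropy_under_attack(background_srcs, magnitude, flows_per_attacker)
  attack_srcs = (1..magnitude).flat_map do |i|
    ["attacker-#{i}"] * flows_per_attacker
  end
  shannon_entropy(background_srcs + attack_srcs)
end

# Toy background traffic: one source address per flow.
background = %w[10.0.0.1 10.0.0.1 10.0.0.2 10.0.0.3]

# Sweep the magnitude and record the resulting entropy for each value.
sweep = [0, 1, 10, 100].map do |magnitude|
  [magnitude, src_entropy_under_attack(background, magnitude, 1)]
end

sweep.each do |magnitude, entropy|
  puts format("magnitude=%-4d src entropy=%.3f bits", magnitude, entropy)
end
```

Plotting such a sweep against a detector's threshold would indicate the smallest magnitude at which the feature reacts, which is the question the paragraph above poses.
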
To perform synthetic attack analysis, we provide a library that includes multiple types of attacks, which can be introduced into the traffic set for analysis. All of the following can be controlled in synthetic attacks:

 * __type of the attack__: the attack type to be used
 * __origin of the attack__: a single host or range of hosts (directionality)
 * __destination of the attack__: a single host or range of hosts (directionality)
 * __magnitude of the attack__: the number of attackers, bots, or worm-infected hosts participating
 * __list of intervals__: a specific list of intervals in which the attack is placed
 * __random number of intervals__: a random or semi-random set of intervals in which to place the attack (semi-random meaning you can require that the intervals have deviation scores of X)

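The controls listed above can be pictured as a small record that is expanded into flow rows. The sketch below is only an assumption about how such a description might look: `AttackSpec`, `attack_rows`, the addresses, and the interval timestamps are hypothetical and do not reflect the library's real interface.

```ruby
require "ipaddr"

# Hypothetical description of one synthetic attack. The fields mirror the
# controls listed above, not the library's actual parameters.
AttackSpec = Struct.new(:type, :origin, :destination, :magnitude, :intervals,
                        keyword_init: true)

# Expand a spec into one flow row per attacker and interval. Attackers are
# taken as consecutive addresses starting at the first address of `origin`.
# A real generator would also use `type` to fill in ports, packets and bytes;
# that part is omitted here.
def attack_rows(spec)
  first_src = IPAddr.new(spec.origin).to_range.first.to_i
  dst = IPAddr.new(spec.destination).to_i

  spec.intervals.flat_map do |interval|
    (0...spec.magnitude).map do |offset|
      { interval: interval, src_ip: first_src + offset, dst_ip: dst }
    end
  end
end

spec = AttackSpec.new(
  type: :ddos,
  origin: "10.1.0.0/16",        # range of attacking hosts
  destination: "192.0.2.10",    # single victim
  magnitude: 3,                 # number of participating attackers
  intervals: ["2005-02-01 00:00:00", "2005-02-01 00:05:00"]
)

rows = attack_rows(spec)
puts "#{rows.length} synthetic flows generated"
```

Swapping `origin` and `destination` between internal and external ranges is one way such a record could express inbound versus outbound attacks.
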
Using multiple intervals when introducing the attack provides several benefits, depending on how it is used. To understand the general behavior during attacks without the bias of observing a single specific interval, you can specify a number of intervals to average over. To estimate the false positive and false negative rates of specific attacks with an anomaly detection method, you can specify a number of intervals into which the attacks are randomly placed. The key idea is that traffic characteristics vary with the time of day and the day of the week. By introducing the attacks randomly, the false positive and negative rates are not skewed toward specific times of day, and observing in which intervals the anomaly detection method could or could not detect the attacks demonstrates its strengths and weaknesses under varying background traffic.

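The random placement described above, and the rates derived from it, can be sketched in a few lines. This is a minimal illustration under stated assumptions: `pick_attack_intervals` and `detection_rates` are hypothetical helpers, the hourly intervals are invented, and the detector's output is simulated rather than produced by a real method.

```ruby
# Pick `count` distinct intervals at random. Passing a seeded Random makes
# the selection repeatable between runs.
def pick_attack_intervals(all_intervals, count, rng = Random.new)
  all_intervals.sample(count, random: rng)
end

# Compare the intervals a detector flagged with those that held an attack.
def detection_rates(all_intervals, attacked, flagged)
  clean = all_intervals - attacked
  missed = attacked - flagged          # attacks that were not flagged
  false_alarms = flagged - attacked    # flagged intervals without an attack

  {
    false_negative_rate: attacked.empty? ? 0.0 : missed.length.to_f / attacked.length,
    false_positive_rate: clean.empty? ? 0.0 : false_alarms.length.to_f / clean.length
  }
end

# One day of hourly intervals (the interval length is an assumption).
intervals = (0...24).map { |hour| format("2005-02-01 %02d:00:00", hour) }

attacked = pick_attack_intervals(intervals, 6, Random.new(42))

# Simulated detector output: four of the six attacks found, one false alarm.
flagged = attacked.first(4) + [(intervals - attacked).first]

rates = detection_rates(intervals, attacked, flagged)
puts format("FNR=%.3f FPR=%.3f", rates[:false_negative_rate], rates[:false_positive_rate])
```

Listing the missed intervals alongside their time of day would show whether the detector struggles under particular background traffic.
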
== Database Perspective ==

Synthetic attacks can be introduced easily and safely using a database. They can be introduced into traffic sets to which you have no write access, and without the possibility of damaging the original traffic set. This is done by creating a clone of the ''[wiki:EntityDictionary#FLOWS FLOWS]'' table, called ''ATTACK_FLOWS'', into which all synthetic attack flows are inserted. A view is then created as the union of the two tables, giving the perspective that the attack flows are within the original traffic set. All queries, such as [wiki:GeneratingEntropy entropy computations] and [wiki:MetricStatistics traffic feature statistics], are run against the view and therefore produce values under the attack.

Both the attack table and the view are created by methods in [source:scripts/ruby/include/sql_queries.rb sql_queries.rb] that are called by our [source:scripts/ruby/dp_synthetic.rb dp_synthetic.rb] script (explained below). One method creates the attack table and view, and another deletes them.

{{{
#----------- CREATE ATTACK TABLE --------------#
# Creates the attack_flows table and the all_flows view, which is the union
# of the original flows table (passed in as `table`) and attack_flows.
def create_attack_table(table)
  conn = PGconn.connect(nil, nil, nil, nil, "dp")
  conn.exec("
    CREATE TABLE attack_flows (
      interval TIMESTAMP WITHOUT TIME ZONE NOT NULL,
      start_time TIMESTAMP WITHOUT TIME ZONE,
      finish_time TIMESTAMP WITHOUT TIME ZONE,
      protocol integer,
      src_ip integer,
      dst_ip integer,
      src_port integer,
      dst_port integer,
      src_packets integer,
      dst_packets integer,
      src_bytes integer,
      dst_bytes integer,
      state char(3),
      dir_unknown boolean,
      flow_id integer
    )")
  # UNION ALL over SELECT * needs both tables to have the same number of
  # columns with compatible types, which holds because attack_flows is a clone.
  conn.exec("CREATE VIEW all_flows AS (SELECT * FROM #{table}) UNION ALL (SELECT * FROM attack_flows);")
  conn.close
end

#---------- DROP ATTACK TABLE ----------------#
# Drops attack_flows; CASCADE also drops the all_flows view that depends on it.
def delete_attack_table()
  conn = PGconn.connect(nil, nil, nil, nil, "dp")
  conn.exec("DROP TABLE attack_flows CASCADE;")
  conn.close
end
}}}

Whereas the [wiki:GeneratingEntropy entropy] and [wiki:MetricStatistics traffic feature statistics] pages specify that the final parameter of every method is the table name, during synthetic attack analysis you pass the view name ''all_flows'' instead, e.g. {{{compute_all_entropy("2005-02-01 00:00:00","all_flows")}}}.

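To make the role of the attack table concrete, the sketch below builds a parameterized `INSERT` for a single synthetic flow. The helper `attack_flow_insert` and the sample values are assumptions for illustration; only the column names come from the `CREATE TABLE` statement above, and the actual insertion code in dp_synthetic.rb may look quite different.

```ruby
# Columns of attack_flows, in the order of the CREATE TABLE statement above.
ATTACK_FLOW_COLUMNS = %i[
  interval start_time finish_time protocol src_ip dst_ip src_port dst_port
  src_packets dst_packets src_bytes dst_bytes state dir_unknown flow_id
].freeze

# Build a parameterized INSERT for one flow given as a Hash with symbol keys.
# Columns that are absent from the hash are simply left out (they become NULL).
# Returns the SQL text and the matching list of values.
def attack_flow_insert(flow)
  unknown = flow.keys - ATTACK_FLOW_COLUMNS
  raise ArgumentError, "unknown columns: #{unknown.join(', ')}" unless unknown.empty?

  columns = ATTACK_FLOW_COLUMNS & flow.keys   # keeps the table's column order
  placeholders = (1..columns.length).map { |i| "$#{i}" }
  sql = "INSERT INTO attack_flows (#{columns.join(', ')}) " \
        "VALUES (#{placeholders.join(', ')})"
  [sql, columns.map { |column| flow[column] }]
end

sql, values = attack_flow_insert(
  src_ip: 167837696,
  dst_ip: 167837697,
  protocol: 6,
  interval: "2005-02-01 00:00:00"
)
puts sql
puts values.inspect

# With a live connection from the pg gem this pair could be sent with
# conn.exec_params(sql, values); the flow would then appear in all_flows.
```

Because the original flows table is never written to, dropping the attack table is enough to return to the unmodified traffic set.
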
== Using Labeled Flows ==

Although this is not yet supported, the user should be able to insert real labeled anomalies from other traffic sets or intervals into the attack table for analysis. All the user should have to do is specify a ''label_id'' from the ''[wiki:EntityDictionary#LABELS LABELS]'' table and, optionally, a range or number of flows to extract from the attack. This allows the user to control the magnitude of the attack to an extent, for example by taking only half of a large DDoS attack or by combining two labeled horizontal flows to generate an attack of twice the size.
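
Since this feature does not exist yet, the following is purely a sketch of how the magnitude of a labeled attack might be scaled once its flows have been loaded. The helpers `scale_down` and `combine` and the in-memory flow records are hypothetical.

```ruby
# Keep roughly `fraction` of a labeled attack's flows, chosen at random.
def scale_down(flows, fraction, rng = Random.new)
  raise ArgumentError, "fraction must be between 0 and 1" unless (0.0..1.0).cover?(fraction)

  flows.sample((flows.length * fraction).round, random: rng)
end

# Merge the flows of several labeled attacks into one larger attack.
def combine(*attacks)
  attacks.flatten(1)
end

# Stand-ins for flows that would be selected by label_id.
ddos   = (1..1000).map { |i| { label_id: 7, flow_id: i } }
scan_a = (1..200).map  { |i| { label_id: 8, flow_id: i } }
scan_b = (1..200).map  { |i| { label_id: 9, flow_id: i } }

half_ddos   = scale_down(ddos, 0.5, Random.new(1))
double_scan = combine(scan_a, scan_b)

puts "half DDoS: #{half_ddos.length} flows, combined scan: #{double_scan.length} flows"
```

The resulting flows would then be inserted into the attack table in the same way as generated ones.
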