Datapository Methods
The following is a list of available methods in the toolkit for use in development of new scripts, grouped by functionality.
Quick Links:
- Core Methods
- Traffic Statistics
- Entropy
- Anomaly Detection
- Labeling Anomalies
- Synthetic Attack Table
- Synthetic Attacks
Core Methods
get_metrics()
- Description: returns all of the available metrics
- Return type: array, at each index is an integer representation of the metric id, and the string name of the metric
interval_to_filename(interval)
- Return type: string formatted filename
epoch_to_filename(epoch)
- Return type: string formatted filename
epoch_to_timestamp(epoch)
- Return type: timestamp interval
- NOTE: epoch must be in INTERVAL_STATS
timestamp_to_epoch(timestamp)
- Return type: integer epoch
- NOTE: timestamp must be in INTERVAL_STATS
get_all_intervals()
- Return type: array, each element consisting of a string formatted interval and epoch
- Ex. Usage: get_all_intervals().each {|interval,epoch| ...}
get_intervals_check(type, low_score, high_score)
- Description: get all intervals from INTERVALS where the attribute named type has a value between low_score and high_score
- Return type: array, each element is a timestamp formatted interval
get_random_intervals(num_intervals)
- Description: get num_intervals random intervals from INTERVALS
- Return type: array, each element is a timestamp formatted interval
get_random_intervals_check(type, low_score, high_score, num_intervals)
- Description: get num_intervals random intervals from INTERVALS where the attribute named type has a value between low_score and high_score
- Return type: array, each element is a timestamp formatted interval
get_flows_partitions()
- Return type: an array consisting of all of the FLOWS table partitions
- Ex. Usage: get_flows_partitions().each {|partition| ...}
get_editor_input()
- Description: open up the user's favorite $EDITOR and get their input
- Return type: the user's input in string format, nil on error
correlate( filename1, filename2, label1, label2, outfile )
- Description: generates correlation scores between the second columns of data in filename1 and filename2
- Output file: outfile
- Output format: <label1> <label2> <correlation_score>
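The listing does not say which correlation measure correlate() uses. The following is a minimal sketch, assuming a Pearson correlation over the second whitespace-separated column of each input; second_column and pearson are hypothetical helper names, not toolkit methods.

```ruby
# Hypothetical helper: pull the second whitespace-separated column
# out of a list of text lines as floats.
def second_column(lines)
  lines.map { |line| line.split[1].to_f }
end

# Hypothetical helper: Pearson correlation of two equal-length series.
# Assumption: correlate() may use a different measure.
def pearson(xs, ys)
  n = xs.length.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n
  cov   = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  var_x = xs.sum { |x| (x - mean_x)**2 }
  var_y = ys.sum { |y| (y - mean_y)**2 }
  cov / Math.sqrt(var_x * var_y)
end

a = second_column(["t1 1.0", "t2 2.0", "t3 3.0"])
b = second_column(["t1 2.0", "t2 4.0", "t3 6.0"])
# Same layout as the documented output format: <label1> <label2> <score>
puts "series_a series_b #{pearson(a, b)}"
```

Two series that rise together score close to 1, and two that move in opposite directions score close to -1.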
Traffic Statistics
stats_addr_degree_in(timestamp,table)
- Return type: an array result indexed by host address with values being the associated in degree
stats_addr_degree_out(timestamp,table)
- Return type: an array result indexed by host address with values being the associated out degree
stats_degree_in(timestamp,table)
- Return type: an array result indexed by in degree with values being the number of hosts with the indexed degree
stats_degree_out(timestamp,table)
- Return type: an array result indexed by out degree with values being the number of hosts with the indexed degree
stats_addr_src(timestamp,table)
- Return type: an array result indexed by host address with values being the associated source packets
stats_addr_dst(timestamp,table)
- Return type: an array result indexed by host address with values being the associated destination packets
stats_ports_src(timestamp,table)
- Return type: an array result indexed by port with values being the associated source packets
stats_ports_dst(timestamp,table)
- Return type: an array result indexed by port with values being the associated destination packets
stats_fsd(timestamp,table)
- Return type: an array result indexed by flow size with values being the number of flows of that size
Entropy
compute_all_entropy(interval, table)
- Description: compute all of the entropy values for all traffic features
- Return type: hash indexed by traffic feature, value at the index is the entropy as a float
cache_all_entropy(interval,entropy)
- Description: takes an interval and a hash indexed by traffic feature (like what compute_all_entropy() returns) and caches the values in INTERVAL_STATS
- Return type: none
read_all_entropy(interval)
- Description: reads the cached entropy values for all traffic features from INTERVAL_STATS
- Return type: hash indexed by traffic feature, value at the index is the entropy as a float
degree_entropy(data)
- Description: takes an array of degree data returned from stats_addr_degree_in() or stats_addr_degree_out() and computes the entropy
- Return type: entropy as a float
general_entropy(data)
- Description: takes an array of data returned from stats_ports_*(), stats_addr_*() and stats_fsd() and computes the entropy
- Return type: entropy as a float
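The listing does not give the entropy formula. As a rough illustration, the sketch below assumes Shannon entropy in bits over the normalized counts, with the input modeled as a hash from feature value (port, address, or flow size) to count. shannon_entropy is a hypothetical name, not the toolkit's implementation.

```ruby
# Hypothetical stand-in for general_entropy(data).
# Assumption: data maps a feature value (port, address, flow size)
# to a count, and the entropy is Shannon entropy in bits.
def shannon_entropy(counts)
  total = counts.values.sum.to_f
  return 0.0 if total.zero?
  counts.values.reject(&:zero?).sum(0.0) do |count|
    prob = count / total
    -prob * Math.log2(prob)
  end
end

uniform = { 80 => 10, 443 => 10, 22 => 10, 53 => 10 }
skewed  = { 80 => 40 }
puts shannon_entropy(uniform) # four equally likely ports: 2 bits
puts shannon_entropy(skewed)  # one port carries everything: 0 bits
```

Under this assumption, traffic concentrated on a few values gives low entropy and evenly spread traffic gives high entropy.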
entropy_degree_in(interval,table)
- Description: computes the entropy for interval in table by calling stats_addr_degree_in() and passing the data to degree_entropy(data)
- Return type: entropy as a float
entropy_degree_out(interval,table)
- Description: computes the entropy for interval in table by calling stats_addr_degree_out() and passing the data to degree_entropy(data)
- Return type: entropy as a float
entropy_ports_src(interval,table)
- Description: computes the entropy for interval in table by calling stats_ports_src() and passing the data to general_entropy(data)
- Return type: entropy as a float
entropy_ports_dst(interval,table)
- Description: computes the entropy for interval in table by calling stats_ports_dst() and passing the data to general_entropy(data)
- Return type: entropy as a float
entropy_addr_src(interval,table)
- Description: computes the entropy for interval in table by calling stats_addr_src() and passing the data to general_entropy(data)
- Return type: entropy as a float
entropy_addr_dst(interval,table)
- Description: computes the entropy for interval in table by calling stats_addr_dst() and passing the data to general_entropy(data)
- Return type: entropy as a float
entropy_fsd(interval,table)
- Description: computes the entropy for interval in table by calling stats_fsd() and passing the data to general_entropy(data)
- Return type: entropy as a float
print_magnitude_entropy(magnitude, entropy)
- Description: takes a magnitude value and entropy hash to be printed to stdout for use with synthetic attacks
- Output format: <magnitude> <degree_in> <degree_out> <ports_src> <ports_dst> <addr_src> <addr_dst> <fsd>
- Return type: none
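The documented output format can be sketched as follows. The hash keys used here (plain strings matching the column names) are an assumption, and magnitude_entropy_line is a hypothetical helper that builds the line rather than printing it.

```ruby
# Hypothetical helper: build one output line in the documented
# column order. Assumption: the entropy hash is keyed by these strings.
def magnitude_entropy_line(magnitude, entropy)
  features = %w[degree_in degree_out ports_src ports_dst addr_src addr_dst fsd]
  ([magnitude] + features.map { |f| entropy.fetch(f) }).join(" ")
end

entropy = {
  "degree_in" => 1.5, "degree_out" => 1.25,
  "ports_src" => 3.0, "ports_dst" => 2.5,
  "addr_src"  => 4.0, "addr_dst"  => 3.5,
  "fsd"       => 0.75
}
puts magnitude_entropy_line(100, entropy)
```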
Anomaly Detection (MATLAB)
dev_scores(filename, outfile)
- Description: generates standard deviation scores for the second column of data in filename
- Output file: outfile
- Output format: <column_1_from_filename> <standard_deviation_score>
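The scoring itself is done by the toolkit and is not spelled out in this listing. As an illustration only, the sketch below assumes the score is the absolute distance from the mean measured in population standard deviations; deviation_scores is a hypothetical name.

```ruby
# Hypothetical sketch of a standard deviation score.
# Assumption: the score is the absolute distance from the mean,
# measured in (population) standard deviations.
def deviation_scores(rows)
  values = rows.map { |_, value| value }
  mean = values.sum / values.length.to_f
  variance = values.sum { |v| (v - mean)**2 } / values.length
  std = Math.sqrt(variance)
  rows.map do |key, value|
    [key, std.zero? ? 0.0 : (value - mean).abs / std]
  end
end

# Each row is [column_1, column_2], as read from the input file.
rows = [[1, 2.0], [2, 4.0], [3, 4.0], [4, 4.0],
        [5, 5.0], [6, 5.0], [7, 7.0], [8, 9.0]]
deviation_scores(rows).each { |key, score| puts "#{key} #{score}" }
```

With this sample the mean is 5 and the standard deviation is 2, so the first row scores 1.5 and the last row scores 2.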
dev_scores_cleaning(filename,outfile)
- Description: generates standard deviation scores for the second column of data in filename after performing an iterative cleaning process described in the standard deviation section
- Output file: outfile
- Output format: <column_1_from_filename> <standard_deviation_score>
wavelet_anomaly_detection(filename,outfile)
- Description: generates wavelet based deviation scores with a window of 30 minutes from the second column of data in filename
- Output file: outfile
- Output format: <column_1_from_filename> <wavelet_deviation_score>
get_full_alarms(alarm_name,metric_name)
- Description: for all intervals, return a timestamp and 0 or 1 depending on whether or not an alarm is raised for alarm_name with metric_name
- Return type: array, at each index is |interval,flag| where flag is 0 or 1
get_full_alarms_epoch(alarm_name,metric_name)
- Description: for all intervals, return an epoch and 0 or 1 depending on whether or not an alarm is raised for alarm_name with metric_name
- Return type: array, at each index is |epoch,flag| where flag is 0 or 1
get_alarms(alarm_name,metric_name)
- Description: returns intervals where an alarm is raised for alarm_name with metric_name
- Return type: array of timestamps
get_alarms_epoch(alarm_name,metric_name)
- Description: returns epochs where an alarm is raised for alarm_name with metric_name
- Return type: array of integers
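The relationship between the "full" and filtered alarm methods can be sketched on sample data. The data below is hypothetical, and the sketch assumes get_alarms_epoch() returns exactly the epochs whose flag is 1 in the output of get_full_alarms_epoch().

```ruby
# Hypothetical data in the shape documented for get_full_alarms_epoch():
# one [epoch, flag] pair per interval, where flag is 0 or 1.
full_alarms = [[0, 0], [1, 1], [2, 0], [3, 1]]

# Keep only the epochs whose flag is raised, which is the shape
# documented for get_alarms_epoch().
raised = full_alarms.select { |_, flag| flag == 1 }.map { |epoch, _| epoch }
puts raised.inspect
```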
Labeling Anomalies
get_anomaly_types()
- Description: retrieves all of the current anomaly types as an array with type and description
- Return type: array
create_anomaly_type(description)
- Description: inserts a new anomaly type into ANOMALIES with the description provided
- Return type: integer which represents the unique type attribute associated with the newly inserted anomaly type
get_attack_flow_list()
- Description: queries the user for the name of a file listing the flows associated with an attack, then checks that the file is in the proper format: <interval>,<flow_id>
- Return type: the filename as a string if the file passed the format check and has at least one flow, nil is returned if it fails
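A format check of this kind might look like the sketch below. It assumes <interval> is any non-empty field without a comma and <flow_id> is a non-negative integer; the toolkit's real check may be stricter. valid_flow_list? is a hypothetical name.

```ruby
# Hypothetical format check for an attack flow list.
# Assumptions: <interval> is any non-empty field without a comma and
# <flow_id> is a non-negative integer. Blank lines are ignored.
def valid_flow_list?(lines)
  entries = lines.map(&:strip).reject(&:empty?)
  # Require at least one flow, and every entry must match the format.
  !entries.empty? && entries.all? { |line| line.match?(/\A[^,]+,\d+\z/) }
end

puts valid_flow_list?(["2007-01-01 00:00:00,12", "2007-01-01 00:05:00,7"])
puts valid_flow_list?(["missing flow id"])
```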
get_user_anomaly_type()
- Description: displays ANOMALIES in table format and asks the user to select one, or -1 if they want to create a new type
- Return type: integer on success, nil on failure
insert_labeled_flows(filename)
- Description: inserts into LABELED_FLOWS from filename, format of the file must be: <label_id>,<interval>,<flow_id>
- Return type: none
insert_label(anomaly_type, description)
- Description: inserts into LABELES with the anomaly_type and provided description, the label_id is generated from a sequence number
- Return type: the label_id assigned to the label as an integer
get_all_labels()
- Description: retrieves all of the labels in the LABELES
- Return type: array, each array index has an array with: the label_id, anomaly description, label description, interval count, and flow count
get_label(label_id)
- Description: retrieves the specified label in the LABELES
- Return type: array with the following indexes: the label_id, anomaly description, label description, interval count, and flow count
print_labels()
- Description: prints out information about labels in a human-readable format. Note that it can be used with a single label, e.g. print_labels(get_label(1))
- Return type: none
get_all_labeled_flows()
- Description: gets all of the flows from the LABELED_FLOWS table, including the label associated with each flow
- Return type: array, at each index is an array with values label_id, interval, and flow_id
get_labeled_flows(labels)
- Description: specify a label and get all of the flows associated with the label
- Return type: array, at each index is an array with values interval and flow_id
get_all_full_labeled_flows()
- Description: gets all of the FULL flow level information from the FLOWS associated with all labels, essentially returning all of the attack flows for all labels
- Return type: array, at each index is a label_id and a full record from FLOWS which is a flow associated with the label
get_full_labeled_flows(label)
- Description: gets all of the FULL flow level information from the FLOWS associated with the given label, essentially returning all of the attack flows for the given label
- Return type: array, at each index is a full record from FLOWS which is a flow associated with the label
search_labels_and(words)
- Description: search through all of the label descriptions for a label that contains all of the words in the string words, unordered
- Note: this is not case sensitive
- Return type: array of label_ids as integers
search_labels_or(words)
- Description: search through all of the label descriptions for a label that contains any of the words in the string words, unordered
- Note: this is not case sensitive
- Return type: array of label_ids as integers
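The difference between the two searches can be sketched over an in-memory set of descriptions. The sample labels are hypothetical, match_all and match_any are illustrative names, and substring matching is an assumption; the toolkit may match whole words or query the database directly.

```ruby
# Hypothetical in-memory stand-in for the label descriptions,
# keyed by label_id.
label_descriptions = {
  1 => "Inbound worm on port 445",
  2 => "Outbound horizontal scan",
  3 => "Inbound flood against web server"
}

# Every word must appear in the description, in any order.
def match_all(labels, words)
  terms = words.downcase.split
  labels.select { |_, desc| terms.all? { |t| desc.downcase.include?(t) } }.keys
end

# At least one word must appear in the description.
def match_any(labels, words)
  terms = words.downcase.split
  labels.select { |_, desc| terms.any? { |t| desc.downcase.include?(t) } }.keys
end

puts match_all(label_descriptions, "WORM inbound").inspect
puts match_any(label_descriptions, "worm flood").inspect
```

Both helpers lowercase the query and the description before comparing, which mirrors the note that the search is not case sensitive.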
Synthetic Attack Table
create_attack_table(table)
- Description: creates the attack table and view which is a union of the attack table and the table parameter
- Return type: none
attack_flows_timestamp(timestamp)
- Description: change all of the intervals of the flows in the attack table to timestamp
- Return type: none
clear_attack_table()
- Description: removes all of the attack flows from all intervals
- Return type: none
drop_attack_table()
- Description: deletes the attack table and view
- Return type: none
Synthetic Attacks
insert_ib_worm(interval, infected_hosts, subnet_start, subnet_end, scan_rate, scan_port)
- Description: insert worm activity on scan_port in interval sourced from infected_hosts random Internet addresses to random intranet addresses between subnet_start and subnet_end where each infected scanning host scans at a rate of scan_rate in hosts per second
- Return type: none
insert_ob_worm(interval, infected_hosts, subnet_start, subnet_end, scan_rate, scan_port)
- Description: insert worm activity on scan_port in interval sourced from infected_hosts random intranet addresses between subnet_start and subnet_end to random Internet addresses where each infected scanning host scans at a rate of scan_rate in hosts per second
- Return type: none
insert_ib_hscan(interval, start_victim, scan_rate, scan_port, scanner)
- Description: insert an inbound horizontal scan in interval from scanner on scan_port at a rate of scan_rate in hosts per second, sequentially attacking destination host addresses starting at start_victim
- Return type: none
insert_ob_hscan(interval, start_victim, scan_rate, scan_port, scanner)
- Description: insert an outbound horizontal scan in interval from scanner on scan_port at a rate of scan_rate in hosts per second, sequentially attacking destination host addresses starting at start_victim
- Return type: none
insert_ib_vscan(interval, start_victim, scan_rate, scanner)
- Description: insert an inbound vertical scan in interval from scanner on ports 0-65535 at a rate of scan_rate in hosts per second, sequentially attacking destination host addresses starting at start_victim
- Return type: none
insert_ob_vscan(interval, start_victim, scan_rate, scanner)
- Description: insert an outbound vertical scan in interval from scanner on ports 0-65535 at a rate of scan_rate in hosts per second, sequentially attacking destination host addresses starting at start_victim
- Return type: none
insert_ib_flood(interval, num_attackers, attack_rate_low, attack_rate_high, victim)
- Description: insert num_attackers inbound attack flows at an attack_rate between attack_rate_low and attack_rate_high in KB/s against victim into interval
- Return type: none
insert_ob_flood(interval, num_attackers, subnet_start, subnet_end, attack_rate_low, attack_rate_high, victim)
- Description: insert num_attackers outbound attack flows at an attack_rate between attack_rate_low and attack_rate_high in KB/s against victim into interval, where the attackers are randomly chosen between subnet_start and subnet_end
- Return type: none
