Datapository Methods

The following is a list of the methods available in the toolkit for developing new scripts, grouped by functionality.

Core Methods

get_metrics()

  • Description: returns all of the available metrics
  • Return type: array; each element is a pair of the integer metric id and the metric's name as a string

interval_to_filename(interval)

  • Return type: string formatted filename

epoch_to_filename(epoch)

  • Return type: string formatted filename

epoch_to_timestamp(epoch)

  • Return type: timestamp-formatted interval
  • NOTE: epoch must be in INTERVAL_STATS

timestamp_to_epoch(timestamp)

get_all_intervals()

  • Return type: array; each element is a pair of a string-formatted interval and its epoch
  • Ex. Usage: get_all_intervals().each {|interval,epoch| ...}

get_intervals_check(type, low_score, high_score)

  • Description: get all intervals from INTERVALS where the attribute named type has a value between low_score and high_score
  • Return type: array, each element is a timestamp formatted interval

get_random_intervals(num_intervals)

  • Description: get num_intervals random intervals from INTERVALS
  • Return type: array, each element is a timestamp formatted interval

get_random_intervals_check(type, low_score, high_score, num_intervals)

  • Description: get num_intervals random intervals from INTERVALS where the attribute named type has a value between low_score and high_score
  • Return type: array, each element is a timestamp formatted interval
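
The selection performed by get_intervals_check() and get_random_intervals_check() can be pictured with a small in-memory sketch. It is not the toolkit's implementation: it assumes the INTERVALS rows are available as Ruby hashes with an :interval key plus attribute keys, that the score bounds are inclusive, and that sampling is without replacement. The method names intervals_check and random_intervals_check and the attribute :entropy_fsd are hypothetical.

```ruby
# Hypothetical stand-in for rows of INTERVALS: each row is a hash holding a
# timestamp-formatted :interval and named attribute values (scores).
# Assumption: the low_score..high_score check is inclusive at both ends.
def intervals_check(rows, type, low_score, high_score)
  rows.select { |row| (low_score..high_score).cover?(row[type]) }
      .map { |row| row[:interval] }
end

# Sample num_intervals of the matching intervals without replacement;
# Array#sample returns every match if fewer than num_intervals exist.
def random_intervals_check(rows, type, low_score, high_score, num_intervals, rng: Random.new)
  intervals_check(rows, type, low_score, high_score).sample(num_intervals, random: rng)
end

# Made-up sample rows for illustration only.
rows = [
  { interval: "2005-01-01 00:00:00", entropy_fsd: 2.1 },
  { interval: "2005-01-01 00:05:00", entropy_fsd: 3.7 },
  { interval: "2005-01-01 00:10:00", entropy_fsd: 2.9 }
]
p intervals_check(rows, :entropy_fsd, 2.0, 3.0)
# => ["2005-01-01 00:00:00", "2005-01-01 00:10:00"]
```

get_random_intervals() corresponds to the same sampling step applied to every interval, with no score filter.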

get_flows_partitions()

  • Return type: an array consisting of all of the FLOWS table partitions
  • Ex. Usage: get_flows_partitions().each {|partition| ...}

get_editor_input()

  • Description: opens the editor named by $EDITOR and reads the user's input
  • Return type: the user's input in string format, nil on error

correlate( filename1, filename2, label1, label2, outfile )

  • Description: generates correlation scores between the second columns of data in filename1 and filename2
  • Output file: outfile
  • Output format: <label1> <label2> <correlation_score>
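
A rough Ruby sketch of what correlate() computes, assuming the score is the Pearson correlation coefficient, that both files are whitespace-separated with rows in matching order, and that each call appends one line to outfile. The original routine is not shown here and may differ on any of these points; pearson, second_column, and correlate_sketch are hypothetical names.

```ruby
# Pearson correlation coefficient of two equal-length numeric arrays.
# Returns NaN if either series is constant (zero standard deviation).
def pearson(xs, ys)
  raise ArgumentError, "need two equal-length series" unless xs.size == ys.size && xs.size > 1
  n = xs.size.to_f
  mx = xs.sum / n
  my = ys.sum / n
  cov = xs.zip(ys).sum { |x, y| (x - mx) * (y - my) }
  sx = Math.sqrt(xs.sum { |x| (x - mx)**2 })
  sy = Math.sqrt(ys.sum { |y| (y - my)**2 })
  cov / (sx * sy)
end

# Second whitespace-separated column of each non-blank line, as floats.
def second_column(filename)
  File.readlines(filename).map(&:split).reject(&:empty?).map { |cols| cols[1].to_f }
end

# Hypothetical analogue of correlate(): appends "<label1> <label2> <score>".
def correlate_sketch(filename1, filename2, label1, label2, outfile)
  score = pearson(second_column(filename1), second_column(filename2))
  File.open(outfile, "a") { |f| f.puts "#{label1} #{label2} #{score}" }
  score
end
```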

Traffic Statistics

stats_addr_degree_in(timestamp,table)

  • Return type: an array result indexed by host address with values being the associated in degree

stats_addr_degree_out(timestamp,table)

  • Return type: an array result indexed by host address with values being the associated out degree

stats_degree_in(timestamp,table)

  • Return type: an array result indexed by in degree with values being the number of hosts with the indexed degree

stats_degree_out(timestamp,table)

  • Return type: an array result indexed by out degree with values being the number of hosts with the indexed degree
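
The per-host degree functions and the degree distributions are related: the distribution counts how many hosts share each degree value. The sketch below shows that relationship over an in-memory list of flows, assuming each flow is a [src, dst] pair, that a host's in degree is the number of distinct sources that contacted it, and that its out degree is the number of distinct destinations it contacted. The toolkit's queries against table are not shown, results are Ruby hashes rather than arrays, and the method names are hypothetical.

```ruby
require "set"

# In degree per destination address: number of distinct sources seen.
def addr_degree_in(flows)
  peers = Hash.new { |h, k| h[k] = Set.new }
  flows.each { |src, dst| peers[dst] << src }
  peers.transform_values(&:size)
end

# Out degree per source address: number of distinct destinations contacted.
def addr_degree_out(flows)
  peers = Hash.new { |h, k| h[k] = Set.new }
  flows.each { |src, dst| peers[src] << dst }
  peers.transform_values(&:size)
end

# Degree distribution: degree value => number of hosts with that degree.
def degree_histogram(addr_degrees)
  addr_degrees.values.tally
end
```

Under these assumptions, degree_histogram(addr_degree_in(flows)) plays the role of stats_degree_in().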

stats_addr_src(timestamp,table)

  • Return type: an array result indexed by host address with values being the associated source packets

stats_addr_dst(timestamp,table)

  • Return type: an array result indexed by host address with values being the associated destination packets

stats_ports_src(timestamp,table)

  • Return type: an array result indexed by port with values being the associated source packets

stats_ports_dst(timestamp,table)

  • Return type: an array result indexed by port with values being the associated destination packets

stats_fsd(timestamp,table)

  • Return type: an array result indexed by flow size, with values being the number of flows of that size (the flow size distribution)
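
The packet counts and the flow size distribution can be sketched the same way. This assumes each flow is a hash with a :packets count plus address and port fields, and that flow size is measured in packets; the field names and the helpers packets_by and flow_size_distribution are hypothetical.

```ruby
# Sum :packets per value of key (e.g. :src_port or :src_addr). Flows here are
# hypothetical hashes, not the toolkit's FLOWS records.
def packets_by(flows, key)
  totals = Hash.new(0)
  flows.each { |flow| totals[flow[key]] += flow[:packets] }
  totals
end

# Flow size (assumed: packets per flow) => number of flows with that size.
def flow_size_distribution(flows)
  flows.map { |flow| flow[:packets] }.tally
end
```

For example, packets_by(flows, :dst_port) would mirror stats_ports_dst() and packets_by(flows, :src_addr) would mirror stats_addr_src().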

Entropy

compute_all_entropy(interval, table)

  • Description: compute all of the entropy values for all traffic features
  • Return type: hash indexed by traffic feature, value at the index is the entropy as a float

cache_all_entropy(interval,entropy)

  • Description: takes an interval and a hash indexed by traffic feature (like what compute_all_entropy() returns) and caches the values in INTERVAL_STATS
  • Return type: none

read_all_entropy(interval)

degree_entropy(data)

  • Description: takes an array of degree data returned from stats_addr_degree_in() or stats_addr_degree_out() and computes the entropy
  • Return type: entropy as a float

general_entropy(data)

  • Description: takes an array of data returned from stats_ports_*(), stats_addr_*(), or stats_fsd() and computes the entropy
  • Return type: entropy as a float
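
Both general_entropy() and degree_entropy() reduce a histogram to a single number. A minimal sketch, assuming standard Shannon entropy with base-2 logarithms, H = -sum(p_i * log2(p_i)) with p_i = c_i / sum(c), over a hash of index => count (for example per-port packet counts). Whether the toolkit normalizes the result, or treats degree data differently in degree_entropy(), is not stated above; shannon_entropy is a hypothetical name.

```ruby
# Shannon entropy (base 2) of a histogram given as a Hash of index => count.
# Zero counts are skipped, since p * log(p) -> 0 as p -> 0.
def shannon_entropy(counts)
  total = counts.values.sum.to_f
  return 0.0 if total.zero?
  counts.values.reject(&:zero?).sum do |c|
    prob = c / total
    -prob * Math.log2(prob)
  end
end

p shannon_entropy({ 80 => 1, 443 => 1 }) # two equally likely ports
# => 1.0
```

A single repeated value gives 0.0, and k equally likely values give log2(k), which is why a sudden drop or spike in these values can flag concentrated or dispersed traffic.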

entropy_degree_in(interval,table)

  • Description: computes the entropy for interval in table by calling stats_addr_degree_in() and passing the data to degree_entropy(data)
  • Return type: entropy as a float

entropy_degree_out(interval,table)

  • Description: computes the entropy for interval in table by calling stats_addr_degree_out() and passing the data to degree_entropy(data)
  • Return type: entropy as a float

entropy_ports_src(interval,table)

  • Description: computes the entropy for interval in table by calling stats_ports_src() and passing the data to general_entropy(data)
  • Return type: entropy as a float

entropy_ports_dst(interval,table)

  • Description: computes the entropy for interval in table by calling stats_ports_dst() and passing the data to general_entropy(data)
  • Return type: entropy as a float

entropy_addr_src(interval,table)

  • Description: computes the entropy for interval in table by calling stats_addr_src() and passing the data to general_entropy(data)
  • Return type: entropy as a float

entropy_addr_dst(interval,table)

  • Description: computes the entropy for interval in table by calling stats_addr_dst() and passing the data to general_entropy(data)
  • Return type: entropy as a float

entropy_fsd(interval,table)

  • Description: computes the entropy for interval in table by calling stats_fsd() and passing the data to general_entropy(data)
  • Return type: entropy as a float

print_magnitude_entropy(magnitude, entropy)

  • Description: prints a magnitude value and an entropy hash to stdout, for use with synthetic attacks
  • Output format: <magnitude> <degree_in> <degree_out> <ports_src> <ports_dst> <addr_src> <addr_dst> <fsd>
  • Return type: none
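
The documented output format fixes the order of the seven features. A sketch of how one output line could be assembled, assuming the entropy hash uses symbol keys named after the features; the real hash may use different keys, and magnitude_entropy_line and print_magnitude_entropy_sketch are hypothetical names.

```ruby
# Feature order taken from the documented output format.
# Assumption: the entropy hash is keyed by these symbols.
ENTROPY_FEATURES = %i[degree_in degree_out ports_src ports_dst addr_src addr_dst fsd].freeze

# Build "<magnitude> <degree_in> ... <fsd>"; fetch raises if a feature is missing.
def magnitude_entropy_line(magnitude, entropy)
  ([magnitude] + ENTROPY_FEATURES.map { |f| entropy.fetch(f) }).join(" ")
end

def print_magnitude_entropy_sketch(magnitude, entropy)
  puts magnitude_entropy_line(magnitude, entropy)
end
```

Printing one line per magnitude produces whitespace-separated columns that plot or correlate easily against the attack magnitude.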

Anomaly Detection (MATLAB)

dev_scores(filename, outfile)

  • Description: generates standard deviation scores for the second column of data in filename
  • Output file: outfile
  • Output format: <column_1_from_filename> <standard_deviation_score>
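
The routine itself is MATLAB; the sketch below restates the computation in Ruby to match the other examples. It assumes the score is the signed z-score (x - mean) / std using the population standard deviation, and takes [column_1, value] pairs instead of reading filename. dev_scores_sketch and mean_std are hypothetical names.

```ruby
# Population mean and standard deviation of an array of floats.
def mean_std(values)
  mean = values.sum / values.size.to_f
  var = values.sum { |v| (v - mean)**2 } / values.size
  [mean, Math.sqrt(var)]
end

# Score each [label, value] row by how many standard deviations its value
# lies from the mean of all values (a z-score); labels pass through unchanged.
def dev_scores_sketch(rows)
  mean, std = mean_std(rows.map(&:last))
  rows.map { |label, v| [label, (v - mean) / std] }
end
```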

dev_scores_cleaning(filename,outfile)

  • Description: generates standard deviation scores for the second column of data in filename after performing an iterative cleaning process described in the standard deviation section
  • Output file: outfile
  • Output format: <column_1_from_filename> <standard_deviation_score>
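
The cleaning procedure is described in the standard deviation section, which is not reproduced here, so the following is only a guess at its shape: drop values more than k standard deviations from the mean, recompute, and repeat until nothing is dropped, then score every value against the cleaned mean and deviation. The threshold k = 3 and the helper names are assumptions.

```ruby
# Population mean and standard deviation of an array of floats.
def mean_std(values)
  mean = values.sum / values.size.to_f
  var = values.sum { |v| (v - mean)**2 } / values.size
  [mean, Math.sqrt(var)]
end

# Iteratively drop values more than k standard deviations from the mean,
# recomputing until no value is removed (k = 3.0 is an assumed default).
def cleaned_mean_std(values, k: 3.0, max_iter: 100)
  kept = values.dup
  max_iter.times do
    mean, std = mean_std(kept)
    break if std.zero?
    filtered = kept.select { |v| (v - mean).abs <= k * std }
    break if filtered.size == kept.size
    kept = filtered
  end
  mean_std(kept)
end

# Score all rows, including removed outliers, against the cleaned statistics.
def dev_scores_cleaning_sketch(rows, k: 3.0)
  mean, std = cleaned_mean_std(rows.map(&:last), k: k)
  rows.map { |label, v| [label, (v - mean) / std] }
end
```

Cleaning keeps a single large anomaly from inflating the deviation it is measured against, so the anomaly stands out more sharply than with dev_scores().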

wavelet_anomaly_detection(filename,outfile)

  • Description: generates wavelet based deviation scores with a window of 30 minutes from the second column of data in filename
  • Output file: outfile
  • Output format: <column_1_from_filename> <wavelet_deviation_score>

get_full_alarms(alarm_name,metric_name)

  • Description: for all intervals, return a timestamp and 0 or 1 depending on whether or not an alarm is raised for alarm_name with metric_name
  • Return type: array, at each index is |interval,flag| where flag is 0 or 1

get_full_alarms_epoch(alarm_name,metric_name)

  • Description: for all intervals, return an epoch and 0 or 1 depending on whether or not an alarm is raised for alarm_name with metric_name
  • Return type: array, at each index is |epoch,flag| where flag is 0 or 1

get_alarms(alarm_name,metric_name)

  • Description: returns intervals where an alarm is raised for alarm_name with metric_name
  • Return type: array of timestamps

get_alarms_epoch(alarm_name,metric_name)

  • Description: returns epochs where an alarm is raised for alarm_name with metric_name
  • Return type: array of integers
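
Given the documented shapes, the flagged intervals from get_alarms() should match the flag-1 entries of get_full_alarms(). The sketch below derives them, plus an alarm rate, from a list of [interval, flag] pairs; that equivalence is an assumption, and alarms_from_full and alarm_rate are hypothetical names.

```ruby
# Keep only the intervals (or epochs) whose flag is 1.
def alarms_from_full(full_alarms)
  full_alarms.select { |_interval, flag| flag == 1 }.map(&:first)
end

# Fraction of intervals that raised an alarm (0.0 for an empty list).
def alarm_rate(full_alarms)
  return 0.0 if full_alarms.empty?
  full_alarms.count { |_interval, flag| flag == 1 } / full_alarms.size.to_f
end
```

The same helpers apply to the epoch variants, since only the first element of each pair changes type.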

Labeling Anomalies

get_anomaly_types()

  • Description: retrieves all of the current anomaly types as an array with type and description
  • Return type: array

create_anomaly_type(description)

  • Description: inserts a new anomaly type into ANOMALIES with the description provided
  • Return type: integer which represents the unique type attribute associated with the newly inserted anomaly type

get_attack_flow_list()

  • Description: prompts the user for the name of a file listing the flows associated with an attack, then checks that the file is in the proper format: <interval>,<flow_id>
  • Return type: the filename as a string if the file passes the format check and contains at least one flow; nil otherwise
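
A sketch of the format check, without the interactive prompt. It assumes the interval field contains no commas, flow_id is a non-negative integer, and blank lines are ignored; the regular expression and check_attack_flow_list are hypothetical, and the toolkit's exact rules may be stricter.

```ruby
# One flow per line: "<interval>,<flow_id>" (assumed: no commas in interval,
# flow_id made of digits only).
ATTACK_FLOW_LINE = /\A[^,]+,\d+\z/

# Return filename if it exists, has at least one non-blank line, and every
# non-blank line matches; otherwise nil, following the documented convention.
def check_attack_flow_list(filename)
  return nil unless File.file?(filename)
  lines = File.readlines(filename, chomp: true).reject { |l| l.strip.empty? }
  return nil if lines.empty?
  lines.all? { |l| l.match?(ATTACK_FLOW_LINE) } ? filename : nil
end
```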

get_user_anomaly_type()

  • Description: displays ANOMALIES in table format and asks the user to select a type, or to enter -1 to create a new type
  • Return type: integer on success, nil on failure

insert_labeled_flows(filename)

  • Description: inserts the flows listed in filename into LABELED_FLOWS; each line of the file must have the format <label_id>,<interval>,<flow_id>
  • Return type: none
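
The labeling methods chain together: get_attack_flow_list() yields a file of <interval>,<flow_id> lines, insert_label() yields a label_id, and insert_labeled_flows() expects <label_id>,<interval>,<flow_id>. A small hypothetical helper that bridges the two formats is sketched below; the workflow in the trailing comment uses the documented methods, but anomaly_type, attack_file, and labeled_file are placeholders.

```ruby
# Prefix each "<interval>,<flow_id>" line with label_id, giving the
# "<label_id>,<interval>,<flow_id>" lines insert_labeled_flows() expects.
# Blank lines are dropped and surrounding whitespace is stripped.
def to_labeled_flow_lines(attack_lines, label_id)
  attack_lines.map(&:strip).reject(&:empty?).map { |line| "#{label_id},#{line}" }
end

# A plausible workflow with the documented calls (not executed here):
#   label_id = insert_label(anomaly_type, "description of the attack")
#   lines    = to_labeled_flow_lines(File.readlines(attack_file), label_id)
#   File.write(labeled_file, lines.join("\n") + "\n")
#   insert_labeled_flows(labeled_file)
```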

insert_label(anomaly_type, description)

  • Description: inserts a label with anomaly_type and the provided description into LABELES; the label_id is generated from a sequence number
  • Return type: the label_id assigned to the label as an integer

get_all_labels()

  • Description: retrieves all of the labels in the LABELES
  • Return type: array, each array index has an array with: the label_id, anomaly description, label description, interval count, and flow count

get_label(label_id)

  • Description: retrieves the specified label in the LABELES
  • Return type: array containing the label_id, anomaly description, label description, interval count, and flow count

print_labels()

  • Description: prints information about one or more labels in a human-readable format. It also works with a single label, e.g. print_labels(get_label(1))
  • Return type: none

get_all_labeled_flows()

  • Description: gets all of the flows from the LABELED_FLOWS table, along with the label associated with each flow
  • Return type: array, at each index is an array with values label_id, interval, and flow_id

get_labeled_flows(labels)

  • Description: specify a label and get all of the flows associated with the label
  • Return type: array, at each index is an array with values interval and flow_id

get_all_full_labeled_flows()

  • Description: gets all of the FULL flow level information from the FLOWS associated with all labels, essentially returning all of the attack flows for all labels
  • Return type: array, at each index is a label_id and a full record from FLOWS which is a flow associated with the label

get_full_labeled_flows(label)

  • Description: gets all of the FULL flow level information from the FLOWS associated with the given label, essentially returning all of the attack flows for the given label
  • Return type: array, at each index is a full record from FLOWS which is a flow associated with the label

search_labels_and(words)

  • Description: searches all label descriptions for labels that contain every word in the string words, in any order
  • Note: the search is case-insensitive
  • Return type: array of label_ids as integers

search_labels_or(words)

  • Description: searches all label descriptions for labels that contain at least one of the words in the string words, in any order
  • Note: the search is case-insensitive
  • Return type: array of label_ids as integers
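
The difference between the AND and OR searches is easiest to see side by side. This sketch assumes labels are available as a hash of label_id => description and that matching is on whole whitespace-separated words after lowercasing; the toolkit may match substrings instead. All names here are hypothetical.

```ruby
# Lowercased whitespace-separated words of a string.
def label_words(text)
  text.downcase.split
end

# AND: every query word must appear in the description, in any order.
def search_labels_and_sketch(labels, words)
  wanted = label_words(words)
  labels.select { |_id, desc| (wanted - label_words(desc)).empty? }.keys
end

# OR: at least one query word must appear in the description.
def search_labels_or_sketch(labels, words)
  wanted = label_words(words)
  labels.select { |_id, desc| (wanted & label_words(desc)).any? }.keys
end

labels = { 1 => "SYN flood against web server", 2 => "Outbound worm scan", 3 => "port scan from external host" }
p search_labels_and_sketch(labels, "scan PORT") # => [3]
p search_labels_or_sketch(labels, "flood worm") # => [1, 2]
```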

Synthetic Attack Table

create_attack_table(table)

  • Description: creates the attack table and a view that is the union of the attack table and the table given by the table parameter
  • Return type: none

attack_flows_timestamp(timestamp)

  • Description: change all of the intervals of the flows in the attack table to timestamp
  • Return type: none

clear_attack_table()

  • Description: removes all of the attack flows from all intervals
  • Return type: none

drop_attack_table()

  • Description: deletes the attack table and view
  • Return type: none

Synthetic Attacks

insert_ib_worm(interval, infected_hosts, subnet_start, subnet_end, scan_rate, scan_port)

  • Description: insert worm activity on scan_port in interval sourced from infected_hosts random Internet addresses to random intranet addresses between subnet_start and subnet_end where each infected scanning host scans at a rate of scan_rate in hosts per second
  • Return type: none

insert_ob_worm(interval, infected_hosts, subnet_start, subnet_end, scan_rate, scan_port)

  • Description: insert worm activity on scan_port in interval sourced from infected_hosts random intranet addresses between subnet_start and subnet_end to random Internet addresses where each infected scanning host scans at a rate of scan_rate in hosts per second
  • Return type: none

insert_ib_hscan(interval, start_victim, scan_rate, scan_port, scanner)

  • Description: insert an inbound horizontal scan in interval from scanner on scan_port at a rate of scan_rate in hosts per second, sequentially attacking destination host addresses starting at start_victim
  • Return type: none

insert_ob_hscan(interval, start_victim, scan_rate, scan_port, scanner)

  • Description: insert an outbound horizontal scan in interval from scanner on scan_port at a rate of scan_rate in hosts per second, sequentially attacking destination host addresses starting at start_victim
  • Return type: none
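
A sketch of the flows a horizontal scan would contribute: scan_rate new victims per second for the length of the interval, taken sequentially from start_victim. The interval length is passed as duration_s because the interval duration is not given here, and only source, destination, and port are generated; the real inserts presumably fill in other flow fields. hscan_flows is a hypothetical name. Inbound and outbound scans differ only in which side of the network scanner and victims lie on.

```ruby
require "ipaddr"

# Flow tuples for a horizontal scan: (scan_rate * duration_s) victims,
# addresses incremented sequentially from start_victim, all on scan_port.
# duration_s (the interval length) is an assumed parameter.
def hscan_flows(start_victim, scan_rate, scan_port, scanner, duration_s:)
  victim = IPAddr.new(start_victim)
  Array.new((scan_rate * duration_s).to_i) do
    flow = { src: scanner, dst: victim.to_s, dst_port: scan_port }
    victim = victim.succ
    flow
  end
end

p hscan_flows("10.0.0.254", 2, 22, "192.0.2.7", duration_s: 2).map { |f| f[:dst] }
# => ["10.0.0.254", "10.0.0.255", "10.0.1.0", "10.0.1.1"]
```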

insert_ib_vscan(interval, start_victim, scan_rate, scanner)

  • Description: insert an inbound vertical scan in interval from scanner on ports 0-65535 at a rate of scan_rate in hosts per second, sequentially attacking destination host addresses starting at start_victim
  • Return type: none

insert_ob_vscan(interval, start_victim, scan_rate, scanner)

  • Description: insert an outbound vertical scan in interval from scanner on ports 0-65535 at a rate of scan_rate in hosts per second, sequentially attacking destination host addresses starting at start_victim
  • Return type: none

insert_ib_flood(interval, num_attackers, attack_rate_low, attack_rate_high, victim)

  • Description: insert num_attackers inbound attack flows at an attack_rate between attack_rate_low and attack_rate_high in KB/s against victim into interval
  • Return type: none

insert_ob_flood(interval, num_attackers, subnet_start, subnet_end, attack_rate_low, attack_rate_high, victim)

  • Description: insert num_attackers outbound attack flows at an attack_rate between attack_rate_low and attack_rate_high in KB/s against victim into interval, where the attackers are randomly chosen between subnet_start and subnet_end
  • Return type: none
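
Two pieces of the flood inserts can be sketched in isolation: drawing a per-attacker rate between attack_rate_low and attack_rate_high and converting it to bytes over the interval, and picking a random attacker address between subnet_start and subnet_end. Assumptions: rates are drawn uniformly, 1 KB = 1024 bytes, and the interval length is passed as duration_s. flood_flow_bytes and random_addr_between are hypothetical names; the same address helper would fit the worm inserts.

```ruby
require "ipaddr"
require "socket"

# Bytes sent by each attacker: a uniform rate in [low, high] KB/s times the
# interval length (1 KB = 1024 bytes assumed), rounded to whole bytes.
def flood_flow_bytes(num_attackers, attack_rate_low, attack_rate_high, duration_s:, rng: Random.new)
  Array.new(num_attackers) do
    rate_kbps = rng.rand(attack_rate_low.to_f..attack_rate_high.to_f)
    (rate_kbps * 1024 * duration_s).round
  end
end

# Uniformly random IPv4 address between subnet_start and subnet_end, inclusive.
def random_addr_between(subnet_start, subnet_end, rng: Random.new)
  lo = IPAddr.new(subnet_start).to_i
  hi = IPAddr.new(subnet_end).to_i
  IPAddr.new(rng.rand(lo..hi), Socket::AF_INET).to_s
end
```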