= Datapository Methods =

The following is a list of available methods in the toolkit for use in development of new scripts, grouped by functionality.

Quick Links:
 * [wiki:DPUserFunctions#CoreMethods Core Methods]
 * [wiki:DPUserFunctions#TrafficStatistics Traffic Statistics]
 * [wiki:DPUserFunctions#Entropy Entropy]
 * [wiki:DPUserFunctions#AnomalyDetectionMATLAB Anomaly Detection]
 * [wiki:DPUserFunctions#LabelingAnomalies Labeling Anomalies]
 * [wiki:DPUserFunctions#SyntheticAttackTable Synthetic Attack Table]
 * [wiki:DPUserFunctions#SyntheticAttacks Synthetic Attacks]

== Core Methods ==

''get_metrics()''
 * Description: returns all of the available metrics
 * Return type: array, at each index is an integer representation of the metric id, and the string name of the metric

''interval_to_filename(interval)''
 * Return type: string formatted filename

''epoch_to_filename(epoch)''
 * Return type: string formatted filename

''epoch_to_timestamp(epoch)''
 * Return type: timestamp interval
 * NOTE: epoch must be in ''[wiki:EntityDictionary#INTERVAL_STATS INTERVAL_STATS]''

''timestamp_to_epoch(timestamp)''
 * Return type: integer epoch
 * NOTE: timestamp must be in ''[wiki:EntityDictionary#INTERVAL_STATS INTERVAL_STATS]''

''get_all_intervals()''
 * Return type: array, each element consisting of a string formatted interval and epoch
 * Ex.
   Usage: get_all_intervals().each {|interval,epoch| ...}

''get_intervals_check(type, low_score, high_score)''
 * Description: get all intervals from ''[wiki:EntityDictionary#INTERVALS INTERVALS]'' where the attribute named ''type'' has a value between ''low_score'' and ''high_score''
 * Return type: array, each element is a timestamp formatted interval

''get_random_intervals(num_intervals)''
 * Description: get ''num_intervals'' random intervals from ''[wiki:EntityDictionary#INTERVALS INTERVALS]''
 * Return type: array, each element is a timestamp formatted interval

''get_random_intervals_check(type, low_score, high_score, num_intervals)''
 * Description: get ''num_intervals'' random intervals from ''[wiki:EntityDictionary#INTERVALS INTERVALS]'' where the attribute named ''type'' has a value between ''low_score'' and ''high_score''
 * Return type: array, each element is a timestamp formatted interval

''get_flows_partitions()''
 * Return type: an array consisting of all of the ''[wiki:PartitioningFlowsTable#PartitioningtheFlowsTable FLOWS table partitions]''
 * Ex.
   Usage: get_all_intervals().each {|interval,epoch| ...}

''get_editor_input()''
 * Description: open up the user's favorite $EDITOR and get their input
 * Return type: the user's input in string format, nil on error

''correlate(filename1, filename2, label1, label2, outfile)''
 * Description: generates correlation scores between the second columns of data in ''filename1'' and ''filename2''
 * Output file: ''outfile''
 * Output format:

== Traffic Statistics ==

''stats_addr_degree_in(timestamp,table)''
 * Return type: an array result indexed by ''host address'' with values being the associated ''in degree''

''stats_addr_degree_out(timestamp,table)''
 * Return type: an array result indexed by ''host address'' with values being the associated ''out degree''

''stats_degree_in(timestamp,table)''
 * Return type: an array result indexed by ''in degree'' with values being the ''number of hosts'' with the indexed degree

''stats_degree_out(timestamp,table)''
 * Return type: an array result indexed by ''out degree'' with values being the ''number of hosts'' with the indexed degree

''stats_addr_src(timestamp,table)''
 * Return type: an array result indexed by ''host address'' with values being the associated ''source packets''

''stats_addr_dst(timestamp,table)''
 * Return type: an array result indexed by ''host address'' with values being the associated ''destination packets''

''stats_ports_src(timestamp,table)''
 * Return type: an array result indexed by ''port'' with values being the associated ''source packets''

''stats_ports_dst(timestamp,table)''
 * Return type: an array result indexed by ''port'' with values being the associated ''destination packets''

''stats_fsd(timestamp,table)''
 * Return type: an array result indexed by ''flow size distribution'' with values being the ''number of flows'' with the given flow size distribution

== Entropy ==

''compute_all_entropy(interval, table)''
 * Description: compute all of the entropy values for all [wiki:TrafficFeatures traffic features]
 * Return
   type: hash indexed by [wiki:TrafficFeatures traffic feature], value at the index is the entropy as a float

''cache_all_entropy(interval,entropy)''
 * Description: takes an interval and a hash indexed by traffic feature (like what ''compute_all_entropy()'' returns) and caches the values in ''[wiki:EntityDictionary#INTERVAL_STATS INTERVAL_STATS]''
 * Return type: none

''read_all_entropy(interval)''
 * Description: reads the cached entropy values for all [wiki:TrafficFeatures traffic features] from ''[wiki:EntityDictionary#INTERVAL_STATS INTERVAL_STATS]''
 * Return type: hash indexed by [wiki:TrafficFeatures traffic feature], value at the index is the entropy as a float

''degree_entropy(data)''
 * Description: takes an array of degree data returned from ''stats_addr_degree_in()'' or ''stats_addr_degree_out()'' and computes the entropy
 * Return type: entropy as a float

''general_entropy(data)''
 * Description: takes an array of data returned from ''stats_ports_*()'', ''stats_addr_*()'' or ''stats_fsd()'' and computes the entropy
 * Return type: entropy as a float

''entropy_degree_in(interval,table)''
 * Description: computes the entropy for ''interval'' in ''table'' by calling ''stats_addr_degree_in()'' and passing the data to ''degree_entropy(data)''
 * Return type: entropy as a float

''entropy_degree_out(interval,table)''
 * Description: computes the entropy for ''interval'' in ''table'' by calling ''stats_addr_degree_out()'' and passing the data to ''degree_entropy(data)''
 * Return type: entropy as a float

''entropy_ports_src(interval,table)''
 * Description: computes the entropy for ''interval'' in ''table'' by calling ''stats_ports_src()'' and passing the data to ''general_entropy(data)''
 * Return type: entropy as a float

''entropy_ports_dst(interval,table)''
 * Description: computes the entropy for ''interval'' in ''table'' by calling ''stats_ports_dst()'' and passing the data to ''general_entropy(data)''
 * Return type: entropy as a float

''entropy_addr_src(interval,table)''
 * Description: computes the entropy for ''interval'' in ''table'' by calling ''stats_addr_src()'' and passing the data to ''general_entropy(data)''
 * Return type: entropy as a float

''entropy_addr_dst(interval,table)''
 * Description: computes the entropy for ''interval'' in ''table'' by calling ''stats_addr_dst()'' and passing the data to ''general_entropy(data)''
 * Return type: entropy as a float

''entropy_fsd(interval,table)''
 * Description: computes the entropy for ''interval'' in ''table'' by calling ''stats_fsd()'' and passing the data to ''general_entropy(data)''
 * Return type: entropy as a float

''print_magnitude_entropy(magnitude, entropy)''
 * Description: takes a magnitude value and an entropy hash and prints them to stdout for use with synthetic attacks
 * Output format:
 * Return type: none

== Anomaly Detection (MATLAB) ==

''dev_scores(filename, outfile)''
 * Description: generates standard deviation scores for the second column of data in ''filename''
 * Output file: ''outfile''
 * Output format:

''dev_scores_cleaning(filename,outfile)''
 * Description: generates standard deviation scores for the second column of data in ''filename'' after performing an iterative cleaning process described in the [wiki:SDEVDetection#GeneratingStandardDeviationScores standard deviation] section
 * Output file: ''outfile''
 * Output format:

''wavelet_anomaly_detection(filename,outfile)''
 * Description: generates wavelet based deviation scores with a window of 30 minutes from the second column of data in ''filename''
 * Output file: ''outfile''
 * Output format:

''get_full_alarms(alarm_name,metric_name)''
 * Description: for all intervals, return a timestamp and 0 or 1 depending on whether or not an alarm is raised for ''alarm_name'' with ''metric_name''
 * Return type: array, at each index is |interval,flag| where flag is 0 or 1

''get_full_alarms_epoch(alarm_name,metric_name)''
 * Description: for all intervals, return an epoch and 0 or 1 depending on whether or not an alarm is raised for
   ''alarm_name'' with ''metric_name''
 * Return type: array, at each index is |epoch,flag| where flag is 0 or 1

''get_alarms(alarm_name,metric_name)''
 * Description: returns intervals where an alarm is raised for ''alarm_name'' with ''metric_name''
 * Return type: array of timestamps

''get_alarms_epoch(alarm_name,metric_name)''
 * Description: returns epochs where an alarm is raised for ''alarm_name'' with ''metric_name''
 * Return type: array of integers

== Labeling Anomalies ==

''get_anomaly_types()''
 * Description: retrieves all of the current anomaly types as an array with ''type'' and ''description''
 * Return type: array

''create_anomaly_type(description)''
 * Description: inserts a new anomaly type into ''[wiki:EntityDictionary#ANOMALIES ANOMALIES]'' with the ''description'' provided
 * Return type: integer which represents the unique ''type'' attribute associated with the newly inserted anomaly type

''get_attack_flow_list()''
 * Description: queries the user for a filename which includes a list of flows associated with an attack, then checks that the file is in the proper format: ,
 * Return type: the filename as a string if the file passed the format check and has at least one flow, ''nil'' if it fails

''get_user_anomaly_type()''
 * Description: displays ''[wiki:EntityDictionary#ANOMALIES ANOMALIES]'' in table format and asks the user to select one, or -1 if they want to create a new type
 * Return type: integer on success, nil on failure

''insert_labeled_flows(filename)''
 * Description: inserts into ''[wiki:EntityDictionary#LABELED_FLOWS LABELED_FLOWS]'' from ''filename'', format of the file must be: '',,''
 * Return type: none

''insert_label(anomaly_type, description)''
 * Description: inserts into ''[wiki:EntityDictionary#LABELES LABELES]'' with the ''anomaly_type'' and provided ''description'', the ''label_id'' is generated from a sequence number
 * Return type: the ''label_id'' assigned to the label as an integer

''get_all_labels()''
 * Description:
   retrieves all of the labels in the ''[wiki:EntityDictionary#LABELES LABELES]''
 * Return type: array, each array index has an array with: the label_id, anomaly description, label description, interval count, and flow count

''get_label(label_id)''
 * Description: retrieves the specified label in the ''[wiki:EntityDictionary#LABELES LABELES]''
 * Return type: array with the following indexes: the label_id, anomaly description, label description, interval count, and flow count

''print_labels()''
 * Description: prints out information about a label in a human readable format. Note that it can be used with a single label, i.e. print_labels(get_label(1))
 * Return type: none

''get_all_labeled_flows()''
 * Description: gets all of the flows from the ''[wiki:EntityDictionary#LABELED_FLOWS LABELED_FLOWS]'' table, including the label associated with the flows
 * Return type: array, at each index is an array with values ''label_id'', ''interval'', and ''flow_id''

''get_labeled_flows(labels)''
 * Description: specify a label and get all of the flows associated with the label
 * Return type: array, at each index is an array with values ''interval'' and ''flow_id''

''get_all_full_labeled_flows()''
 * Description: gets all of the FULL flow level information from the ''[wiki:PartitioningFlowsTable#FLOWS FLOWS]'' associated with ''all'' labels, essentially returning all of the attack flows for ''all'' labels
 * Return type: array, at each index is a label_id and a full record from ''[wiki:PartitioningFlowsTable#FLOWS FLOWS]'' which is a flow associated with the label

''get_full_labeled_flows(label)''
 * Description: gets all of the FULL flow level information from the ''[wiki:PartitioningFlowsTable#FLOWS FLOWS]'' associated with the given label, essentially returning all of the attack flows for the given label
 * Return type: array, at each index is a full record from ''[wiki:PartitioningFlowsTable#FLOWS FLOWS]'' which is a flow associated with the label

''search_labels_and(words)''
 *
   Description: search through all of the label descriptions for a label that contains ''all'' of the words in the string ''words'', unordered
 * Note: this is __not__ case sensitive
 * Return type: array of label_ids as integers

''search_labels_or(words)''
 * Description: search through all of the label descriptions for a label that contains ''any'' of the words in the string ''words'', unordered
 * Note: this is __not__ case sensitive
 * Return type: array of label_ids as integers

== Synthetic Attack Table ==

''create_attack_table(table)''
 * Description: creates the attack table and view which is a union of the attack table and the ''table'' parameter
 * Return type: none

''attack_flows_timestamp(timestamp)''
 * Description: change all of the intervals of the flows in the attack table to ''timestamp''
 * Return type: none

''clear_attack_table()''
 * Description: removes all of the attack flows from all intervals
 * Return type: none

''drop_attack_table()''
 * Description: deletes the attack table and view
 * Return type: none

== Synthetic Attacks ==

''insert_ib_worm(interval, infected_hosts, subnet_start, subnet_end, scan_rate, scan_port)''
 * Description: insert worm activity on ''scan_port'' in ''interval'' sourced from ''infected_hosts'' random Internet addresses to random intranet addresses between ''subnet_start'' and ''subnet_end'' where each infected scanning host scans at a rate of ''scan_rate'' in hosts per second
 * Return type: none

''insert_ob_worm(interval, infected_hosts, subnet_start, subnet_end, scan_rate, scan_port)''
 * Description: insert worm activity on ''scan_port'' in ''interval'' sourced from ''infected_hosts'' random intranet addresses between ''subnet_start'' and ''subnet_end'' to random Internet addresses where each infected scanning host scans at a rate of ''scan_rate'' in hosts per second
 * Return type: none

''insert_ib_hscan(interval, start_victim, scan_rate, scan_port, scanner)''
 * Description: insert an inbound horizontal scan in ''interval'' from
   ''scanner'' on ''scan_port'' at a rate of ''scan_rate'' in hosts per second, sequentially attacking destination host addresses starting at ''start_victim''
 * Return type: none

''insert_ob_hscan(interval, start_victim, scan_rate, scan_port, scanner)''
 * Description: insert an outbound horizontal scan in ''interval'' from ''scanner'' on ''scan_port'' at a rate of ''scan_rate'' in hosts per second, sequentially attacking destination host addresses starting at ''start_victim''
 * Return type: none

''insert_ib_vscan(interval, start_victim, scan_rate, scanner)''
 * Description: insert an inbound vertical scan in ''interval'' from ''scanner'' on ports 0-65535 at a rate of ''scan_rate'' in hosts per second, sequentially attacking destination host addresses starting at ''start_victim''
 * Return type: none

''insert_ob_vscan(interval, start_victim, scan_rate, scanner)''
 * Description: insert an outbound vertical scan in ''interval'' from ''scanner'' on ports 0-65535 at a rate of ''scan_rate'' in hosts per second, sequentially attacking destination host addresses starting at ''start_victim''
 * Return type: none

''insert_ib_flood(interval, num_attackers, attack_rate_low, attack_rate_high, victim)''
 * Description: insert ''num_attackers'' inbound attack flows at an attack rate between ''attack_rate_low'' and ''attack_rate_high'' in KB/s against ''victim'' into ''interval''
 * Return type: none

''insert_ob_flood(interval, num_attackers, subnet_start, subnet_end, attack_rate_low, attack_rate_high, victim)''
 * Description: insert ''num_attackers'' outbound attack flows at an attack rate between ''attack_rate_low'' and ''attack_rate_high'' in KB/s against ''victim'' into ''interval'', where the attackers are randomly chosen between ''subnet_start'' and ''subnet_end''
 * Return type: none
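
The horizontal scan entries above describe a scanner that walks destination addresses sequentially from ''start_victim'' at ''scan_rate'' hosts per second. The following is a minimal Ruby sketch of that address walk only, not the toolkit's implementation. The helper names ''hscan_targets'' and ''hscan_flows'', the ''interval_secs'' parameter, and the three-element flow tuple are all assumptions made for illustration; the real interval length and the layout of the attack table are not stated on this page.

```ruby
require 'ipaddr'
require 'socket'

# Hypothetical helper (not a toolkit method): list the IPv4 destination
# addresses a horizontal scan would reach in one interval, walking upward
# from start_victim. interval_secs is an assumed interval length.
def hscan_targets(start_victim, scan_rate, interval_secs)
  first = IPAddr.new(start_victim).to_i
  count = scan_rate * interval_secs
  (0...count).map do |offset|
    IPAddr.new(first + offset, Socket::AF_INET).to_s
  end
end

# Hypothetical flow tuples of the form [source, destination, destination port],
# one per scanned host. The real flow record carries more fields.
def hscan_flows(scanner, scan_port, start_victim, scan_rate, interval_secs)
  hscan_targets(start_victim, scan_rate, interval_secs).map do |victim|
    [scanner, victim, scan_port]
  end
end
```

For example, ''hscan_flows("192.0.2.7", 445, "10.0.0.254", 2, 2)'' yields four tuples whose destinations run from 10.0.0.254 through 10.0.1.1, crossing the /24 boundary because the walk is done on the integer form of the address.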