Correlation

When comparing traffic metrics or anomaly detection methods, correlation coefficients between two data sets help quantify how similar or different they are. Correlating the entropy data of two metrics shows how similar the raw inferences they make about the network are. Correlating alarms across different traffic metrics under a single anomaly detection method gives a high-level view of which metrics generate similar sets of alarms. Likewise, correlating the alarms that two different anomaly detection methods raise on a single metric shows how similar or different the methods are.

Generating Correlation Scores

Correlation scores are easy to generate with the correlation functions built into MATLAB. However, we provide several scripts to help compute correlation scores across numerous metrics, methods, or any other data generated by Datapository.

correlate.m

  • Usage: correlate <data_file_1> <data_file_2> <label_1> <label_2> <out_file>
  • Description: correlates the second column of two data files and appends the labeled correlation score to the specified output file
  • Example usage:
    >> correlate entropy-degree_in entropy-degree_out degree_in degree_out correlation_scores
    
    ans =
    
        0.0552
    
    >> exit
    
    $ cat correlation_scores 
    degree_in degree_out 0.055236  
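
The computation described above can be sketched outside MATLAB as follows. This is a minimal illustration, not the contents of correlate.m: it assumes the score is the Pearson correlation coefficient and that each data file holds whitespace-separated "<timestamp> <value>" lines. The function names second_column, pearson, and correlate_files are hypothetical.

```ruby
# Hypothetical stand-in for correlate.m (assumes a Pearson coefficient and
# whitespace-separated "<timestamp> <value>" lines in each data file).

# Extract the second column (the data value) from an array of lines.
# Lines with fewer than two fields, such as blank lines, are skipped.
def second_column(lines)
  lines.map(&:split)
       .reject { |fields| fields.length < 2 }
       .map { |fields| Float(fields[1]) }
end

# Pearson correlation coefficient of two equal-length numeric arrays.
# Returns NaN when either series is constant (zero variance).
def pearson(xs, ys)
  raise ArgumentError, "series must have equal length" unless xs.length == ys.length
  raise ArgumentError, "need at least two samples" if xs.length < 2

  mean_x = xs.sum / xs.length.to_f
  mean_y = ys.sum / ys.length.to_f
  cov = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  var_x = xs.sum { |x| (x - mean_x)**2 }
  var_y = ys.sum { |y| (y - mean_y)**2 }
  cov / Math.sqrt(var_x * var_y)
end

# Correlate two data files and append "<label_1> <label_2> <score>"
# to out_file, mirroring the output line shown in the example above.
def correlate_files(file1, file2, label1, label2, out_file)
  score = pearson(second_column(File.readlines(file1)),
                  second_column(File.readlines(file2)))
  File.open(out_file, "a") { |f| f.puts format("%s %s %f", label1, label2, score) }
  score
end
```

With the files from the example, a call such as correlate_files("entropy-degree_in", "entropy-degree_out", "degree_in", "degree_out", "correlation_scores") would append one labeled line to correlation_scores. Opening the output file in append mode lets repeated calls accumulate scores in a single file.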
    

gen_correlations.rb

Correlating many pieces of data by hand is tedious. To simplify the process, the gen_correlations.rb Ruby script is provided as a wrapper for running the correlations in MATLAB. Its output is formatted as a matrix to make it easy to read. Since the data typically represents a time series, the first column of each data file is the timestamp and the second is the data value. The data need not be entropy data; however, for now the values being correlated must be in the second column of the file for the script to work.

  • Usage: ./gen_correlations.rb <file_with_data_list>
  • Description: reads the list of data files named by the command-line argument, one entry per line in the format <label> <path_to_data>, correlates every pair of data files, and prints the result as a LaTeX table
  • Example usage:
    $ cat metrics_entropy 
    indegree ../../traffic_data/entropy/entropy-degree_in
    outdegree ../../traffic_data/entropy/entropy-degree_out
    addr_src ../../traffic_data/entropy/entropy-addr_src
    addr_dst ../../traffic_data/entropy/entropy-addr_dst
    ports_src ../../traffic_data/entropy/entropy-ports_src
    ports_dst ../../traffic_data/entropy/entropy-ports_dst
    fsd ../../traffic_data/entropy/entropy-fsd
    
    $ ./gen_correlations.rb metrics_entropy
    
    \hline
    	&outdegree   &addr_src   &addr_dst   &ports_src   &ports_dst   &    fsd   \\
    \hline
    indegree&   -1.0000&   -1.0000&   -1.0000&   -1.0000&   -1.0000&   -1.0000\\
    \hline
    outdegree&         -&   -1.0000&   -1.0000&   -1.0000&   -1.0000&   -1.0000\\
    \hline
    addr_src&          -&         -&   -1.0000&   -1.0000&   -1.0000&   -1.0000\\
    \hline
    addr_dst&          -&         -&         -&   -1.0000&   -1.0000&   -1.0000\\
    \hline
    ports_src&         -&         -&         -&         -&   -1.0000&   -1.0000\\
    \hline
    ports_dst&         -&         -&         -&         -&         -&   -1.0000\\
    \hline
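
The pairwise, upper-triangular layout of this table can be sketched as follows. This is an illustration under stated assumptions, not the source of gen_correlations.rb: the real script delegates the correlation to MATLAB, whereas this sketch computes a Pearson coefficient directly in Ruby, and the column padding only approximates the output above. The names parse_data_list, pearson, and correlation_table are hypothetical.

```ruby
# Hypothetical sketch of the table-building step. The real gen_correlations.rb
# runs the correlation in MATLAB; a Pearson coefficient is assumed here.

# Pearson correlation coefficient of two equal-length numeric arrays.
def pearson(xs, ys)
  raise ArgumentError, "series must have equal length" unless xs.length == ys.length

  mean_x = xs.sum / xs.length.to_f
  mean_y = ys.sum / ys.length.to_f
  cov = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  var_x = xs.sum { |x| (x - mean_x)**2 }
  var_y = ys.sum { |y| (y - mean_y)**2 }
  cov / Math.sqrt(var_x * var_y)
end

# Parse list-file lines of the form "<label> <path_to_data>" into
# [label, path] pairs, skipping lines that do not have exactly two fields.
def parse_data_list(lines)
  lines.map(&:split).select { |fields| fields.length == 2 }
end

# Build the lines of an upper-triangular LaTeX table from a hash of
# { label => array of values }. Each pair is correlated once; cells at or
# below the diagonal are shown as "-".
def correlation_table(series)
  labels = series.keys
  columns = labels.drop(1)  # the first label only ever appears as a row
  rows = ["\\hline", "&" + columns.join("&") + "\\\\", "\\hline"]
  labels[0...-1].each_with_index do |row_label, i|
    cells = columns.each_with_index.map do |col_label, j|
      # Column j holds labels[j + 1], so it lies above the diagonal when j >= i.
      if j < i
        "-".rjust(10)
      else
        format("%10.4f", pearson(series[row_label], series[col_label]))
      end
    end
    rows << row_label + "&" + cells.join("&") + "\\\\"
    rows << "\\hline"
  end
  rows
end
```

Given the [label, path] pairs from parse_data_list, the series hash would be filled by reading the second column of each listed file, and puts correlation_table(series) would then print the table. Computing only the cells above the diagonal avoids repeating each pair, since correlation is symmetric.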