Entropy

Information entropy, introduced by Claude E. Shannon and shown in the equation below, measures the amount of disorder in a system, where P(x) is the probability that the random variable X is in state x. The writeup of our study describes how entropy is computed for each of the metrics.

http://cyprus.cmcl.cs.cmu.edu/projects/entropy_analysis/chrome/common/entropy_eq.png
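
As an illustration of the definition, the following is a minimal Ruby sketch of Shannon entropy computed from per-state counts. It is not the code in entropy.rb. The function names, the base-2 logarithm, and the normalization step are assumptions: the sample values later on this page all lie between 0 and 1, which suggests some normalization, but the exact scheme is described only in the study writeup.

```ruby
# Shannon entropy H(X) = -sum over x of P(x) * log2(P(x)),
# where P(x) is estimated as count(x) / total.
# `counts` maps each state (e.g. a port number) to how often it was seen.
def shannon_entropy(counts)
  total = counts.values.sum.to_f
  return 0.0 if total.zero?

  counts.values.reduce(0.0) do |h, count|
    next h if count.zero?          # a state that never occurs contributes nothing
    prob = count / total
    h - prob * Math.log2(prob)
  end
end

# Assumption: scale into [0, 1] by dividing by log2 of the number of
# observed states. The toolkit may normalize differently.
def normalized_entropy(counts)
  observed = counts.count { |_state, count| count > 0 }
  return 0.0 if observed <= 1

  shannon_entropy(counts) / Math.log2(observed)
end

# Hypothetical destination-port counts for one interval.
puts normalized_entropy({ 80 => 6, 443 => 3, 22 => 1 })
```

A uniform distribution gives the maximum value and a single observed state gives 0, which is the behavior the definition implies.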

Computing Entropy

We provide functions to compute the entropy of any traffic feature over a specified interval. Built on these are additional functions that compute and cache the entropy in INTERVAL_STATS, or display it in a readable form for the user. All entropy functions are defined in entropy.rb.

To compute the entropy for a given metric, the following Ruby functions are available; each returns a float:

  • entropy_degree_in(interval,table)
  • entropy_degree_out(interval,table)
  • entropy_ports_src(interval,table)
  • entropy_ports_dst(interval,table)
  • entropy_addr_src(interval,table)
  • entropy_addr_dst(interval,table)
  • entropy_fsd(interval,table)

The table must be specified in order to support our synthetic attacks. Typical use should specify "flows" as the table.

If the entropy of all the metrics needs to be computed, a function is provided which runs each of the individual entropy functions and stores the return values in a hash. This can then be easily displayed in a readable manner:

   irb> compute_all_entropy("2005-02-01 00:00:00","flows").each {|metric,value| puts "  #{metric}\t#{value}"}
     fsd:           0.757657200409227
     ports_src:     0.583221513544996
     ports_dst:     0.589124978238745
     degree_in:     0.0816894163152613
     addr_src:      0.583086735489861
     addr_dst:      0.573817404410932
     degree_out:    0.277247791308184
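
The pattern described above, running each per-metric function and collecting the results in a hash, can be sketched as follows. This is only an illustration, not the implementation of compute_all_entropy(). The names ENTROPY_METRICS, collect_entropy, format_entropy and StubEntropySource are hypothetical, and the stub returns a placeholder value so the sketch runs without a database.

```ruby
# Metric names taken from the list of entropy_* functions above.
ENTROPY_METRICS = %w[degree_in degree_out ports_src ports_dst addr_src addr_dst fsd].freeze

# Hypothetical stand-in for compute_all_entropy(): calls
# entropy_<metric>(interval, table) on `source` for every metric
# and collects the returned floats in a hash keyed by metric name.
def collect_entropy(source, interval, table)
  ENTROPY_METRICS.each_with_object({}) do |metric, results|
    results[metric] = source.public_send("entropy_#{metric}", interval, table)
  end
end

# Stub so the sketch runs on its own. The 0.5 is a placeholder,
# not a real entropy value.
class StubEntropySource
  ENTROPY_METRICS.each do |metric|
    define_method("entropy_#{metric}") { |_interval, _table| 0.5 }
  end
end

# Format the hash as aligned "metric: value" lines.
def format_entropy(values)
  width = (values.keys.map(&:length).max || 0) + 2
  values.map { |metric, value| "#{(metric + ':').ljust(width)}#{value}" }
end

all = collect_entropy(StubEntropySource.new, "2005-02-01 00:00:00", "flows")
puts format_entropy(all)
```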

Caching Entropy Values

Calculating entropy values is expensive, and in a large traffic study it is impractical to recompute them for each interval and wait for the results. We therefore provide functionality to cache the entropy values in the INTERVAL_STATS table and read them back from it.

To cache the computed entropy, pass the interval and the hash returned from compute_all_entropy() to cache_all_entropy(). The read_all_entropy() function can then be used to read these cached values back from INTERVAL_STATS. Note that the table is no longer specified for caching: we only provide caching for the main FLOWS table.

   irb> entropy=compute_all_entropy("2005-02-01 00:00:00","flows")
   irb> cache_all_entropy("2005-02-01 00:00:00",entropy)
   irb> read_all_entropy("2005-02-01 00:00:00").each {|metric,value| puts "  #{metric}\t#{value}"}
     fsd:           0.757657200409227
     ports_src:     0.583221513544996
     ports_dst:     0.589124978238745
     degree_in:     0.0816894163152613
     addr_src:      0.583086735489861
     addr_dst:      0.573817404410932
     degree_out:    0.277247791308184
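
The compute-once, read-many pattern behind the cache can be sketched in a few lines. This is a minimal in-memory illustration only. The EntropyCache class and its methods are hypothetical and stand in for the INTERVAL_STATS table, which the real functions read and write.

```ruby
# Hypothetical in-memory stand-in for INTERVAL_STATS, keyed by interval.
class EntropyCache
  def initialize
    @rows = {}
  end

  # Store a copy of the entropy hash for an interval.
  def write(interval, entropy)
    @rows[interval] = entropy.dup
  end

  # Return the cached hash, or nil if the interval was never cached.
  def read(interval)
    @rows[interval]
  end

  # Return the cached hash if present; otherwise run the block once
  # and store its result.
  def fetch(interval)
    @rows[interval] ||= yield
  end
end

cache = EntropyCache.new
calls = 0
expensive = lambda do
  calls += 1
  { "fsd" => 0.5 } # placeholder value standing in for a real computation
end

# The second fetch is served from the cache, so the lambda runs only once.
2.times { cache.fetch("2005-02-01 00:00:00") { expensive.call } }
puts calls
```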

Entropy Scripts

Several wrapper scripts are provided which can be used while inserting flow records to compute the initial entropy values and cache them.

dp_update_entropy.rb

  • Usage: ./dp_update_entropy.rb <interval/epoch>
  • Description: accepts an interval as a timestamp or an epoch, computes the entropy, and then caches it. This can be used in conjunction with inserting flows to immediately compute and cache the entropy.
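
Accepting either form of the interval argument can be sketched as below. This is not the script's actual parsing code. The helper name interval_to_epoch is hypothetical, and the default UTC offset of -05:00 is an assumption inferred from the sample output on this page, where 2005-02-01 00:00:00 is paired with epoch 1107234000.

```ruby
# Hypothetical helper: accept "YYYY-MM-DD HH:MM:SS" or epoch seconds
# and return epoch seconds.
# Assumption: timestamps are local times at a fixed UTC offset.
def interval_to_epoch(arg, utc_offset = "-05:00")
  text = arg.to_s.strip
  return text.to_i if text.match?(/\A\d+\z/) # already an epoch

  m = text.match(/\A(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})\z/)
  raise ArgumentError, "unrecognized interval: #{arg.inspect}" unless m

  # Time.new(year, month, day, hour, min, sec, utc_offset)
  Time.new(*m.captures.map(&:to_i), utc_offset).to_i
end

puts interval_to_epoch("2005-02-01 00:00:00")
puts interval_to_epoch("1107234300")
```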

dp_dump_entropy.rb

  • Usage: ./dp_dump_entropy.rb
  • Description: for all traffic features, dump the cached entropy values into metric files as a time series. This makes it easy to plot entropy over time, and the output files are also used for computing standard deviation and wavelet scores.
  • Output Files: entropy-metric_name
  • Example usage:
    $ ./dp_dump_entropy.rb 
    2005-02-01 00:00:00 -- 1107234000
    2005-02-01 00:05:00 -- 1107234300
    ........(status output continues)
    
    $ ls -l entropy-*
    -rw-r--r-- 1 gnychis dp 139882 2007-03-17 21:09 entropy-addr_dst
    -rw-r--r-- 1 gnychis dp 139903 2007-03-17 21:09 entropy-addr_src
    -rw-r--r-- 1 gnychis dp 144825 2007-03-17 21:09 entropy-degree_in
    -rw-r--r-- 1 gnychis dp 139850 2007-03-17 21:09 entropy-degree_out
    -rw-r--r-- 1 gnychis dp 139896 2007-03-17 21:09 entropy-fsd
    -rw-r--r-- 1 gnychis dp 139896 2007-03-17 21:09 entropy-ports_dst
    -rw-r--r-- 1 gnychis dp 139875 2007-03-17 21:09 entropy-ports_src
    -rw-r--r-- 1 gnychis dp 140159 2007-03-17 21:09 entropy-volume
    
    $ head -n 3 entropy-addr_dst 
    1107234000 0.573814
    1107234300 0.595875
    1107234600 0.580456
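
Reading one of these output files back and summarizing it can be sketched as follows, using the three sample lines shown above. The function names are hypothetical, and the use of the population standard deviation is an assumption; the toolkit's own standard deviation scoring may be defined differently.

```ruby
# Parse "epoch value" lines into [Integer, Float] pairs, skipping blank lines.
def parse_entropy_series(text)
  text.each_line.map(&:split).reject(&:empty?).map do |epoch, value|
    [Integer(epoch), Float(value)]
  end
end

def mean(values)
  values.sum / values.length
end

# Population standard deviation (assumption, see the note above).
def std_dev(values)
  m = mean(values)
  Math.sqrt(values.sum { |v| (v - m)**2 } / values.length)
end

# The first three lines of entropy-addr_dst from the example above.
sample = <<~DATA
  1107234000 0.573814
  1107234300 0.595875
  1107234600 0.580456
DATA

series = parse_entropy_series(sample)
values = series.map(&:last)
puts format("n=%d mean=%.6f std_dev=%.6f", values.length, mean(values), std_dev(values))
```

To process a real file, the same parser can be given File.read("entropy-addr_dst") in place of the sample text.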