Traffic Features
Network traffic can be analyzed through several metrics. The framework provides functionality for monitoring six prominent features and can be extended with additional metrics.
To introduce each metric, we use the following example data, which is in Datapository format:
| flow_id | interval | start_time | finish_time | src_ip | dst_ip | src_port | dst_port | src_packets | dst_packets |
| 0 | 2005-02-01 00:00:00 | 2005-02-01 00:00:01 | 2005-02-01 00:00:02 | 13000 | 524288 | 17430 | 80 | 20 | 140 |
| 1 | 2005-02-01 00:00:00 | 2005-02-01 00:00:01 | 2005-02-01 00:00:02 | 13000 | 524288 | 17430 | 80 | 20 | 140 |
| 2 | 2005-02-01 00:00:00 | 2005-02-01 00:00:01 | 2005-02-01 00:00:02 | 14000 | 524288 | 43209 | 80 | 25 | 120 |
| 3 | 2005-02-01 00:00:00 | 2005-02-01 00:00:01 | 2005-02-01 00:00:02 | 15000 | 824288 | 23412 | 445 | 74 | 135 |
Degree
Definition: The degree, also known as fan in/out, is the number of unique hosts that a given host is in contact with over a period of time. Degree is given as two metrics, in degree and out degree. The in degree for a given host is the number of unique hosts that contacted it. The out degree for a given host is the number of unique hosts that it contacted.
Example: Using our sample data, 524288 had an in degree of 2, since 13000 and 14000 contacted it. It does not have an in degree of 3: 13000 contacts it twice, but only unique contacts are counted. Likewise, 524288 had an out degree of 0, which is typical of a server.
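The degree computation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the framework's implementation: the names FLOWS and degrees are hypothetical, and the flows are reduced to the (src_ip, dst_ip) pairs of the sample table.

```python
from collections import defaultdict

# (src_ip, dst_ip) pairs copied from the sample table (hypothetical layout).
FLOWS = [
    (13000, 524288),
    (13000, 524288),
    (14000, 524288),
    (15000, 824288),
]

def degrees(flows):
    """Return (in_degree, out_degree) dicts keyed by host."""
    contacted_by = defaultdict(set)  # host -> unique hosts that contacted it
    contacted = defaultdict(set)     # host -> unique hosts it contacted
    for src, dst in flows:
        contacted[src].add(dst)
        contacted_by[dst].add(src)
    hosts = set(contacted) | set(contacted_by)
    # Sets collapse repeated contacts, so only unique hosts are counted.
    in_degree = {h: len(contacted_by[h]) for h in hosts}
    out_degree = {h: len(contacted[h]) for h in hosts}
    return in_degree, out_degree

in_degree, out_degree = degrees(FLOWS)
print(in_degree[524288], out_degree[524288])  # 2 0
```

Storing contacts in sets is what makes the repeated contact from 13000 count only once.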
Flow Size Distribution
Definition: Flow size distribution can be read directionally or non-directionally. We use the non-directional flow perspective, which is typical of most traffic studies and the standard for Netflow data. To read a flow non-directionally, a given record in the database is split into two records, so that there is one flow from the source to the destination and one flow from the destination to the source. The flow size distribution is kept in terms of packets.
Example: Splitting the first record into two flows gives the following, which contribute flow sizes of 20 and 140 to the distribution:
| src_ip | dst_ip | src_port | dst_port | packets |
| 13000 | 524288 | 17430 | 80 | 20 |
| 524288 | 13000 | 80 | 17430 | 140 |
Taking the whole table, there were two flow sizes of 140, two flow sizes of 20, and single instances of the flow sizes 25, 74, 120, and 135.
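The counts above can be reproduced with a short Python sketch. It assumes each record is reduced to its (src_packets, dst_packets) pair; RECORDS and flow_size_distribution are hypothetical names used only for illustration.

```python
from collections import Counter

# (src_packets, dst_packets) for each record in the sample table.
RECORDS = [(20, 140), (20, 140), (25, 120), (74, 135)]

def flow_size_distribution(records):
    """Count how often each flow size (in packets) occurs."""
    sizes = Counter()
    for src_packets, dst_packets in records:
        # Each record is split into two flows, one per direction.
        sizes[src_packets] += 1
        sizes[dst_packets] += 1
    return sizes

dist = flow_size_distribution(RECORDS)
print(dist[140], dist[20], dist[25])  # 2 2 1
```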
Ports
Definition: The ports metric represents the number of packets that were associated with a given port. This metric is broken down into a source and a destination count for each port. The ports source metric is the number of packets that were sent from the port. The ports destination metric is the number of packets that were sent to the port.
Example: Taking the flow record with flow_id==0, port 80 sourced 140 packets and had 20 packets destined to it. Over the whole example flow set, port 80 sourced 400 packets and had 65 packets destined to it.
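As a rough sketch, the per-port counts can be accumulated as follows. The tuple layout and the names RECORDS and port_packets are assumptions made for this example, not part of the framework.

```python
from collections import Counter

# (src_port, dst_port, src_packets, dst_packets) from the sample table.
RECORDS = [
    (17430, 80, 20, 140),
    (17430, 80, 20, 140),
    (43209, 80, 25, 120),
    (23412, 445, 74, 135),
]

def port_packets(records):
    """Return (sourced, destined) packet counts keyed by port."""
    sourced = Counter()
    destined = Counter()
    for src_port, dst_port, src_packets, dst_packets in records:
        # Forward direction: src_packets travel from src_port to dst_port.
        sourced[src_port] += src_packets
        destined[dst_port] += src_packets
        # Reverse direction: dst_packets travel from dst_port to src_port.
        sourced[dst_port] += dst_packets
        destined[src_port] += dst_packets
    return sourced, destined

sourced, destined = port_packets(RECORDS)
print(sourced[80], destined[80])  # 400 65
```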
Addresses
Definition: The addresses metric keeps track of the number of packets associated with a host. It is also broken down into a source and a destination count: the number of packets sourced from the given host and the number of packets destined to it.
Example: Address 13000 sourced 40 packets and was the destination of 280 packets.
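The same accumulation can be sketched per host. Again this is only an illustration: RECORDS and address_packets are hypothetical names, and each record is reduced to its addresses and packet counts.

```python
from collections import Counter

# (src_ip, dst_ip, src_packets, dst_packets) from the sample table.
RECORDS = [
    (13000, 524288, 20, 140),
    (13000, 524288, 20, 140),
    (14000, 524288, 25, 120),
    (15000, 824288, 74, 135),
]

def address_packets(records):
    """Return (sourced, destined) packet counts keyed by host address."""
    sourced = Counter()
    destined = Counter()
    for src_ip, dst_ip, src_packets, dst_packets in records:
        # Packets sent by the source host to the destination host.
        sourced[src_ip] += src_packets
        destined[dst_ip] += src_packets
        # Packets sent back by the destination host.
        sourced[dst_ip] += dst_packets
        destined[src_ip] += dst_packets
    return sourced, destined

sourced, destined = address_packets(RECORDS)
print(sourced[13000], destined[13000])  # 40 280
```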
Volume
Definition: The volume is the total amount of traffic, measured in packets, over a given period of time.
Example: The volume of interval '2005-02-01 00:00:00' was 674 packets, the sum of the src_packets and dst_packets columns over all four records.
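A minimal sketch of the volume calculation, assuming each record carries its interval and both packet counts; RECORDS and volume_by_interval are hypothetical names chosen for this example.

```python
from collections import defaultdict

# (interval, src_packets, dst_packets) from the sample table.
RECORDS = [
    ("2005-02-01 00:00:00", 20, 140),
    ("2005-02-01 00:00:00", 20, 140),
    ("2005-02-01 00:00:00", 25, 120),
    ("2005-02-01 00:00:00", 74, 135),
]

def volume_by_interval(records):
    """Sum the packets of both directions for every interval."""
    totals = defaultdict(int)
    for interval, src_packets, dst_packets in records:
        totals[interval] += src_packets + dst_packets
    return dict(totals)

volume = volume_by_interval(RECORDS)
print(volume["2005-02-01 00:00:00"])  # 674
```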
