| | 1 | = Reformatting Argus Flow Level Data for Datapository = |
|---|
| | 2 | |
|---|
| | 3 | The following guide describes how to convert Argus flow level ''text'' data for insertion into the [http://datapository.net Datapository] ''[wiki:EntityDictionary#FLOWS FLOWS]'' table. |
|---|
| | 4 | |
|---|
| | 5 | == Argus Flow Level Format == |
|---|
| | 6 | |
|---|
| | 7 | The original [http://www.qosient.com/argus/ Argus] flow level format is as follows: |
|---|
| | 8 | |
|---|
| | 9 | [[Image(http://cyprus.cmcl.cs.cmu.edu/projects/entropy_analysis/chrome/common/argus_format.png)]] |
|---|
| | 10 | |
|---|
| | 11 | Fields: |
|---|
| | 12 | * ''StartT'': the start time in seconds of the flow |
|---|
| | 13 | * ''FinT'': the finish time in seconds of the flow |
|---|
| | 14 | * ''Left_IP_Port'': the left IP address and the port |
|---|
| | 15 | * ''Flow_Dir'': a character representation of the flow direction |
|---|
| | 16 | * ''Right_IP_Port'': the right IP address and the port |
|---|
| | 17 | * ''Src_P'': the source packet count |
|---|
| | 18 | * ''Dst_P'': the destination packet count |
|---|
| | 19 | * ''Src_B'': the source byte count |
|---|
| | 20 | * ''State'': the final state of the connection when it was recorded |
|---|
| | 21 | |
|---|
| | 22 | Our [http://www.qosient.com/argus/ Argus] dataset has these flows aggregated into five-minute intervals, stored in compressed files such as core-full.2005.02.01.02.55.gz |
|---|
| | 23 | |
|---|
| | 24 | The format of the filenames is core-full.''year''.''month''.''day''.''hour''.''minute''.gz, in which the timestamp embedded in the filename represents the start of the five-minute interval that all of the flows in the file belong to. These files are stored in directories that mirror the same interval hierarchy: Data/archive/2005/02/01/02/core-full.2005.02.01.02.55.gz, such that: Data/archive/''year''/''month''/''day''/... |
|---|
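The naming scheme above can be decoded mechanically. The following is a minimal Python sketch, not part of the conversion tool; the function names `parse_interval_start` and `archive_path` are hypothetical, and the hour directory is assumed from the example path shown above.

```python
import os
from datetime import datetime


def parse_interval_start(filename):
    """Return the start of the five-minute interval encoded in a filename."""
    base = os.path.basename(filename)
    parts = base.split(".")
    # Expected pieces: core-full, year, month, day, hour, minute, gz
    if len(parts) != 7 or parts[0] != "core-full" or parts[6] != "gz":
        raise ValueError("unexpected filename: " + base)
    year, month, day, hour, minute = (int(p) for p in parts[1:6])
    return datetime(year, month, day, hour, minute)


def archive_path(start):
    """Build the relative archive path for an interval start time.

    Assumption: the directory levels are year/month/day/hour, as in the
    example path Data/archive/2005/02/01/02/...
    """
    return start.strftime(
        "Data/archive/%Y/%m/%d/%H/core-full.%Y.%m.%d.%H.%M.gz"
    )
```

For example, `parse_interval_start("core-full.2005.02.01.02.55.gz")` yields the interval start 2005-02-01 02:55, and `archive_path` maps that time back to the example path.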
| | 25 | |
|---|
| | 26 | == Parsing the Argus Flow Level Data == |
|---|
| | 27 | |
|---|
| | 28 | The output format chosen for the [http://www.qosient.com/argus/ Argus] data makes converting it to a more universal format slightly painful. |
|---|
| | 29 | |
|---|
| | 30 | Separating an IP address and port with a '.' is a bad idea, especially because not all flow records have an associated port (ICMP data, for instance). If you treat everything past the last '.' as the port, you will parse ICMP records incorrectly. You must count the '.' characters to determine whether a port exists, and then parse the field accordingly. This could have been simplified by separating the IP address and port with a space, as every other field is. |
|---|
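The dot-counting rule described above can be sketched as follows. This is an illustrative Python version rather than the tool's actual code; the name `split_ip_port` is hypothetical, and it assumes IPv4 addresses and a port that contains no '.' itself.

```python
def split_ip_port(field):
    """Split an Argus 'IP.port' field into (ip, port).

    Assumption: addresses are IPv4, so a field with four dot-separated
    pieces has no port (e.g. ICMP) and a field with five has a port.
    The port is returned as a string, or None when absent.
    """
    parts = field.split(".")
    if len(parts) == 4:
        return field, None
    if len(parts) == 5:
        return ".".join(parts[:4]), parts[4]
    raise ValueError("cannot parse IP/port field: " + field)
```

A field such as `128.2.1.10.80` splits into the address `128.2.1.10` and the port `80`, while `128.2.1.10` is returned unchanged with no port.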
| | 31 | |
|---|
| | 32 | Recording the protocol in text format, which is not necessarily universal, is a poor fit for a universal storage repository, so it is converted to the protocol number, which is universal. The ''State'' field has the same problem, but I am unaware of any universal encoding for it, so it was kept in its text format. |
|---|
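A minimal sketch of the name-to-number conversion, in Python. The table below lists only three well-known IANA protocol numbers and is an assumption for illustration; the set of names that actually appears in the Argus output, and how the tool maps them, is not shown here.

```python
# Illustrative subset of IANA-assigned protocol numbers.
# Assumption: the Argus text uses lowercase names like these.
PROTOCOL_NUMBERS = {
    "icmp": 1,
    "tcp": 6,
    "udp": 17,
}


def protocol_number(name):
    """Map a protocol name from the Argus text to its protocol number."""
    key = name.strip().lower()
    if key not in PROTOCOL_NUMBERS:
        raise ValueError("unknown protocol name: " + name)
    return PROTOCOL_NUMBERS[key]
```

Raising on an unknown name, rather than guessing, keeps unrecognized protocols from being silently stored under a wrong number.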
| | 33 | |
|---|
| | 34 | Having a flow direction that must be parsed for each flow would also make database queries painful. The [http://www.qosient.com/argus/ Argus] auditing tool cannot always determine directionality. When directionality is unknown, a number of heuristics are applied to determine it; if it is still unknown, the flow is marked as unknown in the database. Otherwise, the flow is converted into a format with a source IP address and a destination IP address, rather than a left and a right address with a flow direction. |
|---|
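The left/right to source/destination conversion might look like the Python sketch below. The direction strings `->` and `<-` are assumptions about the Argus text output, the name `to_src_dst` is hypothetical, and the heuristics mentioned above are not described in enough detail to reproduce, so any other direction is simply reported as unknown.

```python
def to_src_dst(left, direction, right):
    """Convert a left/right pair plus direction into (src, dst, known).

    Assumption: '->' means the left side is the source and '<-' means
    the right side is the source. Anything else is treated as unknown
    direction, and the left/right order is kept as-is.
    """
    d = direction.strip()
    if d == "->":
        return left, right, True
    if d == "<-":
        return right, left, True
    # Direction could not be determined; the caller should mark the
    # flow as unknown rather than trust the src/dst order.
    return left, right, False
```

Returning an explicit `known` flag lets the database record carry the "unknown" marking instead of encoding it in the address order.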
| | 35 | |
|---|
| | 36 | == Reformatting Tool == |
|---|
| | 37 | |
|---|
| | 38 | In our [http://cyprus.cmcl.cs.cmu.edu/projects/entropy_analysis/browser/scripts code repository], there is a [http://cyprus.cmcl.cs.cmu.edu/projects/entropy_analysis/browser/scripts/c/reformat_argus_for_datapository conversion tool (dp_reformat)] which takes a data path full of compressed [http://www.qosient.com/argus/ Argus] flow level data files and converts them to [http://datapository.net Datapository] format for insertion. |
|---|
| | 39 | |
|---|
| | 40 | The tool will parse the [http://www.qosient.com/argus/ Argus] flow level data and ''scp'' it over to [http://datapository.net Datapository], where it can be inserted into the database. |
|---|
| | 41 | |
|---|
| | 42 | To use the tool, specify a path that contains the [http://www.qosient.com/argus/ Argus] flow level data; the tool traverses it recursively and writes the converted data to files such as core-full.2005.02.01.02.55.gz-dp. |
|---|
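The recursive traversal and output naming can be sketched in Python as follows. This is not the dp_reformat source; the function names are hypothetical, and matching input files by the `core-full.` prefix and `.gz` suffix is an assumption based on the filenames shown in this guide.

```python
import os


def find_argus_files(root):
    """Recursively collect Argus flow files under root, in sorted order."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in chronological order
        for name in sorted(filenames):
            # Skip anything that is not an input file, including
            # previously written *.gz-dp output files.
            if name.startswith("core-full.") and name.endswith(".gz"):
                found.append(os.path.join(dirpath, name))
    return found


def output_name(path):
    """Name of the converted file: the input name with '-dp' appended."""
    return path + "-dp"
```

Because output files end in `.gz-dp` rather than `.gz`, rerunning the traversal over the same tree does not pick up earlier output as input.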
| | 43 | |
|---|
| | 44 | An example usage for converting all of the data from February of 2005, which also displays its current status, is: |
|---|
| | 45 | {{{ |
|---|
| | 46 | $ ./dp_reformat /mnt/campus-2005-1TB/Data/archive/2005/02 |
|---|
| | 47 | 0 / 7726 | /mnt/campus-2005-1TB/Data/archive/2005/02/01/00/core-full.2005.02.01.00.00.gz |
|---|
| | 48 | 1 / 7726 | /mnt/campus-2005-1TB/Data/archive/2005/02/01/00/core-full.2005.02.01.00.05.gz |
|---|
| | 49 | 2 / 7726 | /mnt/campus-2005-1TB/Data/archive/2005/02/01/00/core-full.2005.02.01.00.10.gz |
|---|
| | 50 | 3 / 7726 | /mnt/campus-2005-1TB/Data/archive/2005/02/01/00/core-full.2005.02.01.00.15.gz |
|---|
| | 51 | }}} |
|---|
| | 52 | |
|---|
| | 53 | Currently, the command line parameter is not implemented (the tool always parses February), and the ''scp'' destination on [http://datapository.net Datapository] is hard-coded to my directory. Both should be changed for general use. |