Changes from Version 1 of ReformatForDP

Author:
trac (IP: 127.0.0.1)
Timestamp:
06/14/07 15:50:47 (3 years ago)

= Reformatting Argus Flow Level Data for Datapository =

The following guide describes how to convert Argus flow level ''text'' data for insertion into the [http://datapository.net Datapository] ''[wiki:EntityDictionary#FLOWS FLOWS]'' table.

== Argus Flow Level Format ==

The original [http://www.qosient.com/argus/ Argus] flow level format is as follows:

[[Image(http://cyprus.cmcl.cs.cmu.edu/projects/entropy_analysis/chrome/common/argus_format.png)]]

Fields:
  * ''StartT'': the start time of the flow, in seconds
  * ''FinT'': the finish time of the flow, in seconds
  * ''Left_IP_Port'': the left IP address and the port
  * ''Flow_Dir'': a character representation of the flow direction
  * ''Right_IP_Port'': the right IP address and the port
  * ''Src_P'': the source packet count
  * ''Dst_P'': the destination packet count
  * ''Src_B'': the source byte count
  * ''State'': the final state of the connection when it was recorded

Our [http://www.qosient.com/argus/ Argus] dataset has these flows aggregated into five-minute intervals, stored in compressed files such as core-full.2005.02.01.02.55.gz

The filenames have the format core-full.''year''.''month''.''day''.''hour''.''minute''.gz, in which the timestamp embedded in the filename marks the start of the five-minute interval to which all of the flows it contains belong.  These files are stored in directories that mirror the same interval hierarchy, for example Data/archive/2005/02/01/02/core-full.2005.02.01.02.55.gz, such that:  Data/archive/''year''/''month''/''day''/...
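
As a rough illustration of the naming scheme above, the following Python sketch recovers the interval from a file name and rebuilds the archive path. The helper names (`interval_from_filename`, `archive_path`) are hypothetical and not part of our tools. The hour directory level is inferred from the single example path, and the timestamps are treated as naive (no time zone is assumed).

```python
import re
from datetime import datetime, timedelta

# Matches names such as core-full.2005.02.01.02.55.gz
NAME_RE = re.compile(
    r"^core-full\.(\d{4})\.(\d{2})\.(\d{2})\.(\d{2})\.(\d{2})\.gz$"
)


def interval_from_filename(name):
    """Return (start, end) of the five-minute interval encoded in name."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError("unexpected file name: %r" % name)
    year, month, day, hour, minute = (int(g) for g in match.groups())
    start = datetime(year, month, day, hour, minute)
    return start, start + timedelta(minutes=5)


def archive_path(start, root="Data/archive"):
    """Rebuild the relative path; the hour directory is an inferred assumption."""
    relative = start.strftime("%Y/%m/%d/%H/core-full.%Y.%m.%d.%H.%M.gz")
    return "%s/%s" % (root, relative)
```

For example, `interval_from_filename("core-full.2005.02.01.02.55.gz")` yields an interval that starts at 02:55 and ends at 03:00 on 1 February 2005.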

== Parsing the Argus Flow Level Data ==

The output format chosen for the [http://www.qosient.com/argus/ Argus] data makes parsing it into a more universal format slightly painful.

Separating an IP address and port with a '.' is a bad idea.  This is especially problematic because not all flow records have an associated port, such as ICMP data.  If you treat everything past the last '.' as the port, you will parse ICMP data incorrectly, for instance.  You must count the '.' characters to determine whether a port exists, and then parse the field accordingly.  This could have been simplified by separating the IP address and port with a space, as every other field is.
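
The dot-counting rule described above might look like the following Python sketch. It is not the code used by our tool; `split_ip_port` is a hypothetical helper, and it assumes IPv4 addresses and numeric ports.

```python
def split_ip_port(field):
    """Split 'a.b.c.d.port' or 'a.b.c.d' into (ip, port); port may be None.

    Assumes IPv4 addresses and numeric ports.
    """
    dots = field.count(".")
    if dots == 4:
        # Four dots: the text after the last '.' is the port.
        ip, port = field.rsplit(".", 1)
        return ip, int(port)
    if dots == 3:
        # Three dots: a bare IPv4 address with no port (for example ICMP).
        return field, None
    raise ValueError("cannot split address field: %r" % field)
```

With this rule, `split_ip_port("192.0.2.1.80")` gives `("192.0.2.1", 80)`, while `split_ip_port("192.0.2.1")` gives `("192.0.2.1", None)` rather than mistaking the last octet for a port.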

Recording the protocol in text format, which is not necessarily universal, is a poor fit for a universal storage repository.  It is converted to the protocol number, which is universal.  The ''State'' field has the same problem, but I am unaware of any universal representation for it, so it was kept in its text format.
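
A minimal sketch of the name-to-number conversion, assuming the protocol appears as a lowercase or mixed-case name such as "tcp". The table holds only a few IANA-assigned numbers for illustration; `protocol_number` is a hypothetical helper, and the real tool may cover more protocols or handle unknown names differently.

```python
# IANA-assigned protocol numbers for a few common protocols (not exhaustive).
PROTOCOL_NUMBERS = {
    "icmp": 1,
    "igmp": 2,
    "tcp": 6,
    "udp": 17,
}


def protocol_number(name):
    """Map a textual protocol name to its IANA protocol number."""
    key = name.strip().lower()
    if key not in PROTOCOL_NUMBERS:
        raise ValueError("unknown protocol name: %r" % name)
    return PROTOCOL_NUMBERS[key]
```

Raising an error for unknown names is a design choice of this sketch: it makes gaps in the table visible instead of silently storing a wrong number.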

Having a flow direction that must be parsed for each flow would also make database queries painful.  Directionality is not always determinable by the [http://www.qosient.com/argus/ Argus] auditing tool.  When directionality is unknown, a number of heuristics are applied to determine it; if it is still unknown, the flow is marked in the database as unknown.  Otherwise, the flow is converted into a format with a source IP address and a destination IP address, rather than a left and right address with a flow direction.
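
The left/right to source/destination conversion could be sketched as follows. This is only an illustration under stated assumptions: the direction strings ("->", "<-", "<->", and variants containing "?") are assumed rather than taken from this page, the left side is assumed to be the source for "<->", and the heuristics mentioned above are not reproduced. `to_src_dst` is a hypothetical helper.

```python
def to_src_dst(left, direction, right):
    """Return (src, dst, direction_known) for one flow record.

    The direction strings handled here are assumptions, and no heuristics
    are applied: a direction containing '?' is simply reported as unknown.
    """
    d = direction.strip()
    if "?" in d:
        # Unknown direction: keep left/right order and flag it as unknown.
        return left, right, False
    if d.startswith("<") and not d.endswith(">"):
        # '<-': traffic flows from right to left, so the right side is the source.
        return right, left, True
    # '->' or '<->': treat the left side as the source (assumption for '<->').
    return left, right, True
```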

== Reformatting Tool ==

In our [http://cyprus.cmcl.cs.cmu.edu/projects/entropy_analysis/browser/scripts code repository], there is a [http://cyprus.cmcl.cs.cmu.edu/projects/entropy_analysis/browser/scripts/c/reformat_argus_for_datapository conversion tool (dp_reformat)] that takes a data path full of compressed [http://www.qosient.com/argus/ Argus] flow level data files and converts them to [http://datapository.net Datapository] format for insertion.

The tool parses the [http://www.qosient.com/argus/ Argus] flow level data and copies it with ''scp'' to [http://datapository.net Datapository], where it can be inserted into the database.

To use the tool, specify a path that contains the [http://www.qosient.com/argus/ Argus] flow level data; the tool traverses it recursively and writes the converted data to a file such as core-full.2005.02.01.02.55.gz-dp.
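
The recursive traversal and output naming described above can be approximated in Python as follows. This is a sketch, not the dp_reformat source: `plan_conversions` is a hypothetical helper, and placing each "-dp" file next to its input is an assumption, since this page does not say where the output is written.

```python
import os


def plan_conversions(root):
    """Return sorted (input_path, output_path) pairs for files under root.

    Only names of the form core-full.*.gz are selected, so existing
    '-dp' outputs are skipped. Writing the output next to the input is
    an assumption of this sketch.
    """
    pairs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith("core-full.") and name.endswith(".gz"):
                source = os.path.join(dirpath, name)
                pairs.append((source, source + "-dp"))
    return sorted(pairs)
```

Because the date and time components are zero-padded, sorting the paths also puts the intervals in chronological order.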

An example usage for converting all of the data from February 2005, which also displays the tool's progress, is:
 {{{
   $ ./dp_reformat /mnt/campus-2005-1TB/Data/archive/2005/02
   0 / 7726  |  /mnt/campus-2005-1TB/Data/archive/2005/02/01/00/core-full.2005.02.01.00.00.gz
   1 / 7726  |  /mnt/campus-2005-1TB/Data/archive/2005/02/01/00/core-full.2005.02.01.00.05.gz
   2 / 7726  |  /mnt/campus-2005-1TB/Data/archive/2005/02/01/00/core-full.2005.02.01.00.10.gz
   3 / 7726  |  /mnt/campus-2005-1TB/Data/archive/2005/02/01/00/core-full.2005.02.01.00.15.gz
 }}}

Currently, the command line parameter is not implemented (the tool always parses February), and the ''scp'' destination on [http://datapository.net Datapository] is hard coded to my directory.  This should be changed for general use.