Flowcalc

Export traffic to WEKA: the flowcalc toolkit

flowcalc is a software toolkit for calculating IP flow statistics from raw traffic trace files. Thanks to the libtrace library, flowcalc can read numerous file formats. flowcalc has a modular architecture based on libflowcalc, which makes it highly extensible and customizable. Each module is responsible for calculating a different kind of flow feature.

When flowcalc processes a traffic trace, each module receives a copy of every packet and can use a packet-processing API to access packet data and compute the desired flow statistics. When a packet flow ends, another function is called that prints the flow features. flowcalc uses the ARFF output format, which is readable by the popular WEKA and RapidMiner data-mining environments.
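The per-packet and end-of-flow callbacks described above can be pictured with a small Python sketch. It is only an illustration of the pattern: the class and function names (CountersModule, on_packet, on_flow_end, run) are hypothetical and are not the libflowcalc API, which is documented in the README file.

```python
# Hypothetical illustration only: these names are NOT the libflowcalc API.
from collections import defaultdict


class CountersModule:
    """Toy module that counts packets and bytes per flow."""

    def __init__(self):
        self.flows = defaultdict(lambda: {"pkts": 0, "bytes": 0})

    def on_packet(self, flow_id, pkt_len):
        # Called once for every packet of every flow.
        state = self.flows[flow_id]
        state["pkts"] += 1
        state["bytes"] += pkt_len

    def on_flow_end(self, flow_id):
        # Called when the flow ends; returns the feature values to print.
        state = self.flows.pop(flow_id)
        return [state["pkts"], state["bytes"]]


def run(packets, modules):
    """Give each (flow_id, pkt_len) packet to every module, then close flows."""
    seen = []
    for flow_id, pkt_len in packets:
        if flow_id not in seen:
            seen.append(flow_id)
        for module in modules:
            module.on_packet(flow_id, pkt_len)

    rows = []
    for flow_id in seen:
        row = [flow_id]
        for module in modules:
            row.extend(module.on_flow_end(flow_id))
        # One comma-separated line per flow, as in an ARFF @data section.
        rows.append(",".join(str(value) for value in row))
    return rows


if __name__ == "__main__":
    for line in run([(1, 100), (2, 40), (1, 60)], [CountersModule()]):
        print(line)
```

In this sketch every flow is closed at the end of the input; a real tool would also close flows on timeouts or on TCP connection teardown.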

flowcalc supports real-time operation by limiting IP flows by the number of packets received so far, or by the time since the flow started. For example, it can generate flow features for just the first 10 packets or the first 10 seconds of traffic in every flow. Apart from that, it supports the typical flow timeouts for UDP and TCP.
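The packet-count and time limits can be sketched as follows. This is a minimal Python illustration, not flowcalc code: the function names and the max_packets / max_seconds parameters are assumptions, and so is the choice to treat a packet arriving exactly at the time limit as still inside it.

```python
def within_limit(pkt_index, pkt_time, flow_start, max_packets=None, max_seconds=None):
    """Return True if a packet should still contribute to the flow features.

    pkt_index is 1-based; times are in seconds. A limit of None is disabled.
    """
    if max_packets is not None and pkt_index > max_packets:
        return False
    if max_seconds is not None and pkt_time - flow_start > max_seconds:
        return False
    return True


def truncate_flow(timestamps, max_packets=None, max_seconds=None):
    """Keep only the leading packets of one flow that fall within the limits."""
    if not timestamps:
        return []
    start = timestamps[0]
    kept = []
    for index, ts in enumerate(timestamps, start=1):
        if not within_limit(index, ts, start, max_packets, max_seconds):
            break  # everything after the first excluded packet is ignored
        kept.append(ts)
    return kept


if __name__ == "__main__":
    flow = [0.0, 0.5, 1.0, 12.0]
    print(truncate_flow(flow, max_seconds=10))
    print(truncate_flow(flow, max_packets=2))
```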

Several modules are already available:
  • stats: basic statistics of packet lengths and inter-arrival times
  • counters: counters of packets and bytes in flows
  • pktsize: records the packet sizes of the first 5 packets
  • payload2: exports packet payloads (for DPI analysis)
  • dns: extracts flow domain names using DNS traffic context (for DNS-Class)
  • coral: port-based classification using CAIDA CoralReef (for ground-truth)
  • lpi: DPI inspection using libprotoident (for ground-truth)
  • ndpi: DPI inspection using libndpi (for ground-truth)
  • web: generic TCP transaction analysis tailored to web application traffic
  • websize: basic analysis tailored to SPDY/H2 encrypted web traffic flows
...but users are encouraged to create their own modules. For basic reference, see the README file and an example module. See also the related ARFF tools page for help in manipulating flowcalc output files.
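As a rough idea of what a module such as stats computes, the Python sketch below derives basic statistics of packet lengths and inter-arrival times for one flow. Which statistics the real module exports is not stated here, so the chosen set (min, mean, max, standard deviation) and all names are assumptions made for illustration.

```python
import statistics


def summarize(values):
    """Min, mean, max and population standard deviation of a list of numbers."""
    if not values:
        return {"min": 0.0, "mean": 0.0, "max": 0.0, "std": 0.0}
    return {
        "min": min(values),
        "mean": statistics.fmean(values),
        "max": max(values),
        "std": statistics.pstdev(values),
    }


def flow_stats(packets):
    """packets: list of (timestamp_seconds, length_bytes) in arrival order."""
    times = [ts for ts, _ in packets]
    lengths = [length for _, length in packets]
    # Inter-arrival time: gap between each packet and the one before it.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {"len": summarize(lengths), "iat": summarize(gaps)}


if __name__ == "__main__":
    print(flow_stats([(0.0, 100), (1.0, 200), (3.0, 300)]))
```

A flow with a single packet has no inter-arrival times; the sketch reports zeros in that case, which is one possible convention among several.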

Downloads

Citing flowcalc

If you use flowcalc in your research, please cite the following work:

Foremski P., Callegari C., Pagano M., "Waterfall: Rapid identification of IP flows using cascade classification", Communications in Computer and Information Science. Proceedings of the 21st International Conference on Computer Networks, CN2014, Springer-Verlag, 2014

Example output

%% flowcalc run at Wed Dec 5 13:34:56 2012
% modules: lpi ndpi


@relation '/trace/abc/trace-2012.05.26-17:40:39.pcap.gz'



%% flowcalc 0.1
% fc_id: flow id
% fc_tstamp: timestamp of first packet in the flow
% fc_duration: flow duration
% fc_proto: transport protocol
% fc_src_addr: IP address of connection initiator
% fc_src_port: TP port number of connection initiator
% fc_dst_addr: IP address of remote peer
% fc_dst_port: TP port number of remote peer
@attribute fc_id numeric
@attribute fc_tstamp numeric
@attribute fc_duration numeric
@attribute fc_proto {TCP,UDP}
@attribute fc_src_addr string
@attribute fc_src_port numeric
@attribute fc_dst_addr string
@attribute fc_dst_port numeric


%% lpi 0.1 - libprotoident
@attribute lpi_category string
@attribute lpi_proto string


%% ndpi 0.1 - nDPI
@attribute ndpi_proto string


@data
2,1338046839.437297,0.000085,UDP,212.14.174.234,9173,66.220.9.122,123,Services,NTP,ntp
5,1338046839.583401,0.000081,UDP,212.14.174.172,2154,131.107.1.10,123,Services,NTP,ntp
9,1338046839.724457,0.000079,UDP,212.14.174.233,2087,66.243.43.2,123,Services,NTP,ntp
16,1338046839.942828,0.000081,UDP,212.14.174.225,2272,192.36.144.22,123,Services,NTP,ntp
18,1338046840.011993,0.000653,UDP,212.14.174.102,2102,199.165.76.11,123,Services,NTP,ntp
26,1338046840.159199,0.001141,UDP,212.14.174.10,2176,129.7.1.66,123,Services,NTP,ntp
28,1338046840.184458,0.001139,UDP,212.14.174.172,2153,131.107.1.10,123,Services,NTP,ntp
22,1338046840.058255,0.139079,UDP,212.14.174.41,51994,1.1.1.1,53,Services,DNS,dns
36,1338046840.439666,0.000081,UDP,212.14.174.234,9173,209.51.161.238,123,Services,NTP,ntp
43,1338046840.675525,0.001146,UDP,89.224.252.57,53334,212.14.174.9,10115,Unknown,Unknown_UDP,ukn
54,1338046840.999430,0.001158,UDP,212.14.174.202,55735,1.1.1.1,53,Services,DNS,dns
66,1338046841.438561,0.000091,UDP,212.14.174.234,9173,216.218.254.202,123,Services,NTP,ntp
74,1338046841.653618,0.000093,UDP,212.14.174.105,65476,1.1.1.1,53,Services,DNS,dns
75,1338046841.660276,0.001141,UDP,212.14.174.105,65475,61.67.210.241,123,Services,NTP,ntp
80,1338046841.942965,0.001142,UDP,212.14.174.225,2272,129.7.1.66,123,Services,NTP,ntp
81,1338046842.002728,0.000082,UDP,212.14.174.102,2102,140.142.16.34,123,Services,NTP,ntp
1460,1338046955.430505,6.679060,TCP,91.94.249.172,1085,212.14.173.71,1610,No_Payload,No_Payload,ukn
84,1338046842.159374,0.001142,UDP,212.14.174.10,2176,192.43.244.18,123,Services,NTP,ntp
85,1338046842.184643,0.000078,UDP,212.14.174.172,2153,199.165.76.11,123,Services,NTP,ntp
90,1338046842.439682,0.000077,UDP,212.14.174.234,9173,209.81.9.7,123,Services,NTP,ntp
97,1338046842.913763,0.000211,UDP,154.20.231.103,12420,212.14.174.96,19170,P2P_Structure,BitTorrent_UDP,bittorrent
103,1338046843.394643,0.000090,UDP,61.57.112.210,20392,212.14.174.75,12618,P2P_Structure,BitTorrent_UDP,bittorrent
...
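Output like the listing above can be loaded outside WEKA with a few lines of Python. The sketch below is a deliberately minimal reader written for this page, not one of the ARFF tools: parse_arff is a hypothetical helper, it assumes that values contain no commas or quotes (true for the sample shown), and it skips any line whose field count does not match the attribute list, such as a trailing "...".

```python
from collections import Counter

SAMPLE = """\
%% flowcalc run at Wed Dec 5 13:34:56 2012
@relation '/trace/abc/trace-2012.05.26-17:40:39.pcap.gz'
@attribute fc_id numeric
@attribute fc_tstamp numeric
@attribute fc_duration numeric
@attribute fc_proto {TCP,UDP}
@attribute fc_src_addr string
@attribute fc_src_port numeric
@attribute fc_dst_addr string
@attribute fc_dst_port numeric
@attribute lpi_category string
@attribute lpi_proto string
@attribute ndpi_proto string
@data
2,1338046839.437297,0.000085,UDP,212.14.174.234,9173,66.220.9.122,123,Services,NTP,ntp
5,1338046839.583401,0.000081,UDP,212.14.174.172,2154,131.107.1.10,123,Services,NTP,ntp
22,1338046840.058255,0.139079,UDP,212.14.174.41,51994,1.1.1.1,53,Services,DNS,dns
...
"""


def parse_arff(text):
    """Return (attribute_names, rows); each row is a dict of string values."""
    names, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # blank line or ARFF comment
        if in_data:
            values = line.split(",")
            if len(values) == len(names):  # ignore malformed or elided lines
                rows.append(dict(zip(names, values)))
        elif line.lower().startswith("@attribute"):
            names.append(line.split()[1])  # second token is the attribute name
        elif line.lower().startswith("@data"):
            in_data = True
    return names, rows


if __name__ == "__main__":
    names, rows = parse_arff(SAMPLE)
    # Count flows per protocol label assigned by the ndpi module.
    print(Counter(row["ndpi_proto"] for row in rows))
```

All values are kept as strings; converting numeric attributes such as fc_duration with float() is left to the caller.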