SDNFlow Dataset

Citation Author(s):
Jorge Buzzio-García (INICTEL-UNI)
Jaime Vergara (Universidad de Antioquia)
Santiago Rios-Guiral (Universidad de Antioquia)
Christian Garzón (Universidad de Antioquia)
Sergio Gutierrez (Universidad de Antioquia)
Juan Felipe Botero (Universidad de Antioquia)
Jose Luis Quiroz-Arroyo (INICTEL-UNI)
Jesus Arturo Perez-Diaz (Tecnologico de Monterrey)
Submitted by: jorge buzzio
Last updated: Thu, 12/14/2023 - 09:15
DOI: 10.21227/40v2-hh58

Abstract 

In the contemporary cybersecurity landscape, robust attack detection mechanisms are important for organizations. However, research in Software-Defined Networking (SDN) suffers from a notable lack of recent SDN-OpenFlow-based datasets. This study seeks to bridge this gap by introducing a novel dataset for intrusion detection in SDN. The dataset, derived from OpenFlow statistics gathered from real traffic, integrates a comprehensive range of network activities. An empirical evaluation using diverse machine learning and deep learning algorithms was performed. The dataset is valuable for evaluating intrusion detection systems within SDN environments and for deepening the understanding of traffic patterns in software-defined networks.

Instructions: 

The SDNFlow Dataset is distinct from others in that it has been generated exclusively from OpenFlow switch statistics, which are subsequently processed by an SDN application. The dataset consists of 37 parameters; each record is obtained from the information periodically gathered from the flows within an OpenFlow switch. The resulting dataset comprises 662,828 records, with 221,564 corresponding to normal records and 441,264 to attack records.
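The record counts above imply a class imbalance of roughly one third normal to two thirds attack traffic, which may matter when choosing evaluation metrics or class weights. A minimal sketch of that arithmetic follows; the constant and function names are illustrative and are not part of the dataset.

```python
# Record counts as stated in the dataset description.
TOTAL_RECORDS = 662_828
NORMAL_RECORDS = 221_564
ATTACK_RECORDS = 441_264


def class_shares(normal: int, attack: int) -> dict:
    """Return the fraction of records that fall into each class."""
    total = normal + attack
    return {"normal": normal / total, "attack": attack / total}


# The two classes should add up to the stated total.
assert NORMAL_RECORDS + ATTACK_RECORDS == TOTAL_RECORDS

shares = class_shares(NORMAL_RECORDS, ATTACK_RECORDS)
print(f"normal: {shares['normal']:.1%}, attack: {shares['attack']:.1%}")
```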

For a better understanding, the parameters are grouped in the following manner:

  • Flow identifier features: These parameters are associated with the match fields in a flow rule, allowing the identification of a specific flow. They serve as a basis for updating other parameters. Values such as flow_id, eth_type, ipv4_src, ipv4_dst, ip_proto, src_port, dst_port and flow_duration can be grouped here.
  • Packet-based features: This group of parameters contains statistics about the packets processed by the switch that match flow entries. It includes total packets per flow, packets per time unit, and the sample period for each flow.
  • Byte-based features: These parameters store information about the bytes of the matching-flow packets processed by the switch. Similar to packet-based features, this group includes total bytes per flow, bytes per flow per time unit, and the sample period for each flow.
  • Flow-timer features: These parameters track the uptime and downtime of each flow.
  • Cumulative features: These statistics aggregate metrics across different flows that share a parameter within a given time window. They include the total number of packets and bytes per source and destination IP and the total number of connections per source and destination IP. These statistics provide an overview of the network activity.
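To make the per-time-unit and cumulative groups more concrete, the sketch below shows one plausible way such values could be derived from flow records. It does not reproduce the authors' SDN application: ipv4_src and ipv4_dst come from the description above, while packet_count, byte_count, the helper functions, and the choice to count one connection per flow entry are all assumptions made for illustration.

```python
from collections import defaultdict

# Toy flow records. ipv4_src and ipv4_dst are named in the dataset description;
# packet_count and byte_count are hypothetical column names.
flows = [
    {"ipv4_src": "10.0.0.1", "ipv4_dst": "10.0.0.9", "packet_count": 10, "byte_count": 1000},
    {"ipv4_src": "10.0.0.1", "ipv4_dst": "10.0.0.8", "packet_count": 5, "byte_count": 400},
    {"ipv4_src": "10.0.0.2", "ipv4_dst": "10.0.0.9", "packet_count": 7, "byte_count": 700},
]


def cumulative_by(records, key):
    """Sum packets and bytes, and count flows, over records sharing records[key]."""
    totals = defaultdict(lambda: {"packets": 0, "bytes": 0, "connections": 0})
    for record in records:
        entry = totals[record[key]]
        entry["packets"] += record["packet_count"]
        entry["bytes"] += record["byte_count"]
        entry["connections"] += 1  # assumption: one connection per flow entry
    return dict(totals)


def rate_per_second(previous_total, current_total, sample_period_s):
    """Per-time-unit rate from two consecutive readings of a cumulative counter."""
    if sample_period_s <= 0:
        raise ValueError("sample period must be positive")
    return (current_total - previous_total) / sample_period_s


per_src = cumulative_by(flows, "ipv4_src")
per_dst = cumulative_by(flows, "ipv4_dst")
```

Grouping by ipv4_src or ipv4_dst gives the per-IP totals, and rate_per_second turns two successive counter readings into a packets-per-second or bytes-per-second value for a given sample period.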