Reduction of Broadcast Traffic using SDN in Data Centers
Looking at the project title, a number of questions come to mind:
What is broadcast traffic?
What is SDN?
What is its relevance in Data Centers?
How does "Reduction" translate into a gain in performance?
Probing a little further leads to questions like:
What is the topology used in Data Centers?
What are the typical requirements of a Data Center?
How can we develop and test Software Defined Networks?
Why is broadcast traffic a bottleneck in Data Centers? (and hence needs to be reduced for performance)
This blog post answers some of these questions and perhaps raises new ones.
Let's first talk about network switches:
A switch is a device used on a computer network to physically connect devices together. It operates at the data link layer, which is Layer 2 of the OSI model [Wiki: Link layer]. Switches use MAC addresses, a flat addressing scheme, to address and identify connected devices. They are self-learning: each switch keeps a MAC table (a mapping from a MAC address to the interface it belongs to), which is populated as packets flow through the switch and which the switch uses to do its job.
The switch's job can be split into two parts:
1. Learning (Control plane): Adding entries to the MAC table.
The switch looks at the incoming packet's source MAC address and the input port it arrived on. It then creates an entry in the MAC table that maps the two, if one is not already present.
2. Forwarding (Data plane): Moving the packet from the input port to the output port.
This is done based on the packet's destination MAC address. The destination MAC address is looked up in the MAC table and, if it matches, the packet is sent out of the port recorded in the matching entry.
If the lookup fails, the switch sends the packet out of all ports except the one it arrived on (a broadcast, often called flooding).
One thing to note is that each entry in the MAC table has an associated timeout, after which it is removed.
This is done mainly to curb the size of the MAC table.
Takeaway: Layer 2 switches generate a lot of broadcast traffic when forwarding, either because a new host has been added or because MAC table entries have expired.
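The learning and forwarding steps described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class name LearningSwitch, the receive method, and the 300 second default timeout are all made up for illustration, and the sketch refreshes an entry's timestamp every time its source is seen, which the description above does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class LearningSwitch:
    """Toy model of a self-learning Layer 2 switch (all names are illustrative)."""
    ports: list
    timeout: float = 300.0  # assumed entry lifetime in seconds
    mac_table: dict = field(default_factory=dict)  # MAC -> (port, last_seen)

    def receive(self, src_mac, dst_mac, in_port, now):
        """Handle one packet and return the list of ports it is sent out of."""
        # Remove entries whose timeout has expired.
        self.mac_table = {
            mac: (port, seen)
            for mac, (port, seen) in self.mac_table.items()
            if now - seen < self.timeout
        }
        # Learning (control plane): map the source MAC to the input port.
        # Assumption: an existing entry is refreshed rather than left alone.
        self.mac_table[src_mac] = (in_port, now)
        # Forwarding (data plane): look up the destination MAC.
        entry = self.mac_table.get(dst_mac)
        if entry is not None:
            return [entry[0]]
        # Lookup failed: send out of every port except the one it came in on.
        return [p for p in self.ports if p != in_port]


sw = LearningSwitch(ports=[1, 2, 3], timeout=10.0)
print(sw.receive("aa", "bb", in_port=1, now=0.0))   # [2, 3]  unknown destination
print(sw.receive("bb", "aa", in_port=2, now=1.0))   # [1]     "aa" was learned
print(sw.receive("aa", "bb", in_port=1, now=20.0))  # [2, 3]  entries expired
```

The last call shows the effect of the timeout: once the entry for "bb" has expired, the same packet that was previously sent out of a single port is flooded again.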
Software Defined Networking - Conventionally, Layer 2 switches have been self-learning, with both the control logic and the forwarding action (sending a packet to the appropriate output port) built into the switch itself. Any tweak to the control logic (to affect the forwarding action) therefore means building custom switches, which can be very costly for organisations that want to perform switching based on some custom logic. This is where SDN comes to the rescue! The basic principle of Software Defined Networking is to separate the control plane from the data plane in the switches. The control plane (the brain of the switch) is moved out to an external component, the controller. The controller can then be programmed to change the behaviour of the switches.
Note: SDN is applicable to any forwarding device, such as a hub or a switch. The controller defines the logic, and the devices forward packets based on that logic.
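To make the split between the two planes concrete, here is a minimal sketch in Python. It is not the API of any real controller or of OpenFlow: the classes Controller and Switch, the packet_in method, and the flow_table attribute are hypothetical names chosen for illustration. The switch only forwards according to the rules it holds, and asks the controller whenever it has no rule for a destination.

```python
class Controller:
    """Control plane: holds the logic and installs rules (hypothetical API)."""

    def __init__(self):
        self.host_location = {}  # (switch name, MAC) -> port

    def packet_in(self, switch, src_mac, dst_mac, in_port):
        # Remember where the source host is attached.
        self.host_location[(switch.name, src_mac)] = in_port
        out_port = self.host_location.get((switch.name, dst_mac))
        if out_port is None:
            # Destination unknown: tell the switch to send on all other ports.
            return [p for p in switch.ports if p != in_port]
        # Destination known: install a rule so later packets skip the controller.
        switch.flow_table[dst_mac] = out_port
        return [out_port]


class Switch:
    """Data plane only: forwards using rules installed by the controller."""

    def __init__(self, name, ports, controller):
        self.name = name
        self.ports = ports
        self.controller = controller
        self.flow_table = {}  # destination MAC -> output port

    def receive(self, src_mac, dst_mac, in_port):
        if dst_mac in self.flow_table:
            return [self.flow_table[dst_mac]]
        # No matching rule: hand the decision to the controller.
        return self.controller.packet_in(self, src_mac, dst_mac, in_port)


ctrl = Controller()
s1 = Switch("s1", ports=[1, 2, 3], controller=ctrl)
print(s1.receive("aa", "bb", in_port=1))  # [2, 3]
print(s1.receive("bb", "aa", in_port=2))  # [1]
print(s1.flow_table)                      # {'aa': 1}
```

Because all decisions live in Controller.packet_in, changing the switching behaviour means editing that one method rather than the switch. In this sketch the controller simply reproduces the self-learning behaviour; any custom logic could take its place.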
Data centers, as we know, are huge collections of physical servers that host many applications. They involve tens of thousands of hosts (which can be physical or virtual machines), all organised in some complex hierarchy. There is an enormous amount of packet traffic within data centers, and a significant part of it is broadcast traffic. This is one of the main reasons why huge Layer 2 networks cannot be used in data centers, even though they offer benefits such as plug-and-play operation and VM migration. They simply cannot scale: broadcast traffic explodes as the size of the network increases. Our work involves reducing this broadcast traffic by building applications over SDN.
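A rough back-of-the-envelope calculation shows why broadcast traffic grows so quickly with network size. The sketch below rests on simplifying assumptions that are not measurements from a real data center: a single flat Layer 2 domain, every host sending the same number of broadcast packets, and each broadcast being delivered to every other host. The function name broadcast_deliveries is made up for illustration.

```python
def broadcast_deliveries(num_hosts, broadcasts_per_host=1):
    """Packet deliveries caused by broadcasts in one flat Layer 2 domain.

    Assumes each broadcast reaches every host except its sender.
    """
    return num_hosts * broadcasts_per_host * (num_hosts - 1)


for n in (10, 100, 10_000):
    print(n, broadcast_deliveries(n))
# 10 90
# 100 9900
# 10000 99990000
```

Under these assumptions the number of deliveries grows roughly with the square of the number of hosts, so a domain that is 100 times larger carries about 10,000 times more broadcast deliveries.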