Tuesday 5 July 2011

Why Virtual Standard Switch (vSS) doesn't need Spanning Tree Protocol

Today I want to write down some new things I have learnt recently about the vNetwork Standard Switch in vSphere 4.1 and why it doesn't need Spanning Tree Protocol.


I assume you already have basic knowledge of switching, VLANs, switching loops, Spanning Tree Protocol and link aggregation protocols. I will go very quickly through the main features of standard vSwitches, focusing on facts that are not very obvious from the official documentation, at least for me. Generally speaking, this article will be more useful for people who already have some experience with vSphere networking.


The main goal of a standard vSwitch is to provide connectivity between your virtual machines and the physical network infrastructure. Additionally, it provides logical division of your VMs with Port Groups, offers different Load Balancing algorithms in case you have more than one uplink, supplies an egress traffic shaping tool (from VMs to physical switches) and, finally, provides network uplink failover detection.


How vSwitch works:


In contrast to physical switches, a vSwitch doesn't need to learn the MAC addresses of all connected devices. Since all VMs have their MAC addresses assigned by the ESX host, this information is already known to the vSwitch.
Another thing that makes a vSwitch different from physical switches is that uplink ports to physical switches and internal virtual ports to your VMs are treated differently, and the vSwitch applies different forwarding rules to them. It is worth mentioning that a vSwitch works only at Layer 2. It will never check the contents of the IP header.
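
To make the "no learning" point more concrete, here is a minimal sketch in Python. It is only my illustration of the idea, not how ESX is implemented: the class and method names are made up, and the model ignores VLANs and Port Groups entirely.

```python
# Illustrative model only; class and method names are hypothetical, not VMware APIs.

class LearningSwitch:
    """Physical-switch style: the MAC table is filled from observed source addresses."""

    def __init__(self):
        self.mac_table = {}  # MAC address -> port number

    def receive(self, port, src_mac):
        # Learn (or refresh) the sender's location from the frame itself.
        self.mac_table[src_mac] = port


class VSwitchModel:
    """vSwitch style: the host registers each VM's MAC when its vNIC is connected."""

    def __init__(self):
        self.mac_table = {}  # MAC address -> virtual port number

    def connect_vnic(self, virtual_port, mac):
        # The host assigned this MAC, so it is known up front.
        self.mac_table[mac] = virtual_port

    def receive_on_uplink(self, src_mac):
        # Nothing is learned from frames arriving on uplinks.
        pass


physical = LearningSwitch()
physical.receive(port=1, src_mac="00:50:56:aa:00:01")

vswitch = VSwitchModel()
vswitch.connect_vnic(virtual_port=10, mac="00:50:56:aa:00:01")
vswitch.receive_on_uplink(src_mac="00:11:22:33:44:55")

print(physical.mac_table)
print(vswitch.mac_table)
```

The vSwitch table contains only the registered VM address; the external address seen on the uplink never ends up in it.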




Port Groups and Vlans


Port Groups define the configuration policy (e.g. VLAN tagging, traffic shaping, load balancing policy) for each virtual port. When you connect a virtual machine to a vSwitch, you simply specify the desired Port Group name for it. For instance, you could apply a policy that makes several virtual ports use only a specific uplink for traffic going from virtual machines to physical switches, with the remaining pNICs used only if the Active uplink fails. Another good example: when you want to apply different security policies, you would definitely use different Port Groups.
Port Groups are very often compared to VLANs on physical switches, which is not a correct comparison. I guess this happens because VMware admins very often use Port Groups to assign virtual ports to different VLANs. However, Port Groups don't necessarily correspond 1-to-1 to VLANs. There are cases where more than one Port Group is assigned to the same VLAN.
Another thing that makes Port Groups and VLANs different is that on a physical switch, computers connected to different VLANs cannot communicate directly. Traffic has to be sent to an external L3 device, which will route it. But if your Port Groups have the same VLAN assignment, their VMs can communicate directly without the assistance of L3 devices. It is impossible to fit all the information about Port Groups into one short blog post, so I would recommend that you read the vSphere Networking documentation.
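
A tiny Python sketch of that last point, assuming both Port Groups live on the same vSwitch. The Port Group names and VLAN IDs are invented for illustration; the only idea shown is that the VLAN is just one attribute of a Port Group, so two Port Groups can share it.

```python
# Hypothetical Port Group names and VLAN IDs, all on one vSwitch.
port_groups = {
    "Web-Prod": {"vlan": 10},
    "Web-Test": {"vlan": 10},
    "Database": {"vlan": 20},
}


def same_l2_segment(pg_a, pg_b):
    """True if VMs in the two Port Groups can talk without an L3 device."""
    return port_groups[pg_a]["vlan"] == port_groups[pg_b]["vlan"]


print(same_l2_segment("Web-Prod", "Web-Test"))  # True: same VLAN, different Port Groups
print(same_l2_segment("Web-Prod", "Database"))  # False: traffic must be routed
```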



Uplinks - Load Balancing/ Network Failover Detection


You can have up to 32 physical uplinks on a vSwitch. Uplinks cannot be shared between vSwitches, that is, if you have assigned a specific uplink to one vSwitch, you can't assign it to any other vSwitch. This also means that traffic never goes directly between two vSwitches on the same host. It either has to go through an uplink or through a VM that is connected to both vSwitches and is capable of routing traffic.
In almost every configuration you will have more than one uplink, and you will need to decide how you want to load balance traffic over your uplinks and what network failover detection you want to implement. By default, these policies are applied at the vSwitch level, but if you want to customize them for specific VMs you can change the settings at the Port Group level to override the vSwitch policy.
There are 3 main Load Balancing policies used with vSwitches:
1. Route based on the originating virtual switch port ID - say you have 6 VMs connected to your vSwitch and it has 2 uplinks. Every time a VM boots up, the vSwitch picks one of the uplinks for this VM in a round robin fashion. All traffic from this VM will then always be sent through the same uplink. The VM is switched to another uplink only if its active uplink fails or the VM is powered off and on again. Return traffic sent by the physical switch to this VM will always arrive through the same uplink, as the physical switch has learnt the VM's MAC address on that uplink port.
2. Route based on source MAC hash - it is very similar to the first load balancing method. Instead of the virtual port, the vSwitch uses a hash of the VM's MAC address to pick the uplink.
3. Route based on IP hash - here the vSwitch uses a hash of the source and destination IP addresses to pick the uplink to the physical switch, that is, for each source-destination IP pair only 1 uplink is used. This is done to guarantee that the order of the packets is not broken and the packets are delivered in the right sequence. You will need your switch (or stack of switches) to support the 802.3ad link aggregation standard. With a vSwitch it works only in static mode, which means LACP is not supported (no dynamic port aggregation). The vSwitch and the physical switch don't have to use the same hashing algorithm. Traffic can go from a VM to the switch over one uplink and come back over another uplink - it is only the order of packets that matters.
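
The three policies above can be sketched in Python roughly like this. Please treat it as an illustration under assumptions: the uplink names, function names and the hash functions (crc32 for the MAC, a plain XOR for the IP pair) are stand-ins I picked, not the actual algorithms used by ESX.

```python
import ipaddress
import zlib
from itertools import count

UPLINKS = ["vmnic0", "vmnic1"]  # hypothetical uplink names

_next_slot = count()
_port_to_uplink = {}


def by_port_id(virtual_port):
    """Pin a virtual port to an uplink, round robin, the first time it is seen."""
    if virtual_port not in _port_to_uplink:
        _port_to_uplink[virtual_port] = UPLINKS[next(_next_slot) % len(UPLINKS)]
    return _port_to_uplink[virtual_port]


def by_source_mac(src_mac):
    """Pick an uplink from a hash of the VM's MAC; crc32 is a stand-in hash."""
    return UPLINKS[zlib.crc32(src_mac.encode()) % len(UPLINKS)]


def by_ip_hash(src_ip, dst_ip):
    """Pick an uplink per source/destination IP pair; XOR is a stand-in hash."""
    key = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    return UPLINKS[key % len(UPLINKS)]


# Port ID: each VM sticks to the uplink it was given.
print([by_port_id(p) for p in (1, 2, 3, 1)])

# IP hash: one VM can use different uplinks for different destinations.
print(by_ip_hash("10.0.0.1", "10.0.0.2"))
print(by_ip_hash("10.0.0.1", "10.0.0.3"))
```

Note the difference in granularity: with the first two policies a single VM never uses more than one uplink at a time, while with IP hash the choice is made per source-destination pair.
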
One of the common Load Balancing scenarios is very similar to VLAN load balancing between trunks using Spanning Tree Protocol on physical switches. In ESX you can have 2 Port Groups and 2 uplinks on a vSwitch: for one Port Group the first uplink is Active and the second one is in Standby mode, and the second Port Group has the mirror configuration for its 2 uplinks. This way you make more effective use of bandwidth and still provide redundancy for the network connections of your VMs.
Sometimes VMware admins dedicate a separate pair of uplinks to vMotion or FT traffic for isolation purposes. To me it seems like a waste of physical NICs, because your vMotion/FT traffic will very rarely be load balanced between this pair, and most of the time only one NIC will be in use.
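
Here is a small Python sketch of the mirrored Active/Standby setup with 2 Port Groups and 2 uplinks. The Port Group and NIC names are hypothetical, and the failover logic is simplified to "first healthy Active uplink, otherwise first healthy Standby uplink".

```python
# Hypothetical Port Group and NIC names.
TEAMING = {
    "PG-A": {"active": ["vmnic0"], "standby": ["vmnic1"]},
    "PG-B": {"active": ["vmnic1"], "standby": ["vmnic0"]},
}


def pick_uplink(port_group, failed=frozenset()):
    """Return the first healthy Active uplink, else the first healthy Standby one."""
    policy = TEAMING[port_group]
    for nic in policy["active"] + policy["standby"]:
        if nic not in failed:
            return nic
    return None  # no healthy uplink left


# Normal operation: each Port Group uses its own uplink.
print(pick_uplink("PG-A"), pick_uplink("PG-B"))

# vmnic0 fails: PG-A moves to its Standby uplink, PG-B is unaffected.
print(pick_uplink("PG-A", failed={"vmnic0"}), pick_uplink("PG-B", failed={"vmnic0"}))
```
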




QoS/Egress


With a vSwitch you can implement QoS only on egress traffic, that is, it affects traffic going out from VMs or VMkernel interfaces into the vSwitch, which is more than enough to shape your vMotion traffic. However, in my opinion, it is not really useful for normal production virtual machines.


Why doesn't vSwitch need support of Spanning Tree?


I remember I had so many questions about vSwitches when I had just finished my first vSphere training, and unfortunately our instructor was not expert enough in this area to fully explain how a vSwitch deals with Spanning Tree. So let's try to fix that.
Here is the logic that lets a vSwitch avoid using the complicated and hard-to-troubleshoot Spanning Tree Protocol:

  • STP BPDU packets sent by a physical switch to the vSwitch are simply discarded.
  • Whatever the vSwitch receives on one of its uplinks, it will never send it out through the other uplinks. Traffic coming from physical uplinks can go only to virtual ports. That means no loops through vSwitches.
  • When the vSwitch sends broadcast / multicast / unknown unicast traffic over one of the uplinks, the physical switch will normally flood this traffic through all of its ports, including the other uplinks of the same vSwitch. Therefore, the vSwitch has to check the source MAC address of all traffic received on uplinks. If the source MAC address belongs to one of its virtual machines or VMkernel interfaces, the traffic is discarded. That means no loops through physical switches.
  • Broadcast and multicast traffic sent by a VM will be forwarded to all VMs in the same Port Group. A copy of this traffic will also be sent to the physical switch using only one uplink, chosen according to the Load Balancing policy.
  • As the vSwitch already knows all its VMs' MAC addresses, it will discard any frame received on an uplink that is addressed to an unknown MAC address. If a frame is sent by a VM to an unknown MAC address, it will not be flooded to all VMs in the same Port Group as a physical switch would do. Such a frame will be sent over one of the uplinks, again according to the Load Balancing policy.
With all this said, it becomes clear how we can have multiple links between a vSwitch and a physical switch and still avoid using Spanning Tree.
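
The rules above can be put together in a short Python sketch. This is only my simplified model: the class and method names are made up, there is a single Port Group with no VLANs, multicast is left out (only the broadcast address is handled), and the uplink choice is a stand-in for the real Load Balancing policy.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class VSwitchSketch:
    """Toy model of the forwarding rules; not VMware code."""

    def __init__(self, uplinks):
        self.uplinks = list(uplinks)
        self.ports = {}  # local MAC -> virtual port name

    def connect(self, mac, port):
        self.ports[mac] = port

    def _pick_uplink(self, src_mac):
        # Stand-in for the configured Load Balancing policy.
        return self.uplinks[sum(src_mac.encode()) % len(self.uplinks)]

    def from_uplink(self, src_mac, dst_mac, is_bpdu=False):
        """Return the virtual ports that receive a frame arriving on an uplink."""
        if is_bpdu:
            return []  # BPDUs are discarded
        if src_mac in self.ports:
            return []  # our own frame, flooded back by the physical switch
        if dst_mac == BROADCAST:
            return sorted(self.ports.values())  # virtual ports only, never uplinks
        if dst_mac in self.ports:
            return [self.ports[dst_mac]]
        return []  # unknown destination: drop

    def from_vm(self, src_mac, dst_mac):
        """Return the ports (virtual and uplink) that receive a frame sent by a VM."""
        if dst_mac == BROADCAST:
            local = sorted(p for m, p in self.ports.items() if m != src_mac)
            return local + [self._pick_uplink(src_mac)]  # exactly one uplink
        if dst_mac in self.ports:
            return [self.ports[dst_mac]]
        return [self._pick_uplink(src_mac)]  # unknown unicast: one uplink, no flooding


sw = VSwitchSketch(["vmnic0", "vmnic1"])
sw.connect("00:50:56:00:00:01", "vm1")
sw.connect("00:50:56:00:00:02", "vm2")

# A broadcast from vm1 that the physical switch floods back is dropped.
print(sw.from_uplink("00:50:56:00:00:01", BROADCAST))
# A broadcast from an external host reaches the VMs but no uplink.
print(sw.from_uplink("00:11:22:33:44:55", BROADCAST))
```

Notice that `from_uplink` never returns an uplink name, which is the whole reason a loop cannot form through the vSwitch.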

Most of this information I learnt from the amazing blog of Ivan Pepelnjak, and the basics were learnt from this VMware document.


4 comments:

  1. I know this is kind of an old post, but thanks. It cleared up some confusion for me.

  2. Excellent! Again, I know it's an old post but very useful. This answered my exact questions about multiple connections to physical switches and my concerns about broadcast traffic. I couldn't find any reference to such in official documentation. Thank you.

    Replies
    1. glad I could help. I myself read this post from time to time to refresh my knowledge on vSS :)
