Monday, March 9, 2015

Enterprise QOS Design & Deployment - Good, Bad Or Ugly ? - The Business Side (Case Study)


Over the last couple of years I have been part of a couple of Enterprise-level QOS deployments. Some of them were tactical, while others were completely strategic.

Now QOS is one of those topics in the networking industry that are considered highly complex and are widely misunderstood at the same time. Part of the reason is that there are a lot of moving pieces that must fit together correctly in order to deploy QOS successfully in an Enterprise environment.

At times I have seen engineers just copy and paste QOS configurations from some other Enterprise they had access to in the past, hoping that would serve the purpose. In other cases people design QOS policies that simply ensure top priority for voice and video traffic in the Enterprise network.

Many network engineers deploying QOS policies think that they should give the highest priority to voice and video traffic in their network. One of the common problems here is "assumptions".

As a network engineer you should always first observe the current state of the network architecture and the hardware in use, document the list of critical business applications, and understand how they are deployed along with their network requirements, such as: transport protocol (TCP vs UDP), flow pattern (one-to-one, one-to-many or many-to-many), latency requirements, direct vs redirected access, realtime vs interactive, transport medium (LAN vs WAN, wired vs wireless), SP dependencies (SLAs and service agreements), MPLS Layer 2 vs Layer 3 VPNs, etc.

The problem usually is that as network engineers we always try to understand and solve problems using technologies and tools. Understanding technology and tools is definitely an important piece here, but at the same time you should be able to convert a given business requirement into a technical design and solution.

For example, most books written around QOS would tell you to mark voice with the highest priority, like EF, and video probably with AF41. Voice and video are both quite sensitive types of traffic in a given network, but does that also mean they are the most critical ones from a business standpoint? Well, if you think carefully, that might not be the case in reality. For example, an Enterprise client I have recently been working for was in the newspaper business. The Editorial application they use to produce the newspaper had a very tight end-to-end latency requirement of 40 msec or less. If we compare this with the voice traffic latency guideline, which is 150 msec or less, we can certainly see that the given requirement is very tight. At the same time this brings up an interesting question from a design standpoint: "Shall we still give the EF marking to voice, or shall we use the EF marking for the Editorial application?". If we think from the business standpoint, or talk to the business to figure it out, the answer most likely is going to be that the Editorial application should be given the highest priority.

To make the situation a little more complex, they had an old Editorial application which was still in use while they were moving to the new Editorial application, so in that sense we now have two Editorial applications with highly latency-sensitive requirements :). So we must take care of these considerations while designing the QOS policy.
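The Editorial vs voice question above can be sketched in a few lines of Python. This is only an illustration of letting business input drive the marking, not the policy that was actually deployed: the `App` class, the `assign_markings` helper, the order in which classes are handed out, and the video and email figures are all assumptions made up for the example. Only the 40 msec and 150 msec numbers come from the case study.

```python
from dataclasses import dataclass

# Standard DSCP code points (decimal values).
DSCP_VALUE = {"EF": 46, "AF41": 34, "AF31": 26, "BE": 0}

# Hypothetical order in which classes are handed out, best first.
CLASS_ORDER = ["EF", "AF41", "AF31"]


@dataclass(frozen=True)
class App:
    name: str
    max_latency_ms: int      # end-to-end latency requirement
    business_critical: bool  # answer obtained from the business, not assumed


def assign_markings(apps):
    """Group apps into tiers (critical first, then tightest latency) and
    give each tier the next class in CLASS_ORDER; the rest get best effort."""
    def tier(app):
        return (not app.business_critical, app.max_latency_ms)

    tiers = sorted({tier(app) for app in apps})
    markings = {}
    for app in apps:
        index = tiers.index(tier(app))
        markings[app.name] = CLASS_ORDER[index] if index < len(CLASS_ORDER) else "BE"
    return markings


apps = [
    App("Editorial (new)", 40, True),
    App("Editorial (old)", 40, True),
    App("Voice", 150, True),
    App("Video", 200, True),     # hypothetical requirement
    App("Email", 1000, False),   # hypothetical requirement
]

markings = assign_markings(apps)
for name, cls in markings.items():
    print(f"{name}: {cls} (DSCP {DSCP_VALUE[cls]})")
```

Both Editorial applications share the same tier here, so both end up in EF and voice moves one class down. Whether that trade-off is acceptable is exactly the conversation to have with the business.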

Now would a QOS policy alone serve the purpose? Well, as I mentioned, the end-to-end latency requirement for the Editorial applications was 40 msec or less, which means you need to take a look at the customer's WAN architecture and see how this requirement can be met. The company had one hub location in each region of India, with approximately 50 remote sites connecting to each hub site. All hub locations were connected in a partial mesh.
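One rough way to check whether a path can meet such a requirement is a per-hop latency budget: propagation delay plus serialization delay plus an allowance for queuing. The sketch below assumes a remote -> hub -> hub -> remote path and treats the 40 msec figure as a one-way budget. The distances, link speeds and queuing allowances are invented for illustration and are not the customer's real numbers.

```python
FIBER_KM_PER_MS = 200.0  # light in fiber covers roughly 200 km per millisecond


def hop_delay_ms(distance_km, link_bps, packet_bytes=1500, queuing_ms=0.0):
    """Approximate one-way delay of a single hop in milliseconds."""
    propagation = distance_km / FIBER_KM_PER_MS
    serialization = packet_bytes * 8 / link_bps * 1000
    return propagation + serialization + queuing_ms


def path_delay_ms(hops):
    """Sum the delay of every hop on the path."""
    return sum(hop_delay_ms(**hop) for hop in hops)


# Hypothetical path: remote site -> regional hub -> another hub -> remote site.
path = [
    {"distance_km": 400, "link_bps": 2_000_000, "queuing_ms": 2.0},
    {"distance_km": 1400, "link_bps": 100_000_000, "queuing_ms": 1.0},
    {"distance_km": 400, "link_bps": 2_000_000, "queuing_ms": 2.0},
]

BUDGET_MS = 40.0  # Editorial application requirement from the case study
one_way = path_delay_ms(path)
within_budget = one_way <= BUDGET_MS
print(f"estimated one-way delay: {one_way:.2f} ms, within budget: {within_budget}")
```

Notice how much of the budget the slow access links consume in serialization alone. QOS can control the queuing part, but it cannot buy back propagation or serialization delay.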


The customer WAN comprised two MPLS L3 VPN service providers and tons of point-to-point WAN circuits. At a high level all looks good. At most we need to ensure that our MPLS VPN service providers agree to accept our QOS markings and give our traffic proper treatment across the SP core.

The twist here is that while most of the remote locations had Cisco ISR G1 or G2 routers, the core locations were using Cisco Metro Ethernet switches as the MPLS CE devices. Interestingly, most Metro Ethernet switches don't support Layer 3 QOS for the purpose of bandwidth-based reservations under LLQ, but only L2 QOS and MPLS QOS. So from the traditional QOS deployment standpoint this could very well be a major setback.

So as you can see, a couple of technical reasons combined with poor understanding of, or communication with, the business make QOS a complex and misunderstood topic.

The other problem I have seen in the field is that while network engineers try to build policies for things such as bandwidth reservations under queuing, they don't do proper traffic pattern analysis to figure out the correct bandwidth requirements. Ideally you should deploy a tool such as NetFlow, let it run for a couple of weeks, and then analyse the data to reach conclusions about bandwidth usage vs reservation requirements.
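As a rough idea of what that analysis could look like, here is a minimal Python sketch that turns per-interval byte counts into a suggested reservation per traffic class. It assumes the flow data has already been exported and summarised as `(interval_id, traffic_class, byte_count)` tuples, which is a made-up format and not something NetFlow produces directly. The 95th percentile rule and the sample numbers are also just assumptions for the example.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def suggest_reservations(records, interval_s, link_bps, pct=95):
    """Return {traffic_class: percent of link} based on the pct-th
    percentile of the per-interval rate seen for each class."""
    bytes_per_class = defaultdict(lambda: defaultdict(int))
    intervals = set()
    for interval_id, traffic_class, byte_count in records:
        bytes_per_class[traffic_class][interval_id] += byte_count
        intervals.add(interval_id)

    result = {}
    for traffic_class, buckets in bytes_per_class.items():
        # Intervals where the class sent nothing count as a rate of zero.
        rates = [buckets.get(i, 0) * 8 / interval_s for i in intervals]
        result[traffic_class] = round(100 * percentile(rates, pct) / link_bps, 1)
    return result


# Hypothetical sample: four 5-minute intervals on a 2 Mbps link.
sample = [
    (0, "editorial", 7_500_000),
    (1, "editorial", 15_000_000),
    (2, "editorial", 7_500_000),
    (3, "editorial", 7_500_000),
    (0, "voice", 3_750_000),
]

reservations = suggest_reservations(sample, interval_s=300, link_bps=2_000_000)
print(reservations)
```

With only a handful of intervals the percentile is effectively the peak, which is why a couple of weeks of data matters before trusting the numbers.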

So as you can see from this brief post on the non-technical side of QOS, there are perhaps too many pieces involved in making a QOS deployment successful. While most people say that the WAN has the highest potential in terms of SDN use, QOS is probably another key area where SDN and automation have huge scope IMHO [ Ever tried to deploy LAN QOS with a couple of 6500s, 4500s, 3750s, 2950s and Nexus switches in a single setup ? :) ]

HTH...
Deepak Arora
Evil CCIE