Thursday, November 23, 2017

A Few Questions You Should Ask Your Fav Self-Driving Networks Vendor

Since everyone has been talking about Intent Based or Data Driven Networks for a while, I recently spent some time going through the work done by Cisco, Juniper, Apstra, Veriflow & Forward Networks in the area of Intent Based Networking AKA Data Driven Networks AKA Self-Driving Networks.

So far it seems like everyone is trying to solve different sets of problems, with some overlap. Most of them also use ML/AI in some form or shape. My assumption, though, is that the algorithms under the ML/AI umbrella are proprietary (secret sauce) for the most part, with few details available publicly.

Also, the common buzzword "Automation" would mean a totally different set of things in such a world IMHO. So in the meantime, below are some quick questions you can ask your fav. OEM vendor guys the next time you go for a product/training session around this :) to get more clarity.

> What do they really mean by Intent Based Networking ? (Everyone has a different definition for this)

> How do they describe the intent to the system ?

> How do they map business intent to technology intent and describe it to the system ?

> Does the system translate the intent into configurations/policies on its own, or does the admin use some sort of policy language to describe business and technical intents to the system ?

> What sort of checks does the system have to verify the described intent and return feedback to the admin ?

> Are there any dry run capabilities in the system ?

> How do they maintain consistency of the intent throughout the life-cycle - Plan, Design, Deploy, Verify, Operate & Optimize ?

> How does the intent get documented and updated over time ?

> How do you describe intent for things like backup paths etc. ?

> How do they create visualizations of the intent to present to management & Network Architects ?

> How do you modify the intent over time and what would be the touch points ?

> How is their Intent Based Networking different from policy based networking (Remember Promise Theory that ACI works on) ?

> The OEM's AI/ML/DL & algorithm details (I doubt they would be willing to share though) ?

> How do they control AI and ML, since they can create their own algorithms and break this closed loop over time ? (Remember what happened with Facebook's AI attempt)

> How do they track changes in such networks, such as those created dynamically by AI/ML ?

> How does AI/ML understand whether it's an actual pattern change in the network during an attack or just a traffic pattern change driven by a special event, such as heavy load on billing systems at financial year end ?

> Do they use a graph approach (by building a network graph, as a link state protocol does) or a relational approach (such as describing links with properties) across network components to describe intent ?

> Which data model do they follow to describe intent and push/change configurations ? (Understanding data normalization techniques & data model details will also be key to understanding support across multiple vendors)

> Do they support APIs to describe intent and to work with other systems as part of a larger eco-system ?

> How do they deal with scale ? (Everyone has different sets of limitations here)

> How do they deal with a mix of legacy networks & cloud networks in a hybrid environment ?

> How do they handle leaky abstractions & grey failures ?

> How do they operate across multi OEM platforms with different HW/SW capabilities ? (Which essentially means they must form an eco-system with a select set of partners, as it's going to be an ongoing effort)

> Does the intent description only take care of the configuration side of things, or do they go further with Network Design as part of another abstraction layer ?
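
To make a few of the questions above more concrete (describing intent, the graph approach, backup paths and dry runs), here is a minimal Python sketch. It is not how any of the vendors mentioned actually do it; the topology, the intent dictionary keys and the function names are all made up for illustration. The intent is plain data, the network is a graph, and the check runs without touching any device.

```python
from collections import deque

# Hypothetical topology: each tuple is one undirected link.
LINKS = [
    ("leaf1", "spine1"), ("leaf1", "spine2"),
    ("leaf2", "spine1"), ("leaf2", "spine2"),
]


def build_graph(links):
    """Turn a list of links into an adjacency map (the 'graph approach')."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def reachable(graph, src, dst, failed_links=frozenset()):
    """Breadth-first search that ignores any link listed in failed_links."""
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nbr in graph.get(node, ()):
            if nbr in seen or frozenset((node, nbr)) in failed_links:
                continue
            seen.add(nbr)
            queue.append(nbr)
    return False


def verify_intent(links, intent):
    """Dry run: return a list of violations, never change the network."""
    graph = build_graph(links)
    src, dst = intent["src"], intent["dst"]
    if not reachable(graph, src, dst):
        return [f"no path from {src} to {dst}"]
    violations = []
    if intent.get("survive_single_link_failure"):
        # Backup path check: fail each link in turn and re-test reachability.
        for a, b in links:
            if not reachable(graph, src, dst, {frozenset((a, b))}):
                violations.append(f"single point of failure: {a}-{b}")
    return violations


# Hypothetical intent: leaf1 must reach leaf2 even if any one link fails.
INTENT = {"src": "leaf1", "dst": "leaf2", "survive_single_link_failure": True}
print(verify_intent(LINKS, INTENT))
```

Even a toy like this shows why the questions matter: somebody has to decide the schema of the intent, where the topology data comes from, and what feedback the admin gets when the check fails.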

Let's park the Telemetry related part of the solution for a while; it is another area to dig into in order to fit all the pieces together. :) (Maybe another follow up post)

Deepak Arora

Monday, December 5, 2016

Data Centre Fabric Design Considerations

Let's continue the SDN series.

Fabric based Data Centre architectures are becoming more and more common these days. While I mentioned in my earlier articles that some of the terminology and other stuff you hear under the SDN umbrella is not technically new, it still gives us the ability to solve some Business and Design problems that were hard to solve in the past, probably for different reasons.

While some of the large players like Microsoft, Facebook, LinkedIn, Google and Amazon built their DC fabrics using Open Source ideas to meet their Hyper-scale Data Centre requirements, most mid and large Enterprises still seem to be a little far from going down that direction, considering it's not an easy task to begin with. People might be able to get things working to an extent, but in most cases the solution doesn't look very clean.

So as an alternative they have the option of using someone else's brainchild. For example Cisco ACI, VMware NSX, Juniper Contrail and Nokia Nuage are some of the products that are designed and built to cater to the same set of requirements.

There is plenty of material available on the Web where people suggest different reasons why one vendor's solution is better than another. I am sure some of it is true, considering that someone with hands-on experience around these products might have encountered interesting issues along the way. Recently a friend asked me to share my suggestions on how to figure out which solution they should pick over another, while putting my love for any given vendor aside ;)

So here is my quick list. It's not very detailed but should still give you some pointers to help with the thought process:

- Network based overlay (ACI) vs. Host based overlay (understand the pros and cons of each approach)

- Would you need overlap between these two overlay models at some point ? If so, would that require additional licensing, upgrades etc. ?

- How availability domains will be supported in the solution

- What scale you expect to hit down the line from a control plane and data plane perspective

- What sort of Orchestration and Automation your customer requires and how those needs map to a particular solution's capabilities

- How controller to controller communication happens, what the pre-requisites are, and whether they allow overlay tunnels to go across different domains

- How a particular OEM is going to avoid bug disasters. In theory, if all controllers are running the same code, hitting a bug could potentially bring the entire controller cluster down and defeat the purpose of running controllers in a cluster or HA per se

- How open or closed the API support is (Open doesn't mean completely open)

- How strong the partner ecosystem is in terms of integration and how well it ties in with your Orchestrator

- How well your overlay model integrates with the rest of the network and what approaches the OEMs suggest

- Whether they allow the solution to be integrated with another OEM's solution down the line. For example, overlays like VXLAN don't specify the control plane. So while two OEMs may both support VXLAN as a possible common piece, they might take different approaches to building the control plane or might have added some vendor specific twist

- How well the solution works in a multi hypervisor vendor environment

- Learning curve involved with the new solution

- Integration with current NMS deployment

- How your DCI connects and integrates between DC-DR or DC-DC (Active-Active)

- How well your fabric handles situations like a customer using DR on cloud

- Support for containers (Corner case)

- How well the solution is able to tie the virtual and physical workloads together

- How the fabric protects east west traffic (Micro-segmentation)

- How you move your virtual/physical workloads from the old architecture to the new fabric architecture
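
On the VXLAN interoperability point in the list above, a small Python sketch may help show why the data plane is the "common piece". The 8-byte VXLAN header layout used here follows RFC 7348 (an 8-bit flags field with the "I" bit set, a 24-bit VNI, and reserved bits); the function names are my own and purely illustrative. Notice that nothing in the header says how a VTEP learns where remote MAC addresses live, and that is exactly the part where vendors can differ.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag defined in RFC 7348


def pack_vxlan_header(vni):
    """Build the 8-byte VXLAN header for a given 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) followed by 24 reserved bits.
    # Word 2: VNI (24 bits) followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header):
    """Read the VNI back from the first 8 bytes of a VXLAN header."""
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return word2 >> 8


header = pack_vxlan_header(5000)
print(header.hex(), unpack_vni(header))
```

Two vendors can agree byte for byte on this header and still not interoperate, because the mapping of MAC/IP to remote VTEP can come from flood and learn, BGP EVPN, or a proprietary controller.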

Deepak Arora

Thursday, September 15, 2016

Redefining SDN

For quite some time I have been sort of irritated (for lack of a better word) by hearing and discussing the market buzzword called SDN. Let's first review some of the interesting facts:

- Some people seem to be afraid of losing their jobs, thinking they will be irrelevant in the near future. There are lots of posts on the web where people throw around opinions about what the value of CCIE is now and whether CCIE is relevant any longer. (Mostly these are people claiming to be SDN/NFV experts or evangelists)


- SDN is the only thing that next generation networks will be built around or run on

- SDN & NFV are the future of networking

- Protocols like OpenFlow will take over the world soon and re-define networking completely

Now let's not try to define SDN here and keep that for the next post, since I want to clarify some of the misconceptions.

So coming back to the idea that there is something fundamentally new. Well, it depends upon what your personal take on SDN is to begin with, in terms of:

- What it is
- How it works
- What problems it solves
- What new models, frameworks and protocols work in the background (Check under the hood)

Now if you take a closer look, most of these products are a good mix of:

- Overlay Networks (Host Based, Network Based or Hybrid - e.g. VXLAN etc.)

- A good Orchestrator that really works well for the most part (Numerous failed attempts by vendors in the past)

- Network virtualization techniques (e.g. VRFs, virtual instances etc.)

- Some better protocols/techniques to support workload mobility (e.g. LISP etc.)

- Fixes applied to some traditional techniques (e.g. poor programmability support with CLI in the past, or missing shell access etc.)

- Some old fundamentals re-discovered (CAP theorem, Game theory, Clos Fabric, BGP Labelled Unicast etc.)

- Enhancements to existing protocols, considering their proven history and robustness (BGP LS, BGP EVPN, Segment Routing etc.), to meet new requirements in terms of scale and get rid of some old challenges

- Better API support

- Better abstraction 

Now to me these are old ideas wrapped in a nice package (Remember RFC 1925 Rule 11), but they are certainly needed to meet business demands and scale in the current scenario and near future, depending upon which business problems you are trying to solve with technology.

On the other hand, the important question is "Should I be afraid ?"

Well, there are a couple of moving pieces which you need to consider:

- Most traditional network certifications, classes & courses don't cover these things, which really makes the situation tough

- There are very few books and texts around these topics and some are even misleading

- Lack of new networking models to define and shape these protocols and other stuff well and standardize them across vendors

- Most of the companies in this segment present their products as real game changers

- Are you really bad at adapting to change

- Well, at the end of the day it's all about money...isn't it ? :)

So while it's really good to have SDN tools around to solve different problems with technology and get rid of some challenges and limitations from the past, we are still far from having the T-800 from Judgment Day in real life :) (Another interesting question would be whether we really want to get to that stage)


Further Readings:

Deepak Arora

Saturday, March 26, 2016

Clos Fabrics AKA Spine & Leaf Architecture

Let's start the series with a discussion of Clos Fabrics AKA Spine & Leaf Architectures.

Now Clos design is not fundamentally new, but most Network Engineers were not talking about it until recently (Well...this is true to an extent). So as a Network Engineer should you really care ?

Well, you should start by asking why Clos in the first place ?

The major problem that a Clos fabric solves is scalability. Scalability is a matter of context though, and not everyone needs it or, to be precise, needs to go too far with it.

Also, a Clos fabric doesn't define your Layer 2 - Layer 3 boundaries by itself. So you are pretty much dependent upon what works best for you from a vendor implementation perspective while keeping your overall goal in mind. Now in theory Layer 3 fabrics scale much better than Layer 2 fabrics. Here are some questions/things to figure out about Clos if you decide to go for it:

- What is the scale that you have to deal with ?
- What are the technical and business requirements ?
- Is your DC traffic mostly east-west or north-south ?
- How can you keep the state in the Core (Spine) to a minimum ?
- How does flooding work in your fabric ?
- How is multicast handled in the fabric ?
- Where do you define the Layer 2 - Layer 3 boundary ?
- Is your network going to be multi vendor now or in future ?
- How are you going to manage and monitor such a large network ?
- How are you going to introduce security & services such as Load Balancers ?
- How are you going to connect to the external world ? (Border Spine vs. Border Leaf)
- Define your convergence requirements
- Are you going to need a single or multi stage Clos ?
- What is your over subscription ratio ? (Usually 3:1 is good for the most part)
- Understand your failure domains and the impact they may have
- Do you need Spine to Spine or Leaf to Leaf connections to mitigate some failure scenarios ?
- If you are going with a Layer 3 fabric, is it going to be a good idea to use summarization ?
- EBGP vs IBGP (also RR placement) in a Layer 3 fabric ?
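
For the scale and over subscription questions above, the arithmetic is simple enough to sketch in a few lines of Python. The port counts below are just an example I picked (48 x 10G server ports and 4 x 40G uplinks per leaf, 32-port spines), and the sizing function assumes a single stage Spine & Leaf fabric where every leaf has exactly one link to every spine.

```python
from fractions import Fraction


def oversubscription(downlinks, downlink_gbps, uplinks, uplink_gbps):
    """Ratio of server-facing bandwidth to spine-facing bandwidth on a leaf."""
    return Fraction(downlinks * downlink_gbps, uplinks * uplink_gbps)


def as_ratio(value):
    """Format a Fraction the way network people usually say it, e.g. 3:1."""
    return f"{value.numerator}:{value.denominator}"


def max_server_ports(spine_ports, downlinks_per_leaf):
    """Fabric size under the assumption of one link from each leaf to each
    spine: every spine port feeds one leaf, so leaves <= ports per spine."""
    return spine_ports * downlinks_per_leaf


# Example leaf: 48 x 10G down, 4 x 40G up (one uplink per spine, so 4 spines).
ratio = oversubscription(48, 10, 4, 40)
print(as_ratio(ratio))            # 480G down vs 160G up
print(max_server_ports(32, 48))   # with hypothetical 32-port spines
```

Adding spines (and uplinks) lowers the ratio, while the number of ports per spine caps how many leaves you can attach before you need a multi stage design.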

As an example, even ACI (Application Centric Infrastructure), Cisco's famous buzzword these days, uses a Spine & Leaf design. It uses a BGP EVPN control plane (with some secret sauce for now, but EVPN will be there soon too), on top of which it uses VXLAN as the data plane. So between Spine & Leaf (single stage) it uses a Layer 3 fabric. The entire fabric is managed with a centralized command and control system called the Cisco APIC controller. With ACI you can go as far as 6 Spines at the moment, and all services (e.g. load balancers), firewalls and external connectivity get terminated on Leaf switches. For server redundancy (bare metal or virtual) it uses our old friend Virtual Port Channel (vPC), but this time it doesn't require directly connected interfaces between leaf switches for the peer link and peer keep-alive link functions.

Cisco ACI is kind of built around another buzzword that you hear more often these days called SDN (Software Defined Networking). Now whether it fits the true SDN definition or not needs another discussion :).

In the meantime, below is the list of URLs which you may find very handy to get started with Clos:

Deepak Arora

Monday, March 7, 2016

How To Be Network Ninja In 21st Century

I have been asked so many times by Network Engineers, from starter level right up to Expert level, about how the network industry has been changing at a rapid pace in the last few years, with questions like whether certification, and CCIE in particular, holds any value any longer. I also spoke with a couple of friends whom I truly admire and who work in the US, Europe & Australia to get feedback on how the network industry is evolving there.

Now to start with, the following technologies are definitely picking up in some form or shape:

- SDx (like SDN - Software Defined Networking)
- Clos / Spine & Leaf Designs
- NFV (Network Function Virtualization)
- Virtualization (in areas like Network, Compute & Storage)
- Automation (Chef, Puppet etc...)
- Scripting & Programming (Python, Bash, Java etc...)
- Cloud Computing
- Network Visibility
- Overlay Networks/Tunneling Technologies (VXLAN, NVGRE etc...)
- Network Modeling Methods
- APIs (Application Programming Interfaces, like REST)
- Understanding of Unix & Linux
- Big Data
- Active Active Data Centres
- Machine Learning
- Segment Routing
- Containers (Docker...)
- Deep Understanding of Application Structure/Components & Life Cycle

And last but not least, a deep understanding of protocols like TCP, HTTP etc...

But again, the impact these may have on the current network industry (what they now call traditional networking) may vary by a large margin depending upon:

- Which part of the world you are living in
- How the IT industry is driven there, and the network industry in particular
- Which company you work for
- What are your personal/political views
- Company's IT Strategy & Road Maps
- Your current skill set & fears about these new technologies
- Whether there is any simulation tool to get familiar with these technologies
- What the maturity level of a given technology is (RFC ?, New Model ? etc...)
- How you want to manage these solutions (Afraid of Open Source ? Multi vendor blame game ? etc...)
- Whether you really have a good business case

Now there are of course other factors, including budget/cost, ROI, how to get your operational staff ready etc...

But I hope you get the idea. In the coming series of posts I will express my personal opinions around all these, but those articles are going to be semi technical rather than completely technical, since I am more of a Pre-Sales guy now.

Deepak Arora