How and why to adopt SDN despite its dark side

Five enterprise network operations managers told me they were very concerned about recent cloud outages. Why? Because every one of the outages was caused by network problems. Four of the five managers admitted that in their own containerized data centers, problems came more often from networks than from servers. Why is this happening? The answer, according to enterprises: more isn’t better.

Complexity is the enemy of efficient operations and management. The sheer volume of things going on can swamp management centers and even management tools. If you add in multiple vendors and multiple technologies that create differences in operations practices, you get something very messy. But it’s more than just size or technology scope that’s making network operations complicated; it’s the way networks work.

IP networks were designed to be adaptive. Every router is an island that shouts out its identity and status over whatever trunks are available, and every router listens to other shouts. From all this shouting, the routers collect the state and topology of the network, and from that information they build routing tables. Periodic shouts keep the tables up to date…sort of.

If Router A shouts out on a trunk to Router B, Router B has to repeat that state/topology advertising to its own adjacent routers. Any change in conditions anywhere, including a seemingly minor change in configuration, has to be propagated through relayed shouts. That takes time, a period called “convergence” that ends only when all our routers are singing (or shouting) the same tune. During that convergence, packets can be delayed or even lost, but that’s not the real problem. It’s hard to engineer how you want the network to operate, under both normal and failure-mode conditions, when everything depends on shout relays. MPLS can help here (traffic engineering is why it was invented), but the real answer may be software-defined networks, or SDN. Unfortunately, we’ve messed SDN up.
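To make the relay delay concrete, here is a minimal Python sketch of a change spreading hop by hop. It assumes a made-up five-router topology and counts one "round" per relay; the router names and the rounds_to_converge function are illustrative only, and real routing protocols add timers, retransmissions, and route computation on top of this.

```python
from collections import deque

def rounds_to_converge(adjacency, origin):
    """Return, per router, how many relay rounds pass before it hears
    about a change that happened at `origin`."""
    heard_in_round = {origin: 0}
    queue = deque([origin])
    while queue:
        router = queue.popleft()
        for neighbor in adjacency[router]:
            if neighbor not in heard_in_round:
                # The neighbor hears the shout one round after the relayer did.
                heard_in_round[neighbor] = heard_in_round[router] + 1
                queue.append(neighbor)
    return heard_in_round

# Hypothetical topology: a chain A-B-C-D with a side branch B-E.
topology = {
    "A": ["B"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": ["B"],
}

heard = rounds_to_converge(topology, "A")
convergence_rounds = max(heard.values())
print(heard)               # round in which each router learns of the change
print(convergence_rounds)  # rounds until every router agrees
```

Until the last router has heard the news, different routers are working from different pictures of the network, and that window is where packets get delayed or lost.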

The SDN upside: Control

If you ask a hundred network operations people what SDN is about, they’ll tell you it’s to “separate the control plane and the data plane.” If there’s ever been a more useless definition of a network concept, it doesn’t come to mind. What the heck does that mean, and why would you care? What SDN is really about is substituting planned route development for adaptive development. The control-plane stuff comes in only because all that router shouting takes place not with data packets but with control packets.

In SDN networks, a controller maintains the topology of the network. That doesn’t take much effort because, after all, routers and trunks don’t float around much. The controller keeps up to date on the state of the network elements, and it has policies that decide how routers and trunks should be stitched together to create routes. It sends the routing tables to the “routers” (in the Open Networking Foundation model, using the OpenFlow protocol). It’s like an old-fashioned Mother May I? game; everyone does what they’re told.
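As a rough illustration of that central role, the Python sketch below has one process hold the whole topology and work out a next-hop table for every device. Everything in it is an assumption made for the example: the four routers, the link costs, the next_hops function, and the choice of a lowest-cost-path policy. It is not the OpenFlow API, and a real controller would push the resulting tables to devices rather than just build them.

```python
import heapq

def next_hops(links, source):
    """Dijkstra over the controller's topology map.
    Returns {destination: first hop to use from `source`}."""
    best = {source: 0}
    table = {}
    done = set()
    heap = [(0, source, source)]  # (cost so far, node, first hop used)
    while heap:
        cost, node, hop = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry; a cheaper path was already settled
        done.add(node)
        if node != source:
            table[node] = hop
        for neighbor, weight in links[node].items():
            if neighbor in done:
                continue
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                first = neighbor if node == source else hop
                heapq.heappush(heap, (new_cost, neighbor, first))
    return table

# Hypothetical topology with link costs, as the controller would hold it.
links = {
    "r1": {"r2": 1, "r3": 4},
    "r2": {"r1": 1, "r3": 1, "r4": 5},
    "r3": {"r1": 4, "r2": 1, "r4": 1},
    "r4": {"r2": 5, "r3": 1},
}

# One computation, one author: every device's table comes from the same view.
tables = {router: next_hops(links, router) for router in links}
for router, table in tables.items():
    print(router, table)
```

Because every table is derived from the same map at the same moment, there is no interval in which r1 and r3 disagree about how to reach r4.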

SDN networks have absolute control over routes and route changes. They allow operations to engineer alternative routing topologies based on failure analysis, or to have the controller calculate them based on policies; either way, every device gets the same tune shouted from that central controller. No convergence period with a bunch of inconsistent routes, no confusion. Because you can examine these alternative failure-mode topologies carefully before you commit them, there are no surprises.
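Examining failure modes ahead of time can be sketched in a few lines of Python. The example below assumes a small, invented leaf/spine fabric and a fewest-hops policy; it fails each link in turn, records the path that would be used from leaf1 to the edge router, and flags any failure that leaves no path at all. The device names and the find_path function are hypothetical.

```python
from collections import deque

def find_path(links, src, dst, failed=frozenset()):
    """Breadth-first search for a fewest-hop path that avoids failed links."""
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(links[node]):
            if frozenset((node, neighbor)) in failed or neighbor in previous:
                continue
            previous[neighbor] = node
            queue.append(neighbor)
    return None  # destination unreachable under this failure

# Hypothetical data-center fabric: two leaves, two spines, one edge router.
links = {
    "leaf1": {"spine1", "spine2"},
    "leaf2": {"spine1", "spine2"},
    "spine1": {"leaf1", "leaf2", "edge"},
    "spine2": {"leaf1", "leaf2"},
    "edge": {"spine1"},
}

all_links = {frozenset((a, b)) for a in links for b in links[a]}
failure_plans = {}
for link in sorted(all_links, key=sorted):
    # Plan the leaf1-to-edge route for each single-link failure in advance.
    failure_plans[tuple(sorted(link))] = find_path(
        links, "leaf1", "edge", failed={link}
    )

unprotected = [link for link, path in failure_plans.items() if path is None]
for link, path in failure_plans.items():
    print(link, "->", path)
print("no backup route if these fail:", unprotected)
```

In this made-up fabric the review turns up one single point of failure, the only link to the edge router, which is exactly the kind of thing you would rather learn from a report than from an outage.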

If we asked our hundred netops types to line up on the left if they loved SDN and on the right if they didn’t, most wouldn’t know where to go; that’s been true for the decade-and-a-half that SDN has been around. But some people did know where they lined up on the issue. Almost as soon as SDN was suggested, Google started looking at adopting SDN in its backbone network, and it did just that. How, given that Google has to interwork with the internet for everything it does? It surrounded its SDN core with a series of “BGP processes” that made its SDN network look like an IP network. SDN in a BGP paper bag, or in modern terms, a black box or SDN-based “IP-intent model”.

SDN in general, and the Google example in particular, illustrate two ways of reducing network complexity. First, rely on deterministic behavior to control routes. In an adaptive network, you don’t really know what all that shouting is going to converge on until it’s done. In SDN, one caller is calling the square dance: the SDN controller. Second, a hierarchical structure can reduce complexity in itself. If the internet, which contains hundreds of thousands of routers, were a flat network linked throughout by our router shouts, nothing would ever get through. Instead, it’s broken up into segments (autonomous systems, or ASes), with routing done first between segments and then within them, and with the latter process being based on the shouting routers.

Since most enterprises use IP VPNs for the WAN, their network-building is likely focused on the data center and the connection between the data center and the VPN. This is a great spot for SDN, because data-center configuration is critical and because there’s little chance that a network failure would completely cut off the SDN controller from some devices, which would make controlling the network difficult.  A company with multiple data centers can create an SDN segment for each and link them via SDN or via traditional routing.

Adopting SDN might not be expensive

Your company probably doesn’t use SDN at all, but before you join other netops people bending an elbow and singing of lost opportunities, take heart.  You can do all this today, with products already available and in many cases installed.

Most router and switch vendors support SDN on their devices; look for OpenFlow as a supported protocol and review exactly what features are offered. SDN controllers are also readily available from the major router vendors and a dozen other sources. Before you make a commitment, be sure that you understand how to use a given controller’s features to create the policies that define both the “normal” state of routes and any failure modes you need. Also be sure that your controller supports full journaling of all activity, both network status changes and changes made by administrators to the controller’s operating parameters.

Why all the concern about policies and journals? Because of SDN’s dark side. You can make the network do exactly what you ask, and that means that you can make it do truly bad things. SDN doesn’t eliminate network errors; it just lets you insert more explicit planning and control into the configuration process. If your SDN controller doesn’t protect you from yourself, and doesn’t let you validate routes, establish policies, and consult journals when something goes wrong, it can take you to a very dark place.
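What "protecting you from yourself" might look like is sketched below in Python, under loud assumptions: the JournaledController class, the validate function, and the admin names are all invented for illustration, and commercial controllers expose their own, different interfaces. The idea is simply that a proposed set of next-hop tables is checked for loops and black holes before it goes live, and that every attempt, accepted or not, lands in a journal.

```python
import time

def validate(tables, destinations):
    """Walk each proposed route hop by hop and report loops or black holes."""
    problems = []
    for dst in destinations:
        for start in tables:
            node, seen = start, set()
            while node != dst:
                if node in seen:
                    problems.append(f"loop from {start} to {dst}")
                    break
                seen.add(node)
                node = tables.get(node, {}).get(dst)
                if node is None:
                    problems.append(f"black hole from {start} to {dst}")
                    break
    return problems

class JournaledController:
    """Hypothetical controller wrapper: validate first, journal always."""

    def __init__(self):
        self.active = {}   # tables currently in force
        self.journal = []  # every commit attempt, accepted or rejected

    def commit(self, admin, tables, destinations):
        problems = validate(tables, destinations)
        self.journal.append({
            "time": time.time(),
            "admin": admin,
            "accepted": not problems,
            "problems": problems,
        })
        if not problems:
            self.active = tables
        return not problems

# {router: {destination: next hop}}; the second proposal bounces r1 <-> r2.
good = {"r1": {"r3": "r2"}, "r2": {"r3": "r3"}, "r3": {}}
bad = {"r1": {"r3": "r2"}, "r2": {"r3": "r1"}, "r3": {}}

controller = JournaledController()
ok_good = controller.commit("alice", good, ["r3"])
ok_bad = controller.commit("bob", bad, ["r3"])
print(ok_good, ok_bad)
for entry in controller.journal:
    print(entry["admin"], entry["accepted"], entry["problems"])
```

The rejected change never reaches the network, and the journal still shows who tried it and why it failed, which is what you want in hand when something does go wrong.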

SDN is a way for you to truly control your network, not just watch it while it tries to control itself. If that sounds good, it’s time to revisit the concept. Just be aware that human error is still the biggest source of network problems, and make sure your humans and your SDN are coexisting and cooperating.


Copyright © 2022 IDG Communications, Inc.