If data center NetOps and DevOps are going to work well, they need a digital sandbox • The Register



Sponsored

Data center operators are easy to spot, says Jonathon Lundstrom: “They’re the messy-haired people.”

After all, says Lundstrom, director of business development for Nokia’s web organization, “they’re the ones who make sure all the lights stay on and that traffic keeps flowing, and they’re the ones who are most concerned about risk management.” For data center network operators, stability is paramount. But designing a network and data center infrastructure for stability from the start is only the first challenge. Maintaining that stability in day-to-day operations is another, and then there’s the challenge of troubleshooting and correcting when something inevitably goes wrong, ideally without making things worse in the process.

An added complication for infrastructure teams over the past decade has been the application development revolution driven by DevOps, and how it affects the way organizations manage compute, storage and, to a lesser extent until now, network infrastructure.

“DevOps methodologies assume that the network is flexible, can change quickly, and is easily consumable,” explains Lundstrom. “But unfortunately, they leave it to the network operations organization to manage all of these risks.” Part of the answer is for network teams to follow the development teams’ lead and rely on more automation. But in turn, they need to be sure that this automation behaves consistently, preserves network integrity and uptime, and keeps mean time to repair low.

This leaves network teams at an impasse. How can they deliver the agility and automation that modern businesses need while still providing stability and recovering quickly when problems arise?

The obvious answer is more resources for exhaustive testing and validation. But as Lundstrom explains, the traditional approach to testing and validating an infrastructure setup is to build physical labs: in effect, microcosms of the live network. This presents its own challenges. It is resource-intensive, both in purchasing equipment and in the time spent maintaining those environments. And that’s before you even factor in the staff time spent designing networks, validating functionality, and troubleshooting those systems.

Additionally, however carefully the lab network is set up initially, it risks drifting away from the actual network configuration over time, which increases the chance of new issues and makes them even harder to resolve.

Meet your digital twin

Nokia aims to address these issues with its Fabric Services System, part of its Data Center Switch Fabric solution, alongside its Service Router Linux (SR Linux) NOS and Nokia switching hardware platforms. The Fabric Services System includes a digital sandbox, a containerized environment that can produce a digital twin of a planned or live physical network.

As Lundstrom explains, the digital sandbox is “functionally integrated into the Fabric Services System and generates containerized SR Linux instances on demand. It creates the virtual plumbing for a newly designed network, or an emulated replica of the real environment.”

[Figure: The digital sandbox within the Fabric Services System]

The code executed in the sandbox is the same as the code running on the switches in the physical environment. This gives NetOps teams the ability to replicate, and test, several different scenarios, including new configurations and applications, to assess their performance before full deployment. The time savings this brings can be measured in days and weeks, Lundstrom explains. Expanding on the sandbox, he says: “The setup is the same, and we can not only simulate but also emulate both the control plane and the data plane of the network, so you can see exactly what the expected and actual behaviors will be in this digital sandbox before going into production.” This capability allows NetOps teams to gain confidence and know what to expect, reducing the level of risk when changing the configuration of the production network.

Nokia’s entire Fabric Services approach is designed to be open. Customers can choose from a range of Nokia hardware platforms today, but SR Linux is designed to support white-label platforms and could do so in the future. Customers can also choose which protocols they wish to run, use the tools of their choice, or even develop their own network applications, which SR Linux will manage and make available to all users in the same way that it manages its own applications. This philosophy is, necessarily, carried over to the digital sandbox.

“If customers have physical devices from other vendors, we can also connect them to the sandbox,” Lundstrom explains. “With a little plumbing between the server the sandbox is running on and the physical devices in their lab environments, live interoperability can also be validated and tested. In addition, external sources of information such as BGP peers, route reflectors, or traffic generators can also be connected to the sandbox.”

The SR Linux devices in the sandbox can also be fully managed like any other device, Lundstrom explains. “So scripting and Ansible tools can take advantage of the APIs of sandbox-generated nodes. It’s like creating a development environment for your orchestration team, right there in a containerized, virtualized environment… For us, it’s a complement to the existing orchestration test environments a customer has. The sandbox simply gives NetOps teams an easy place to play.”
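Lundstrom’s point is that automation written for production can be pointed at sandbox nodes unchanged. Below is a minimal sketch of that idea, assuming the nodes accept JSON-RPC 2.0 style requests on their management interface. The inventories, addresses, the `set` method name, and the params layout are all hypothetical placeholders, not Nokia’s documented API. Only the inventory differs between environments; the request bodies are identical.

```python
import json
from dataclasses import dataclass

# Hypothetical inventories: node names and management addresses are
# placeholders. The sandbox mirrors production node for node, so only
# the addresses differ between the two environments.
INVENTORIES: dict[str, dict[str, str]] = {
    "sandbox": {"leaf1": "10.0.0.11", "leaf2": "10.0.0.12"},
    "production": {"leaf1": "192.0.2.11", "leaf2": "192.0.2.12"},
}


@dataclass
class ChangeRequest:
    node: str
    address: str
    body: str  # serialized JSON-RPC 2.0 request


def build_change_requests(environment: str, path: str, value: str) -> list[ChangeRequest]:
    """Build one identical change request per node in the chosen environment.

    The envelope follows the generic JSON-RPC 2.0 format; the method name
    "set" and the params layout are hypothetical placeholders.
    """
    requests = []
    nodes = sorted(INVENTORIES[environment].items())
    for request_id, (node, address) in enumerate(nodes):
        payload = {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "set",  # hypothetical method name
            "params": {"path": path, "value": value},  # hypothetical layout
        }
        requests.append(ChangeRequest(node, address, json.dumps(payload, sort_keys=True)))
    return requests
```

A script would call `build_change_requests("sandbox", ...)` first, check the results in the twin, and only then repeat the call with `"production"`. Sending the requests is deliberately left out, because the transport and authentication are deployment-specific.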

As part of testing and validating new networks, network teams want to know what to expect if something goes wrong, but with traditional approaches, even subtle differences between lab and live setups can lead to different behavior. With an exact digital twin, he says, “even things like upgrades and failure testing can be predictable in both environments.” The digital sandbox is accompanied by Nokia Certified Designs, which Lundstrom describes as “the integrated best practices for building a data center network infrastructure, as well as the overlay services within that data center.”

All of this is designed to satisfy the operations team’s desire for consistency. Every per-node configuration change that departs from the “golden” configuration represents a risk, he says: “I could implement something on this node that affects not only the client I’m trying to fix or configure, but other clients as well. And the impact of that may only become apparent later, when something goes wrong.” To reduce this possibility, the Fabric Services System continuously tracks when and where changes occur, and monitors each node for deviations from the intent of the overlay and underlay services.
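Conceptually, monitoring for deviations from intent comes down to comparing each node’s running configuration with its golden configuration. The sketch below is a minimal, hypothetical illustration of that comparison using nested Python dicts. It is not how the Fabric Services System is implemented, and the configuration keys are invented for the example.

```python
from typing import Any


def find_drift(golden: dict[str, Any], running: dict[str, Any], prefix: str = "") -> list[str]:
    """Compare a node's running config with its golden (intended) config.

    Nested dicts are walked recursively and keys are joined into
    '/'-separated paths. Keys are visited in sorted order at each level,
    so the output order is deterministic.
    """
    drift: list[str] = []
    for key in sorted(set(golden) | set(running)):
        path = f"{prefix}/{key}"
        if key not in running:
            drift.append(f"missing {path}")
        elif key not in golden:
            drift.append(f"unexpected {path}")
        elif isinstance(golden[key], dict) and isinstance(running[key], dict):
            drift.extend(find_drift(golden[key], running[key], path))
        elif golden[key] != running[key]:
            drift.append(f"changed {path}: {golden[key]!r} -> {running[key]!r}")
    return drift


# Hypothetical example: someone hand-tuned one interface's MTU and left
# a debug knob enabled on a single node.
golden = {"bgp": {"asn": 65001}, "interfaces": {"ethernet-1/1": {"mtu": 9212, "admin": "enable"}}}
running = {"bgp": {"asn": 65001}, "interfaces": {"ethernet-1/1": {"mtu": 1500, "admin": "enable"}}, "debug": True}
print(find_drift(golden, running))
# → ['unexpected /debug', 'changed /interfaces/ethernet-1/1/mtu: 9212 -> 1500']
```

Reporting missing, unexpected, and changed keys separately mirrors the quote above: an extra knob on one node is as much a deviation from the golden configuration as a changed value.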

A question of consistency, again

While there is an obvious benefit for initial network design or for testing proposed configuration changes, the sandbox should also give operators some relief when managing Day 2+ operations and troubleshooting live problems. “There are always flashing red lights; the key is to separate the most important ones from the unimportant ones,” says Lundstrom. But by instantly spinning up a digital twin of the live network, “you have a safe place to go and investigate without risking further trouble.”

Once a problem has been identified, he continues, the root cause can be isolated and fixes tested in the digital sandbox, again without risk to other customers. The Fabric Services System automates the rollout of fixes into the production environment in a manner similar to continuous development principles, and the digital sandbox facilitates post-deployment validation because it serves as a working reference for expected behavior.
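The test-in-the-twin, promote, then validate loop can be sketched as follows. This assumes hypothetical callbacks for applying a change to the twin, applying it to production, and rolling production back. It also assumes operational state can be reduced to a flat map of hypothetical keys such as BGP session states. The twin’s post-change state serves as the reference that production must match. This illustrates the workflow; it is not the Fabric Services System’s actual automation.

```python
from typing import Callable

# Operational state flattened to key -> value, e.g. "bgp/spine1" -> "established".
State = dict[str, str]


def roll_out(
    apply_to_twin: Callable[[], State],
    is_healthy: Callable[[State], bool],
    apply_to_production: Callable[[], State],
    roll_back_production: Callable[[], None],
) -> str:
    """Try a fix in the digital twin, then promote it and validate production."""
    expected = apply_to_twin()
    if not is_healthy(expected):
        # The fix misbehaves in the twin, so production is never touched.
        return "rejected in twin"

    observed = apply_to_production()
    # Post-deployment validation: every key must match the twin's reference.
    mismatches = sorted(
        key for key in expected.keys() | observed.keys()
        if expected.get(key) != observed.get(key)
    )
    if mismatches:
        roll_back_production()
        return "rolled back: " + ", ".join(mismatches)
    return "validated"
```

Exact equality is used only to keep the sketch short. A real pipeline would probably compare a selected set of state keys and tolerate values that legitimately differ between twin and production, such as counters and timestamps.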

While the idea of containerizing a set of network operating systems may not be entirely unique to Nokia, Lundstrom says, “What other people don’t do is orchestrate the emulation environment: the ability to press a button, mirror your physical network in a virtualized environment, and automatically configure it in the same way, with the same production configuration. The same connections, the same control plane, and a reduced data plane.”

Today, the sandbox can emulate a network of 500 nodes, which could easily represent a data center containing 20,000 servers. How far customers scale their own sandbox will depend on the amount of compute they make available. “But the amount of physical equipment we would replace with this virtualized environment could be staggering,” says Lundstrom. “And that’s both a saving in CapEx for the operator and in the operating time required to do all this reconfiguration, this validation of new functionality, or this troubleshooting exercise.”
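For a sense of scale, dividing the two figures quoted above gives the average number of servers each emulated node stands in for. This is only an average across all nodes. Assuming a typical leaf-spine design, where servers attach only to leaf switches, each leaf would account for somewhat more.

```latex
\frac{20\,000 \text{ servers}}{500 \text{ nodes}} = 40 \text{ servers per emulated node, on average}
```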

But if saving on capital expenditure is one thing, he says, the time saved by not having to physically build test, setup, and validation environments is potentially much greater. “We’re talking about going from days and weeks… to minutes and hours,” he says.

By using the Fabric Services System to provide automation and its digital sandbox to provide an emulated environment, NetOps teams can reduce risk and improve the metrics on which they are judged, such as uptime, mean time to repair, and performance. And they can keep pace with their DevOps colleagues’ growing need for speed and flexibility.

There is no doubt that NetOps teams can use the time saved to perform even more exhaustive testing, as well as to tackle new strategic challenges, work more closely with their DevOps colleagues, and optimize the network for new workloads. And if they’re really lucky, maybe they can take the time to finally fix that tousled hair.

Sponsored by Nokia.

