Using real-time analytics to dynamically assure network services

Tue, 8th Dec 2015


What do a Grammy award-winning musician's newest music video and a home security system have in common? When one explodes in popularity, the other might suffer outages, delays and technical faults.

The first music video for Adele's upcoming album, "25," was released on Friday, 23 October 2015. The song, "Hello," was watched 69,034,918 times on YouTube in its first three days, and the first 24 hours alone saw 27.7 million views, a record for a music video's first day.

"Hello" was watched in homes, in offices, on WiFi, and on mobile devices using cellular data. Call it the "Adele Hello Effect": Unlike scheduled events like World Cup soccer, the huge swell in traffic was unexpected, and could easily have overloaded some cellular access points, cloud providers, and even backhauls, creating packet drops, connection drops, jitter, delay and, well, a big mess.

Consider the effect the "Adele Hello Effect" might have on a hypothetical customer who relies on value-added home video monitoring services from his Internet service provider, such as Comcast or AT&T in the United States or Rogers in Canada, delivered over DSL, cable, or even over-the-air 4G or LTE.

Video captures from cameras could be delayed; connections could time out; buffers could overflow; real-time streams could be delivered at reduced resolution. In other words, service delivery could suffer because of issues anywhere from the last mile to the network core.

Note that nothing is actually failing, neither the overloaded delivery of Adele's music video nor other traffic such as the home video security system. This isn't a cable cut. If there's a red light/green light dashboard, the light would stay green.

Rather, the services are degrading because resources are inadequate for a short period of time. The degradation isn't triggered by a single failure at one customer location, but by a large number of events distributed across a larger part of the network.
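To make the distinction concrete, here is a minimal Python sketch, assuming a hypothetical per-customer telemetry feed (link status, latency, packet loss) tagged with a region. The thresholds LATENCY_SLO_MS and LOSS_SLO_PCT, the min_fraction cutoff and all other names are illustrative assumptions, not any provider's real values. A classic up/down check stays green, while counting how many customers in a region miss their targets at the same time exposes the distributed degradation.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical per-customer targets; missing one occasionally is not unusual
LATENCY_SLO_MS = 100.0
LOSS_SLO_PCT = 1.0


@dataclass
class Sample:
    customer_id: str
    region: str
    link_up: bool
    latency_ms: float
    loss_pct: float


def red_green_status(samples):
    """Classic up/down view: green as long as every link is up."""
    return "green" if all(s.link_up for s in samples) else "red"


def misses_slo(s):
    """True if this customer's sample misses the latency or loss target."""
    return s.latency_ms > LATENCY_SLO_MS or s.loss_pct > LOSS_SLO_PCT


def degraded_regions(samples, min_fraction=0.3):
    """Return {region: share of customers missing targets} for regions at or above min_fraction."""
    by_region = defaultdict(list)
    for s in samples:
        by_region[s.region].append(s)
    flagged = {}
    for region, group in by_region.items():
        fraction = sum(misses_slo(s) for s in group) / len(group)
        if fraction >= min_fraction:
            flagged[region] = round(fraction, 2)
    return flagged


samples = [
    Sample("c1", "north", True, 140.0, 0.2),
    Sample("c2", "north", True, 130.0, 1.5),
    Sample("c3", "north", True, 40.0, 0.1),
    Sample("c4", "south", True, 35.0, 0.1),
    Sample("c5", "south", True, 120.0, 0.1),
    Sample("c6", "south", True, 30.0, 0.0),
    Sample("c7", "south", True, 25.0, 0.0),
]
print(red_green_status(samples))  # green: every link is up
print(degraded_regions(samples))  # {'north': 0.67}: two of three north customers miss targets
```

One slow customer in the south is treated as noise, while two of three customers in the north degrading together is reported, even though the dashboard light never changes.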

Such errors are hard to spot, and even when noticed, it's difficult to isolate the cause and determine exactly what to fix. Sure, the network operator may need to add more resources… but which resources, how many, and where?
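One simple way to start answering "where?" is to correlate degraded flows with the network elements they traverse. The sketch below assumes each measured flow carries its path as a tuple of element names (hypothetical identifiers such as cell-17 or backhaul-A) and a degraded/healthy flag. It scores each element by the share of its flows that are degraded and breaks ties by how many degraded flows the element carries. This co-occurrence heuristic is a stand-in chosen for illustration, not the analytics used in the pilots the article describes.

```python
from collections import Counter


def rank_suspects(flows):
    """Rank network elements by how strongly they co-occur with degraded flows.

    flows: iterable of (path, degraded), where path is a tuple of element names.
    Returns [(element, degraded_share)] sorted most-suspicious first.
    """
    total = Counter()
    bad = Counter()
    for path, degraded in flows:
        for element in set(path):  # count each element once per flow
            total[element] += 1
            if degraded:
                bad[element] += 1
    share = {e: bad[e] / total[e] for e in total}
    # Highest degraded share first; ties go to the element carrying more degraded flows
    ordered = sorted(total, key=lambda e: (-share[e], -bad[e], e))
    return [(e, round(share[e], 2)) for e in ordered]


flows = [
    (("cell-17", "backhaul-A", "core-1"), True),
    (("cell-17", "backhaul-A", "core-1"), True),
    (("cell-22", "backhaul-A", "core-1"), True),
    (("cell-22", "backhaul-A", "core-1"), True),
    (("cell-30", "backhaul-B", "core-1"), False),
    (("cell-30", "backhaul-B", "core-1"), False),
]
for element, score in rank_suspects(flows):
    print(element, score)
```

Here backhaul-A tops the list because every flow crossing it was degraded and it carried the most degraded flows, while the shared core-1 is only partially implicated.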

Unlike classic red-light fault identification and remediation, these subtler errors require a different approach: gathering data across the network, enabled by NFV over an SDN network, and then applying Big Data analytics to discover, in real time, what's going wrong.

The goal is to assure reliable service delivery by instrumenting the network using NFV and gathering a lot of data. Big Data analysis not only alerts the service provider to real-time problems with actionable intelligence, but also enables network operations staff to make instant decisions about the proper response across a service path that includes both physical and virtual network functions.
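The article does not say which algorithms these analytics use. As one illustrative stand-in for real-time detection, the sketch below keeps an exponentially weighted moving average (EWMA) and variance of a KPI stream, such as per-region latency, and flags a sample that lands more than threshold standard deviations from the baseline. The alpha, threshold and warmup defaults are assumptions that would need tuning on real traffic.

```python
import math


class EwmaDetector:
    """Flags samples far from an exponentially weighted baseline of a KPI stream."""

    def __init__(self, alpha=0.1, threshold=3.0, warmup=5):
        self.alpha = alpha          # weight given to each new sample
        self.threshold = threshold  # standard deviations that count as anomalous
        self.warmup = warmup        # samples to observe before flagging anything
        self.mean = None
        self.var = 0.0
        self.count = 0

    def update(self, x):
        """Score x against the current baseline, then fold it into the baseline."""
        self.count += 1
        if self.mean is None:
            self.mean = float(x)
            return False
        deviation = x - self.mean
        std = math.sqrt(self.var)
        anomalous = (
            self.count > self.warmup
            and std > 0
            and abs(deviation) > self.threshold * std
        )
        # Incremental EWMA update of mean and variance
        increment = self.alpha * deviation
        self.mean += increment
        self.var = (1 - self.alpha) * (self.var + deviation * increment)
        return anomalous


latency_ms = [20, 21, 19, 20, 22, 21, 20, 19, 21, 95]
detector = EwmaDetector()
flags = [detector.update(x) for x in latency_ms]
print(flags)  # only the final 95 ms sample is flagged
```

Scoring each sample before updating the baseline keeps a spike from inflating its own threshold; in practice one such detector might run per KPI and per region.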

This isn't pie-in-the-sky theory. Technology providers are working today on pilot implementations with service providers around the globe, deploying data gathering, Big Data analytics and actionable intelligence on production networks.

These projects meet service providers' operational and technical goals of migrating many network functions to NFV while efficiently managing end-to-end connectivity. At those carriers, VNFs are being deployed throughout the networks, enabling them not only to offer new value-added functions but also to react more quickly to problems.

At this stage in the technology's evolution, the proper remediation of issues like the "Adele Hello Effect" is determined by the operations staff based on the actionable intelligence. Carriers are working toward the goal, however, of allowing networks to have automatic remediation — including the provisioning of additional resources — in the near future.

What types of remediation? In some cases, rapidly provisioning parallel or higher-bandwidth circuits to carry heavy traffic through a hotspot. In other cases, creating bypasses that route traffic away from hotspots. Another option is to add processing power, for example by spinning up additional VNF instances to handle functions such as packet filtering or data compression.
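The "spin up additional instances" option reduces to simple capacity arithmetic. A minimal sketch, assuming each VNF instance handles a fixed throughput and that operators want to keep instances below a target utilization (the 70% default is an illustrative assumption): instances needed = ceil(offered load / (per-instance capacity x target utilization)).

```python
import math


def vnf_instances_needed(offered_gbps, per_instance_gbps, target_utilization=0.7):
    """Smallest instance count that keeps each instance at or below target utilization."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    usable_gbps = per_instance_gbps * target_utilization
    return math.ceil(offered_gbps / usable_gbps)


def additional_instances(offered_gbps, per_instance_gbps, current, target_utilization=0.7):
    """How many more instances to spin up (never negative)."""
    needed = vnf_instances_needed(offered_gbps, per_instance_gbps, target_utilization)
    return max(0, needed - current)


print(vnf_instances_needed(18, 4))             # 7
print(additional_instances(18, 4, current=3))  # 4
```

With 18 Gbps offered and 4 Gbps per instance at 70% utilization, that is ceil(18 / 2.8) = 7 instances, so a pool currently running 3 would need 4 more.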

All of those options are considered and simulated by the actionable intelligence software, which presents them as suggestions to the network operations staff. It's not enough to point out a problem; actionable intelligence offers advice as to how to fix it.
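A toy version of "consider and simulate the options" might project the utilization each candidate remediation would leave behind and rank the candidates, leaving the final decision to operations staff. The sketch assumes a hypothetical linear model: utilization is load divided by capacity, a parallel circuit adds capacity to the congested link, and a bypass shifts a given amount of traffic onto an alternate path. Real tools would model the actual topology; the Hotspot fields and option names here are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Hotspot:
    load_gbps: float          # traffic offered to the congested link
    capacity_gbps: float      # capacity of the congested link
    alt_load_gbps: float      # traffic already on a candidate bypass path
    alt_capacity_gbps: float  # capacity of that bypass path


def simulate(h, option):
    """Projected worst-case utilization after applying one remediation option."""
    kind, amount = option
    if kind == "parallel_circuit":  # amount: extra Gbps of capacity on the hot link
        return h.load_gbps / (h.capacity_gbps + amount)
    if kind == "bypass":  # amount: Gbps rerouted onto the alternate path
        main = (h.load_gbps - amount) / h.capacity_gbps
        alt = (h.alt_load_gbps + amount) / h.alt_capacity_gbps
        return max(main, alt)
    raise ValueError(f"unknown option: {kind!r}")


def suggest(h, options):
    """Rank options by projected utilization; operators still choose which to apply."""
    ranked = sorted(options, key=lambda o: simulate(h, o))
    return [(o, round(simulate(h, o), 2)) for o in ranked]


hot = Hotspot(load_gbps=9.5, capacity_gbps=10, alt_load_gbps=6, alt_capacity_gbps=10)
options = [("bypass", 4), ("parallel_circuit", 5), ("bypass", 2)]
for option, utilization in suggest(hot, options):
    print(option, utilization)
```

Moving too much traffic (4 Gbps) simply relocates the hotspot to the bypass path, which is the kind of trade-off an operator would want surfaced before acting.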

It's important to realise that Big Data analytics and actionable intelligence need to understand multiple layers in the stack – and speak to different operations staff, even within the same service provider.

In the home security scenario above, there's Layer 2/3 connectivity, handled by one NOC. This ensures that the network connection stays up, and is operating within policy-driven parameters. In other words, our home user still has his Internet.

Meanwhile, there's Layer 7 application support, in this case for home monitoring via video cameras: detecting failed video feeds, or detecting events within those feeds that shouldn't happen. That's an entirely different NOC, staffed by different operations teams with different expertise.

Degradation of the video security monitoring service may have totally different indications – and remediation processes – than for basic Internet connectivity, especially if the Big Data analytics is gathering data not only for this one customer, but for all customers in that neighborhood, in that city, or across the entire service provider network.
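The split between NOCs can be expressed as a routing table from KPI to layer to team, with events aggregated at whatever scope is being examined (neighborhood, city, or the whole network). In this minimal sketch the KPI names, NOC names and event fields are all hypothetical.

```python
from collections import Counter

# Hypothetical ownership: which layer each KPI belongs to, and which NOC owns each layer
KPI_LAYER = {
    "link_down": "L2/L3",
    "packet_loss": "L2/L3",
    "latency": "L2/L3",
    "video_feed_lost": "L7",
    "motion_event_unexpected": "L7",
}
NOC_FOR_LAYER = {"L2/L3": "network-noc", "L7": "application-noc"}


def route_alerts(events, scope="city"):
    """Count events per (owning NOC, scope value, KPI) so each team sees its own view."""
    counts = Counter()
    for event in events:
        noc = NOC_FOR_LAYER[KPI_LAYER[event["kpi"]]]
        counts[(noc, event[scope], event["kpi"])] += 1
    return counts


events = [
    {"kpi": "video_feed_lost", "neighborhood": "n1", "city": "city-a"},
    {"kpi": "video_feed_lost", "neighborhood": "n2", "city": "city-a"},
    {"kpi": "latency", "neighborhood": "n1", "city": "city-a"},
]
print(route_alerts(events))                         # city-level view
print(route_alerts(events, scope="neighborhood"))   # finer-grained view
```

At city scope, the two lost video feeds roll up into one application-NOC signal while the latency event goes to the network NOC; narrowing the scope to neighborhoods splits them back out.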

This isn't red light/green light. Big Data analytics isn't required to spot outright service failures; service providers already have the technology to detect those problems and know how to solve them.

This is even true in hybrid networks which combine both virtual and physical network functions — and let's be honest and admit that no matter how attractive SDN/NFV are, many or most carriers will operate hybrid networks for years to come.

The next step to improve carrier competitiveness — and customer satisfaction — is to identify more subtle network connectivity and application problems in real time, determine how to address them, and then allocate resources to solve the problem.

Those problems will only be identified through network instrumentation via NFV, data gathering, Big Data analytics, and the delivery of actionable intelligence to network operators. With trials like those being done today, the next step in the evolution of networks is already here.
