Elad Nafshi, senior vice president for next-generation access networks at Comcast Xfinity, said in an interview with VentureBeat that the nation’s internet network has held up during the surge of residential internet traffic from people working at home. But this success wasn’t just because of capital spending on fiber-optic networks. Rather, it has depended on a suite of AI and machine-learning software that gives the company visibility into its network, adds capacity quickly when needed, and fixes problems before humans notice them.
Comcast’s network reaches more than 59 million U.S. homes via 800,000 miles of cable (about three times the distance to the moon). Back in March, Comcast said internet traffic had risen 32% because of COVID-19 but assured everyone it had the capacity to handle peak traffic demands in the U.S. The company also saw a 36% increase in mobile data use over Wi-Fi on Xfinity Mobile.
“The first part of the growth was because of work from home,” Jan Hofmeyr, chief network officer at the Comcast Technology Center in Philadelphia, said in an interview with VentureBeat. “Things like video conferencing started to drive a lot of traffic. The consumption of video went up significantly. And then with kids being home, you could see playing games going upward. We saw it go up across the board.”
But since March and April, the traffic from Comcast’s 21 million subscribers has hit a plateau. People are getting out of their homes more and the initial surge of work-from-home has normalized, Hofmeyr said.
The company normally adds capacity 12 to 18 months ahead of demand, with typical plans targeting 45% annual growth in traffic. Since 2017, Comcast has invested $12 billion in the network and added 33,331 new route miles of fiber-optic cable. Those investments have enabled the company to double capacity every 2.5 years, Hofmeyr said.
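The two growth figures in this paragraph can be related with ordinary compound-growth arithmetic: doubling every T years corresponds to an annual growth rate of 2^(1/T) - 1, and growth of r per year doubles in ln 2 / ln(1 + r) years. The sketch below is illustrative arithmetic only, not a Comcast planning model.

```python
import math


def annual_growth_from_doubling(years_to_double: float) -> float:
    """Annual growth rate r such that (1 + r) ** years_to_double == 2."""
    return 2 ** (1 / years_to_double) - 1


def doubling_time(annual_growth: float) -> float:
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    # Capacity doubling every 2.5 years, as cited in the article.
    print(f"{annual_growth_from_doubling(2.5):.1%} per year")  # prints "32.0% per year"
    # Traffic growing 45% a year, the planning target cited in the article.
    print(f"{doubling_time(0.45):.2f} years to double")        # prints "1.87 years to double"
```

Doubling every 2.5 years works out to about 32% growth per year, while sustained 45% annual growth would double in roughly 1.87 years.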
“With COVID-19, we obviously saw a massive surge in the network, and looking back in retrospect the network was highly reliable,” Hofmeyr said. “We were able to respond quickly as we saw the spike in traffic. We were able to add capacity without having to take the network down. It was designed for that.”
During the initial stages of the pandemic, the company’s new software tools helped the network handle regional surges in which internet traffic spiked as much as 60%. Nafshi told VentureBeat that the network can’t handle surges just by getting bigger. Capacity did grow: In March and April, Comcast added 35 terabits per second of peak capacity to regional networks, and it added 1,700 100-gigabit links to the core network, compared with 500 in the same months a year earlier. But the company also relies on software called Comcast Octave, which manages traffic complexity behind the scenes, where customers don’t notice it.
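For scale, the core-link figures can be converted to aggregate nominal link capacity. This is simple multiplication of link count by link rate; it describes the size of the links added, not the traffic they carry, and it is a different measure from the 35 Tbps of regional peak capacity.

```latex
\begin{aligned}
1{,}700 \times 100\ \text{Gb/s} &= 170{,}000\ \text{Gb/s} = 170\ \text{Tb/s} \quad \text{(March and April)} \\
500 \times 100\ \text{Gb/s} &= 50{,}000\ \text{Gb/s} = 50\ \text{Tb/s} \quad \text{(same months a year earlier)}
\end{aligned}
```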
Comcast Octave AI
The AI platform was developed by Comcast engineers in Philadelphia. Every 20 minutes, it checks more than 4,000 telemetry data points (such as external network “noise,” power levels, and other technical issues that can add up to a big impact on performance) on more than 50 million modems across the network. Though invisible to customers, the AI and machine-learning technology has played a valuable role over the past several months.
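Those polling figures imply a large volume of data. The back-of-envelope sketch below assumes that every modem reports every data point once per 20-minute cycle; the article does not spell out the reporting scheme, and because it says "4,000-plus" and "more than 50 million," the results should be read as lower bounds under that assumption.

```python
def telemetry_rate(points_per_modem: int, modems: int, interval_minutes: float) -> dict:
    """Back-of-envelope telemetry volume, ASSUMING each modem reports every
    data point once per polling interval (not a documented Comcast scheme)."""
    per_cycle = points_per_modem * modems
    cycles_per_day = 24 * 60 / interval_minutes
    return {
        "per_cycle": per_cycle,
        "per_second": per_cycle / (interval_minutes * 60),
        "per_day": per_cycle * cycles_per_day,
    }


if __name__ == "__main__":
    r = telemetry_rate(4_000, 50_000_000, 20)
    print(f"{r['per_cycle']:.2e} readings per 20-minute cycle")  # 2.00e+11
    print(f"{r['per_second']:.2e} readings per second")          # 1.67e+08
    print(f"{r['per_day']:.2e} readings per day")                # 1.44e+13
```

Under these assumptions, that is about 200 billion readings per cycle, or roughly 167 million per second.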
“COVID-19 was a very unique experience for us,” said Nafshi. “When you’re building networks, you never build for the situation where everyone gets locked up in their room in their homes and suddenly they jump online. Now, that’s the new normal. The challenge we are presented with is how to enable our customers to shelter in place and work and be entertained.”
Octave is programmed to detect when modems aren’t using the bandwidth available to them as efficiently as possible and then adjust them automatically, delivering substantial increases in speed and capacity. Because Octave is a new technology, Comcast had rolled it out to only part of the network when COVID-19 hit.
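One way to picture "adjusting" an underused modem is modulation-profile selection: a modem with a clean signal can use a denser QAM order and carry more bits per symbol. The sketch below assumes Octave works along these lines, which the article does not confirm. The SNR thresholds, the 2 dB margin, and the telemetry field names are all hypothetical values chosen for illustration.

```python
import math
from dataclasses import dataclass

# Illustrative minimum SNR (dB) for each QAM order. These values are
# ASSUMPTIONS for this sketch, not DOCSIS specification or Comcast numbers.
PROFILE_SNR_DB = {256: 30.0, 1024: 36.0, 2048: 39.0, 4096: 42.0}


@dataclass
class Modem:
    modem_id: str
    snr_db: float   # measured downstream SNR (hypothetical telemetry field)
    qam_order: int  # modulation order the modem currently uses


def best_profile(snr_db: float, margin_db: float = 2.0) -> int:
    """Highest QAM order whose SNR requirement plus a safety margin is met.
    Falls back to the most robust (lowest-order) profile otherwise."""
    usable = [q for q, need in PROFILE_SNR_DB.items() if snr_db >= need + margin_db]
    return max(usable) if usable else min(PROFILE_SNR_DB)


def bits_per_symbol(qam_order: int) -> int:
    """Each QAM symbol carries log2(order) bits, e.g. 4096-QAM -> 12 bits."""
    return int(math.log2(qam_order))


def recommend(modems: list[Modem], margin_db: float = 2.0) -> dict[str, int]:
    """Map modem_id -> new QAM order for every modem whose current profile
    differs from the best fit: upgrades for underused modems, downgrades
    for modems running too close to their SNR limit."""
    changes = {}
    for m in modems:
        target = best_profile(m.snr_db, margin_db)
        if target != m.qam_order:
            changes[m.modem_id] = target
    return changes


if __name__ == "__main__":
    fleet = [
        Modem("cm-a", snr_db=45.5, qam_order=1024),  # headroom: can step up
        Modem("cm-b", snr_db=38.5, qam_order=1024),  # already a good fit
        Modem("cm-c", snr_db=33.0, qam_order=1024),  # too little margin
    ]
    for modem_id, qam in recommend(fleet).items():
        print(modem_id, "->", f"{qam}-QAM")  # cm-a -> 4096-QAM, cm-c -> 256-QAM
```

In this toy model, moving cm-a from 1024-QAM to 4096-QAM raises it from 10 to 12 bits per symbol, a 20% gain in raw capacity on the same spectrum.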
To meet the sudden demand, a team of about 25 Octave engineers worked seven-day weeks to reduce the deployment process from months to weeks. As a result, customers experienced a nearly 36% increase in capacity just as they were using more bandwidth than ever before for working, streaming, gaming, and videoconferencing.
“We’ve had a fair amount of experience already looking at data patterns and acting on it,” Nafshi said. “We had an interactive platform deployed that we were leaning on. We looked at the data network conditions and decided what knobs we need to turn on our infrastructure in order to really optimize how packets get delivered to the home.”
Comcast fed the data it had collected into algorithms that predict where interference could disrupt the network or where trouble points might appear.
“We have to turn the knobs so that we optimize delivery to your house, which would not be the same as the delivery to my home,” Nafshi said. “We provide you with much more reliable service by detecting the patterns that lead up to breakage and then have the network self-heal based on those patterns. We’re making that completely transparent to the customer. The network can self-heal autonomously in a self-feedback loop. It’s a seamless platform for the customer.”
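Nafshi describes detecting the patterns that lead up to breakage. A rolling z-score is one of the simplest ways to flag a telemetry reading that departs sharply from its recent history, so the sketch below uses it as a stand-in; the article does not describe Comcast’s actual models. The noise readings are hypothetical, and the sketch covers only detection, not the automated self-healing step that would follow a flag.

```python
from collections import deque
from statistics import mean, pstdev


def flag_anomalies(readings, window=12, threshold=3.0):
    """Yield (index, value) for readings more than `threshold` standard
    deviations ABOVE the mean of the preceding `window` readings.
    A rolling z-score: a simple stand-in, not Comcast's model."""
    history = deque(maxlen=window)
    for i, value in enumerate(readings):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # Skip flat history (sigma == 0) to avoid dividing by zero.
            if sigma > 0 and (value - mu) / sigma > threshold:
                yield i, value
        history.append(value)


if __name__ == "__main__":
    # Hypothetical noise readings (dB): a stable baseline, then a spike.
    noise = [10.0, 10.5] * 6 + [10.2, 14.0]
    print(list(flag_anomalies(noise)))  # [(13, 14.0)]
```

Comparing each reading against its own recent window, rather than a fixed limit, lets the same rule work for plant segments with different normal noise levels.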
Smart Network Platform
Before introducing Comcast Octave, the company deployed its Smart Network Platform. Developed by Comcast engineers, this suite of software tools automates core network functions. As a result, Comcast has dramatically cut both the number of outages customers experience and how long they last. Some outages now last a matter of minutes, compared with hours before, said Noam Raffaelli, senior vice president of network and communications engineering at Comcast Xfinity, in an interview with VentureBeat.
“We are trying to benefit from innovation on software to basically drive our outcomes and our operational key performance indicators (KPIs) down so things like outage minutes or minutes to repair go down,” said Raffaelli. “We look at data across our network and use data science to understand trends and do correlations between events we see on the network. We have telemetry and automation, so we can operate the equipment without the manual interference of our engineers. We mitigate issues before there is any degradation in the networks.”
On top of that, the equipment is more secure and more automated, Raffaelli said. Comcast has also worked out how to build redundancy into the network so it can hold up in case of accidents, such as a backhoe operator cutting a fiber-optic cable.
“This gives us an unprecedented real-time view of our network and unprecedented insights into what the customer experience is,” Raffaelli said. “We’ve had a double-digit improvement in outage minutes and repair. We are building redundant links across the network.”
A tool called NetIQ uses machine learning to scan the core network continuously, making thousands of measurements every hour. Before NetIQ, Comcast would often find out about a service-impacting issue like a fiber cut when it started seeing service degradation or getting customer calls.
With NetIQ in place, Comcast can spot an outage almost immediately. The company has reduced the average time it takes to detect a potentially service-impacting issue on the core network from 90 minutes to less than five minutes, which has paid off during COVID-19.
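Continuous probing is what shrinks detection time: instead of waiting for customer calls, a monitor can declare a link down after a few consecutive failed measurements, so the worst-case detection time is roughly the number of required failures multiplied by the probe interval. The sketch below is a hypothetical illustration of that idea; the article does not describe how NetIQ decides that a link has failed, and the 60-second interval and three-failure threshold are assumptions.

```python
from typing import Optional


def detect_outage(probe_ok: list[bool], consecutive_failures: int = 3) -> Optional[int]:
    """Index of the probe at which the link is declared down, i.e. the first
    point where `consecutive_failures` probes in a row have failed.
    Returns None if that never happens. Requiring several failures in a row
    filters out one-off packet loss."""
    streak = 0
    for i, ok in enumerate(probe_ok):
        streak = 0 if ok else streak + 1
        if streak >= consecutive_failures:
            return i
    return None


def worst_case_detection_minutes(probe_interval_s: float, consecutive_failures: int = 3) -> float:
    """Approximate worst case: a failure just after a successful probe is
    declared only once `consecutive_failures` more probes have failed.
    Ignores probe timeouts and processing delay."""
    return consecutive_failures * probe_interval_s / 60


if __name__ == "__main__":
    # Hypothetical probe log for one core link (True = probe succeeded).
    log = [True, True, False, True, False, False, False, True]
    print(detect_outage(log))                # 6: third failure in a row
    print(worst_case_detection_minutes(60))  # 3.0 with 60-second probes
```

With these assumed settings, an outage would be flagged within about three minutes, ignoring probe timeouts.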
I witnessed some of this firsthand as a Comcast subscriber. In four months, I’ve had only one outage. I logged into my service account by phone and got a message saying my area was experiencing an outage that was expected to last 90 minutes. After that, the network was fixed, and my service has stayed up since.
How to improve gaming traffic
Gamers are among the hardest internet users to please, as they want to download a new game as soon as it’s available. They also want low latency, meaning minimal interaction delays, which matters in multiplayer shooters like Call of Duty: Warzone, where you don’t want confusion over who pulled the trigger first.
“We are laser-focused on latency across our network. It’s an extremely important metric that we track very closely across the entire network,” Hofmeyr said. “We feel very bullish and very excited about what we are able to deliver from a business perspective. I don’t believe that we have a negative perspective, any impact on gaming from a latency perspective.”
He added, “Gaming is driving two things for us. One is the game downloads are just becoming bigger and bigger. This is very common today that a game download is multi-gig. And when they are released, you see massive expansion and growth in terms of downloads. On the latency side, we continuously invest. We are looking at AI. We are looking at software and tools to help improve it over time.”
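To put "multi-gig" downloads in perspective, the ideal transfer time is the file size in bits divided by the line rate. The 50 GB game size below is a hypothetical example, and the calculation ignores protocol overhead, Wi-Fi limits, and server-side throttling, so real downloads take longer.

```python
def download_seconds(size_gigabytes: float, rate_gbps: float) -> float:
    """Ideal transfer time in seconds: gigabytes -> gigabits (x8), decimal
    units, no protocol overhead, congestion, or server-side limits."""
    return size_gigabytes * 8 / rate_gbps


if __name__ == "__main__":
    size_gb = 50  # hypothetical game size; the article says only "multi-gig"
    for rate in (0.1, 1, 10):  # 100 Mbps, 1 Gbps, 10 Gbps
        minutes = download_seconds(size_gb, rate) / 60
        print(f"{rate:>5} Gbps: {minutes:.1f} min")
```

At 100 Mbps, 1 Gbps, and 10 Gbps, that comes to roughly 67 minutes, 6.7 minutes, and 40 seconds, respectively.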
Game companies invest in low-latency game servers and in improving the connections between players in the same match or region, so latency doesn’t affect them as much. But infrastructure companies like Comcast can also improve latency.
Content delivery networks (CDNs) are an integral part of making video delivery more efficient. Comcast video is delivered through the company’s own CDNs, which position videos throughout the network so they travel as short a distance as possible to the viewer. The company constantly monitors traffic peaks and designs the network for them. Many people playing a game or watching a video at the same time can set new peaks, and the 1,700 100-gigabit links added to the core network help each region absorb peaks in specific parts of the network.
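At its simplest, the CDN idea is a request-routing rule: serve each title from the closest cache that holds a copy, and fall back to the origin otherwise. The node names, round-trip times, and catalog below are hypothetical, and real CDN request routing is considerably more elaborate than this sketch.

```python
def pick_cache(rtt_ms: dict[str, float], holdings: dict[str, set[str]], title: str) -> str:
    """Return the lowest-latency cache node that holds `title`, else "origin"."""
    candidates = [node for node in rtt_ms if title in holdings.get(node, set())]
    if not candidates:
        return "origin"  # miss everywhere: fetch from the origin server
    return min(candidates, key=lambda node: rtt_ms[node])


if __name__ == "__main__":
    # Hypothetical round-trip times (ms) from one viewer to three cache tiers.
    rtt = {"edge-local": 4.0, "regional": 9.0, "national": 18.0}
    holdings = {
        "edge-local": {"show-a"},
        "regional": {"show-a", "show-b"},
        "national": {"show-a", "show-b", "show-c"},
    }
    for title in ("show-a", "show-b", "show-c", "show-d"):
        print(title, "->", pick_cache(rtt, holdings, title))
```

Popular titles end up served from the nearest tier, which keeps long-haul links free for the peaks described above.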
The network of the future
While it’s still early in the process, Comcast is moving to a virtualized, cloud-based network architecture so it can manage accelerating demand and deliver faster, more reliable service. Virtualization means taking functions that were once performed by large, purpose-built pieces of hardware — hardware that required manual upgrades to deliver innovation — and moving them into the cloud.
“Transitioning into web-based software is helping us self-heal much faster and build our capabilities faster,” Nafshi said. “If there is a failure point, you fail at a container level rather than an appliance level, and that greatly reduces the time to repair and mitigate.”
By doing this, Comcast will cut the innovation cycles on those functions from years to months. One example is the virtual CMTS initiative. (A CMTS, or cable modem termination system, is a large piece of hardware that serves an entire neighborhood, delivering traffic between the core network and homes.) Increasingly, Comcast has been making those devices “virtual” by moving their functions into software that runs in data centers.
Besides letting Comcast innovate faster, this provides two key benefits for customers. First, it allows the company to create much smaller “failure points,” grouping customers into smaller pools so that if one part of the network has an issue, far fewer people are affected. Second, the virtual architecture lets Comcast use other AI tools to gain far greater visibility into the health of the network and to self-heal issues without human intervention.
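The benefit of smaller failure points is easy to quantify in a toy model: if the same homes are split across more independent failure domains, a single failure takes out fewer of them. The 600-home neighborhood and 12 virtual instances below are hypothetical numbers, not Comcast figures, and round-robin assignment is just one simple way to keep the groups balanced.

```python
def assign_domains(customer_ids: list[int], num_domains: int) -> dict[int, list[int]]:
    """Spread customers round-robin across independent failure domains
    (for example, separate virtual CMTS instances)."""
    domains: dict[int, list[int]] = {d: [] for d in range(num_domains)}
    for i, cid in enumerate(customer_ids):
        domains[i % num_domains].append(cid)
    return domains


def affected_by_failure(domains: dict[int, list[int]], failed_domain: int) -> int:
    """Number of customers who lose service when one failure domain goes down."""
    return len(domains[failed_domain])


if __name__ == "__main__":
    homes = list(range(600))                 # hypothetical neighborhood of 600 homes
    monolithic = assign_domains(homes, 1)    # one physical CMTS
    virtualized = assign_domains(homes, 12)  # hypothetical: 12 smaller instances
    print(affected_by_failure(monolithic, 0))   # 600
    print(affected_by_failure(virtualized, 0))  # 50
```

The total number of homes is unchanged; only the share exposed to any one failure shrinks, from all 600 to 50 in this example.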
Upload traffic increased somewhat during COVID-19, but not nearly as much as download traffic did. Uploads are driven by users such as livestreamers, who share their video with a network of fans. In the future, Comcast is promising symmetrical download and upload speeds of 10 gigabits per second. It hasn’t said when that will happen, but CableLabs, the research arm of the cable industry, is working on the technology.
“It’s something that is very much in development,” Hofmeyr said. “It’s going to be remarkable. We can deploy on top of existing infrastructure by leveraging AI software and the evolving DOCSIS protocol.”