Networking: the more things change, the more they stay the same

The world is full of network protocols, the rules by which electronic equipment and computers communicate. They are used heavily in data centers, especially hyperscale ones such as those run by Google and Amazon.

Connecting all the servers inside a data center is a network-intensive job, and for decades operators have relied on the most common communication protocols, such as Ethernet, to physically connect all those servers and switches. (Yes, many other protocols are still in use, but Ethernet makes up the bulk of what we’re talking about.) Ethernet has lasted this long because it is flexible and scalable, among other virtues. But, like everything in technology, it has its limitations.

Editor’s note:
Guest Author Jonathan Goldberg is the founder of D2D Advisory, a multifunctional consulting firm. Jonathan has developed growth strategies and alliances for mobile, networking, gaming and software companies.

Earlier this year, Google introduced a new network protocol called Aquila, which seems to focus on one big limitation of Ethernet: network latency, the delay caused by the time it takes data to move across an Ethernet network. True, these delays are measured in millionths of a second (microseconds, µs), but at the scale we are talking about, they add up.
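To make “these delays add up” concrete, here is a back-of-the-envelope sketch in Python. It assumes a job in which each step must wait for one network exchange to finish before the next can start; the per-exchange latencies and the count of one million exchanges are hypothetical numbers chosen for illustration, not figures from Google’s paper.

```python
# Hypothetical numbers only: they illustrate how per-exchange delays compound
# when exchanges are sequential. They are not taken from the Aquila paper.

def total_wait_us(per_exchange_latency_us: float, dependent_exchanges: int) -> float:
    """Network wait time when each exchange must finish before the next one starts."""
    return per_exchange_latency_us * dependent_exchanges

exchanges = 1_000_000  # assumed number of back-to-back dependent exchanges
for latency_us in (2.0, 10.0, 50.0):  # assumed per-exchange latencies, in microseconds
    wait_s = total_wait_us(latency_us, exchanges) / 1e6  # microseconds -> seconds
    print(f"{latency_us:5.1f} us x {exchanges:,} exchanges = {wait_s:.1f} s spent waiting")
```

Because the exchanges are strictly sequential, every microsecond saved per exchange shortens the whole job in direct proportion.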

In Google’s words:

We are seeing a new stalemate in the data center, where progress in distributed computing is increasingly limited by the lack of performance predictability and isolation in multitenant data center networks. Performance differences of two to three orders of magnitude between what network designers aim for and what applications can expect and program to are not uncommon, severely limiting the pace of innovation in higher-level, cluster-based distributed systems.

As we read through this dense paper, two questions stood out to us.

First, what applications is Google building that are so complex they require this solution? Google is known for its efforts to commercialize distributed computing. It invented many of the most common tools and concepts for breaking large, complex computational tasks into small, discrete tasks that can run in parallel. But here, it seems to want to centralize it all.

Instead of separate tasks, Google appears to be building applications whose parts are so tightly coupled and interdependent that they must constantly exchange data, to the point where microsecond delays slow everything down “by two to three orders of magnitude.”
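Part of why microseconds hurt tightly coupled applications is that a synchronized step has to wait for its slowest participant, so the tail of the latency distribution, not the average, sets the pace. The Python sketch below is a toy model under hypothetical assumptions: 256 peers, a 5 µs base latency, and exponentially distributed extra delay averaging 2 µs per message. It is not how Aquila or any Google workload is actually measured.

```python
import random

def step_time_us(latencies_us: list[float]) -> float:
    # A synchronized step cannot finish until the slowest peer's message arrives.
    return max(latencies_us)

def simulate(peers: int, steps: int, base_us: float, jitter_us: float, seed: int = 0):
    """Return (average single-message latency, average synchronized-step latency)."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    all_messages: list[float] = []
    step_times: list[float] = []
    for _ in range(steps):
        # Each message: fixed base latency plus random extra delay with mean jitter_us.
        latencies = [base_us + rng.expovariate(1.0 / jitter_us) for _ in range(peers)]
        all_messages.extend(latencies)
        step_times.append(step_time_us(latencies))
    return sum(all_messages) / len(all_messages), sum(step_times) / len(step_times)

# Hypothetical parameters chosen only for illustration.
avg_msg, avg_step = simulate(peers=256, steps=1000, base_us=5.0, jitter_us=2.0)
print(f"average message: {avg_msg:.1f} us, average synchronized step: {avg_step:.1f} us")
```

With these assumed numbers, the average step takes noticeably longer than the average message, because each step is set by the slowest of 256 messages. That gap is one way to read Google’s complaint about the lack of performance predictability.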

The answer is most likely “AI neural networks,” which is a fancy way of saying “matrix multiplication.” A bottleneck that many AI systems run into is that, when computing massive models, the various steps of the math depend on each other: one calculation needs the result of another before it can complete.

Dig into AI chip startups, and you will find that many of them fail because of problems like these. Google must be working with models so large that these interdependencies become bottlenecks that can only be solved at the scale of the data center. One plausible candidate is training autonomous driving models, which have to account for a huge number of interdependent variables. For example, is this object on the lidar the same one the camera sees, how far away is it, and how long before we hit it?

It’s also possible that Google is working on some other very compute-intensive workload. The paper repeatedly discusses high-performance computing (HPC), which is typically the domain of supercomputers that simulate weather and nuclear reactions. Whatever Google is working on here is worth watching, because it is likely to matter a great deal down the road.

That said, the second thing that stood out to us in this paper is a sense of deep irony.

We found this paragraph in particular very humorous (which definitely says more about our sense of humor than anything else):

Key differences between these more specialized options and data center environments include: i) the ability to deploy with a single tenant, or at least space-sharing rather than time-sharing; ii) reduced concern about failure handling; and iii) a willingness to use backward-incompatible network technologies, including wire formats.

Put simply, Google wants Aquila to provide dedicated network paths for a single user, guarantee message delivery, and support older communication protocols. There is, of course, a set of networking technologies that already does this: we call it circuit switching, and telecom operators have been using it for more than a hundred years.

We recognize that Aquila is very different. But it is worth noting how much technology swings like a pendulum. Circuit switching gave way to Internet packet switching over many painful, grueling decades of transition. And now that Internet protocols power nearly all telecommunications networks, the state of the art is swinging back toward deterministic delivery over dedicated channels.
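The trade-off between the two approaches can be sketched with a toy queueing model in Python. The setup is hypothetical and is not Aquila’s design: a link shared by four tenants either reserves a dedicated quarter of its capacity for each one, like a circuit, or lets everyone share the full link through a single first-in, first-out queue, like packet switching. The circuit case assumes each tenant paces its traffic to fit its reserved share, so it never builds a queue; the shared queue sees the combined, randomly timed traffic of all four tenants, with exponentially distributed gaps between arrivals.

```python
import random

def circuit_delays(n_messages: int, reserved_service_us: float) -> list[float]:
    # Dedicated channel: no one else uses it, so every message sees the same delay
    # (assuming the tenant never sends faster than its reserved rate).
    return [reserved_service_us] * n_messages

def packet_delays(n_messages: int, service_us: float, mean_gap_us: float, seed: int = 0) -> list[float]:
    # Shared FIFO link: each message waits behind whatever is still queued.
    # Queueing delay follows the Lindley recursion: W[n+1] = max(0, W[n] + S - gap).
    rng = random.Random(seed)
    delays = []
    wait_us = 0.0
    for _ in range(n_messages):
        delays.append(wait_us + service_us)  # time in queue plus time on the wire
        gap_us = rng.expovariate(1.0 / mean_gap_us)  # time until the next arrival
        wait_us = max(0.0, wait_us + service_us - gap_us)
    return delays

# Hypothetical: the full link sends a message in 1 us; a quarter-capacity circuit takes 4 us.
circuit = circuit_delays(10_000, reserved_service_us=4.0)
packet = packet_delays(10_000, service_us=1.0, mean_gap_us=1.25)  # link about 80% busy

for name, d in (("circuit", circuit), ("packet", packet)):
    print(f"{name:8s} min {min(d):6.2f} us  mean {sum(d) / len(d):6.2f} us  max {max(d):6.2f} us")
```

In this toy setup the shared queue can deliver individual messages faster than the reserved channel, but its delays spread out, while the circuit’s delay never moves. Neither is better in general; it depends on whether the application values raw capacity or predictability.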

It’s not that Google is regressing; it’s that technologies have trade-offs, and different applications require different solutions. Remember that the next time someone starts a holy war over one protocol or another.

Google is not alone in reaching this conclusion. Packet switching in general, and the Internet in particular, have a number of serious limitations, and dozens of groups around the world are trying to overcome them. These groups range from Internet pioneers such as Professor Tim Berners-Lee and his Solid initiative to Huawei’s New IP (a rather different kind of effort).

These projects obviously have very different goals, but they all face the problem of overcoming the global installed base. Google, by contrast, is not trying to change the whole world, only one part of it.
