Why the AI era is forcing a redesign of the entire compute backbone

In recent decades, we have seen almost unimaginable gains in compute performance and efficiency, made possible by Moore's law and realized through scale-out architectures built on commodity hardware and loosely coupled software. This architecture has delivered online services to billions of people worldwide and put nearly all of human knowledge within reach.
But the next revolution will demand much more. Fulfilling the promise of AI requires a step-change in capability well beyond the progress of the internet era. To achieve it, we as an industry must revisit some of the foundations that powered the previous transformation and innovate collectively to rethink the entire technology stack. Let us explore the forces driving this revolution and sketch what this new architecture must look like.
From commodity hardware to specialized compute
For decades, the dominant trend in computing has been the democratization of compute through scale-out architectures built on nearly identical commodity servers. This uniformity allowed flexible workload placement and efficient resource utilization. The demands of gen AI, which leans heavily on predictable mathematical operations over massive data sets, are reversing this trend.
We are now witnessing a decisive shift toward specialized hardware, including ASICs, GPUs and Tensor Processing Units (TPUs), that delivers order-of-magnitude improvements in performance per dollar and per watt compared with general-purpose CPUs. This proliferation of domain-specific compute units, optimized for narrower tasks, will be crucial to sustaining rapid progress in AI.
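A back-of-the-envelope calculation shows why performance per watt, rather than raw peak performance, is the metric that matters here. The figures below are illustrative assumptions for the sake of the arithmetic, not measured specs of any product:

```python
# Illustrative perf-per-watt comparison; all numbers are assumed, not vendor specs.
cpu_tflops, cpu_watts = 4.0, 300.0        # hypothetical general-purpose CPU
accel_tflops, accel_watts = 400.0, 700.0  # hypothetical AI accelerator

cpu_ppw = cpu_tflops / cpu_watts
accel_ppw = accel_tflops / accel_watts

print(f"CPU:         {cpu_ppw:.3f} TFLOP/s per watt")
print(f"Accelerator: {accel_ppw:.3f} TFLOP/s per watt")
print(f"Advantage:   {accel_ppw / cpu_ppw:.0f}x")  # ~43x under these assumptions
```

Even with generous assumptions for the CPU, the accelerator's narrower specialization translates into tens of times more useful work per watt, which is exactly the gap the article describes.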
Beyond Ethernet: The Rise of Specialized Interconnects
These specialized systems often require "all-to-all" communication, with terabit-per-second bandwidth and nanosecond-scale latencies that approach local memory speeds. Today's networks, largely built on commodity Ethernet switches and TCP/IP protocols, are ill-equipped to meet these extreme demands.
As a result, to scale gen AI workloads across huge clusters of specialized accelerators, we are seeing the rise of specialized interconnects, such as ICI for TPUs and NVLink for GPUs. These purpose-built networks prioritize direct memory-to-memory transfers and use dedicated hardware to accelerate information sharing between processors, effectively bypassing the overhead of traditional layered networking stacks.
This move to tightly integrated, compute-centric networking will be essential to overcoming bottlenecks and efficiently scaling the next generation of AI.
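As a concrete sense of what these interconnects accelerate, here is a minimal sketch of a collective all-reduce using JAX, whose psum primitive dispatches to the accelerator fabric (ICI on TPUs, NVLink/NCCL on GPUs) rather than the host network stack. The array shapes are arbitrary assumptions for the example:

```python
from functools import partial
import jax
import jax.numpy as jnp

n = jax.local_device_count()

@partial(jax.pmap, axis_name="devices")
def all_reduce_mean(local_grads):
    # lax.psum runs as a hardware all-reduce over the fast interconnect,
    # bypassing TCP/IP entirely; every device receives the summed result.
    return jax.lax.psum(local_grads, axis_name="devices") / n

# One gradient shard per device; the shape (n, 1024) is arbitrary here.
shards = jnp.ones((n, 1024))
averaged = all_reduce_mean(shards)  # identical averaged gradients on every device
```

The point of the sketch is the division of labor: the program expresses only the collective operation, and the interconnect hardware performs the memory-to-memory exchange that commodity Ethernet cannot sustain at these bandwidths.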
Breaking through the memory wall
For decades, gains in compute performance have outpaced growth in memory bandwidth. Although techniques such as caching and stacked SRAM have partially mitigated this, the data-intensive nature of AI only worsens the problem.
The insatiable need to feed ever more powerful compute units has driven the adoption of high-bandwidth memory (HBM), which stacks DRAM directly on the processor package to boost bandwidth and reduce latency. Yet even HBM faces fundamental limitations: the physical perimeter of the chip caps total data flow, and moving massive data sets at terabit speeds imposes significant energy costs.
These limitations underscore the critical need for higher-bandwidth connectivity and the urgency of breakthroughs in processing and memory architecture. Without such innovations, our powerful compute resources will sit idle waiting for data, dramatically limiting efficiency and scale.
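A simple roofline-style calculation shows how quickly memory bandwidth becomes the binding constraint. The chip figures below are assumptions chosen for illustration, not the specs of any particular accelerator:

```python
# Roofline sketch: when does memory bandwidth, not compute, set the ceiling?
peak_flops = 1000e12   # assume 1,000 TFLOP/s of matrix compute
hbm_bw     = 3e12      # assume 3 TB/s of HBM bandwidth

# Arithmetic intensity (FLOPs per byte moved) needed to stay compute-bound.
breakeven = peak_flops / hbm_bw
print(f"Breakeven intensity: {breakeven:.0f} FLOPs per byte")  # ~333

# A memory-bound op, e.g. streaming 16-bit weights in small-batch inference:
# ~1 multiply-add (2 FLOPs) per 2-byte weight -> intensity of ~1 FLOP/byte.
intensity = 1.0
achieved = min(peak_flops, intensity * hbm_bw)
print(f"Achieved: {achieved / 1e12:.0f} TFLOP/s "
      f"({100 * achieved / peak_flops:.1f}% of peak)")  # ~3 TFLOP/s, ~0.3%
```

Under these assumptions, any operation that performs fewer than roughly 333 floating-point operations per byte fetched leaves the compute units mostly idle, which is precisely the memory wall the section describes.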
From server farms to high-density systems
Today's advanced machine learning (ML) models often rely on carefully orchestrated computation across tens of thousands to hundreds of thousands of identical compute elements, consuming enormous amounts of power. This tight coupling and fine-grained synchronization at the microsecond level imposes new demands. Unlike systems that embrace heterogeneity, ML computations require homogeneous elements; mixing generations would bottleneck the faster units. Communication paths must also be planned in advance and be highly efficient, because a delay in a single element can stall an entire job, as the sketch below illustrates.
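A toy simulation makes the straggler effect concrete: in a synchronous training step, every worker waits for the slowest one, so a single degraded element sets the pace for the whole cluster. All parameters below are invented for the illustration:

```python
import random

# Toy model of a synchronous step: it completes only when the slowest of
# N workers finishes, so one straggler gates everyone else.
random.seed(0)
n_workers = 10_000
base_ms = 10.0  # nominal per-step compute time per worker

def step_time(straggler_slowdown=1.0):
    times = [base_ms * random.uniform(0.98, 1.02) for _ in range(n_workers)]
    times[0] *= straggler_slowdown  # degrade exactly one worker
    return max(times)               # synchronous barrier: wait for the max

print(f"healthy cluster:  {step_time():.1f} ms/step")
print(f"one 5x straggler: {step_time(5.0):.1f} ms/step")  # whole job ~5x slower
```

One slow element out of ten thousand is enough to multiply the step time of the entire job, which is why homogeneity and pre-planned communication paths matter so much at this scale.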
These extreme requirements for coordination and power drive the need for unprecedented compute density. Minimizing the physical distance between processors becomes essential to reducing latency and power consumption, paving the way for a new class of ultra-dense AI systems.
This push toward extreme density and tightly coordinated computation fundamentally changes the optimal infrastructure design, demanding a radical rethinking of physical layouts and dynamic power management to prevent bottlenecks and maximize efficiency.
A new approach to fault tolerance
Traditional fault tolerance relies on redundancy across loosely coupled systems to achieve high uptime. ML computing demands a different approach.
First, the sheer scale of the computation makes over-provisioning too expensive. Second, model training is a tightly synchronized process, where a single failure can cascade across thousands of processors. Finally, advanced ML hardware often pushes the limits of current technology, which can mean higher failure rates.
Instead, the emerging strategy combines frequent checkpointing, storing snapshots of the computation's state, with real-time monitoring, rapid allocation of spare resources and fast restart. The underlying hardware and network design must enable rapid failure detection and seamless replacement of components to sustain performance.
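To see why checkpoint frequency becomes a first-order design question at this scale, the classic Young/Daly approximation gives a near-optimal interval of roughly sqrt(2 x checkpoint_cost x MTBF). A minimal sketch with invented reliability and cost numbers:

```python
import math

# Young/Daly approximation: the checkpoint interval that roughly minimizes
# lost work is T_opt ~= sqrt(2 * checkpoint_cost * MTBF).
# All numbers below are assumptions for illustration only.
per_device_mtbf_h = 50_000   # hypothetical per-accelerator mean time between failures
n_devices = 100_000
system_mtbf_s = per_device_mtbf_h * 3600 / n_devices  # failures compound with scale
checkpoint_cost_s = 60.0     # assumed time to write one full model snapshot

t_opt = math.sqrt(2 * checkpoint_cost_s * system_mtbf_s)
print(f"system MTBF:                 {system_mtbf_s / 60:.0f} min")   # ~30 min
print(f"optimal checkpoint interval: ~{t_opt / 60:.1f} min")          # ~7.7 min
```

Under these assumptions, a cluster of 100,000 devices sees a failure every half hour even with very reliable parts, and must checkpoint every few minutes, which is why fast snapshotting, detection and restart have to be designed into the hardware and network rather than handled as an afterthought.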
A more sustainable approach to power
Both today and looking ahead, access to power is a key bottleneck for scaling AI compute. While traditional system design chases maximum performance per chip, we must shift to end-to-end design focused on performance delivered at scale per watt. This approach is vital because it accounts for how all system components (compute, network, memory, power delivery, cooling and fault tolerance) work together to sustain performance. Optimizing components in isolation severely limits overall system efficiency.
As we push for greater performance, individual chips draw more power, often exceeding the cooling capacity of traditional air-cooled data centers. This demands a shift to more capital-intensive, but ultimately more efficient, liquid cooling solutions, and a fundamental redesign of data center cooling infrastructure.
Beyond cooling, conventional redundant power sources, such as dual utility feeds and diesel generators, carry substantial financial costs and slow the delivery of capacity. Instead, we must combine diverse power sources and storage at multi-gigawatt scale, managed by real-time microgrid controllers. By exploiting the flexibility and geographic distribution of AI workloads, we can deliver more capacity without expensive backup systems that are needed only a few hours a year.
This evolving power model enables real-time response to electricity availability, from pausing computation during shortages to advanced techniques such as frequency scaling for workloads that can tolerate reduced performance. All of this requires real-time telemetry and actuation at levels not available today.
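Frequency scaling pays off because dynamic power in CMOS scales roughly with C x V^2 x f, and voltage can usually drop along with frequency. A rough sketch under that textbook model (the linear voltage-frequency scaling is a simplifying assumption):

```python
# Rough model of dynamic voltage/frequency scaling (DVFS).
# Dynamic CMOS power ~ C * V^2 * f; assume V scales ~linearly with f
# within the operating range (a simplifying assumption).

def relative_power(freq_fraction: float) -> float:
    v = freq_fraction             # assumed V ~ f
    return v**2 * freq_fraction   # P/P0 = (V/V0)^2 * (f/f0) ~= f^3

for f in (1.0, 0.9, 0.8, 0.7):
    print(f"{f:.0%} clock -> {relative_power(f):.0%} dynamic power")
# e.g. 80% clock -> ~51% dynamic power: a modest performance hit buys a
# large reduction in draw during a grid shortage, for tolerant workloads.
```

The cubic relationship is why a small, coordinated slowdown across a fleet can shed a disproportionate share of load, provided the telemetry and control loops exist to actuate it in real time.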
Security and privacy: built in, not bolted on
A critical lesson from the internet era is that security and privacy cannot be effectively bolted onto an existing architecture. Threats from bad actors will only grow more sophisticated, so protections for user data and proprietary intellectual property must be woven into the fabric of ML infrastructure. A key observation is that AI will ultimately enhance attackers' capabilities; that, in turn, means we must ensure AI simultaneously supercharges our defenses.
This includes end-to-end data encryption, robust data-lineage tracking with verifiable access logs, hardware-enforced security boundaries to protect sensitive computations, and advanced key management systems. Integrating these safeguards from the ground up is essential to protecting users and maintaining their trust. Real-time monitoring of what will likely be petabits per second of telemetry and logging is key to identifying and neutralizing needle-in-the-haystack attack vectors, including those from insider threats.
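Finding a needle in petabits of telemetry generally means cheap streaming statistics rather than storing everything. Here is a minimal sketch of one such filter, flagging a source whose event rate deviates sharply from its own running baseline; the thresholds, decay rate and field names are all invented for the example:

```python
from collections import defaultdict

# Streaming z-score filter: flag a source whose per-interval event count
# departs sharply from its own exponentially-weighted history.
class RateAnomalyDetector:
    def __init__(self, alpha=0.05, z_threshold=6.0):
        self.alpha, self.z = alpha, z_threshold
        self.mean = defaultdict(float)
        self.var = defaultdict(lambda: 1.0)

    def observe(self, source: str, count: float) -> bool:
        m, v = self.mean[source], self.var[source]
        anomalous = abs(count - m) > self.z * (v ** 0.5)
        # Constant memory per source: update running mean and variance.
        self.mean[source] = (1 - self.alpha) * m + self.alpha * count
        self.var[source] = (1 - self.alpha) * v + self.alpha * (count - m) ** 2
        return anomalous

det = RateAnomalyDetector()
for _ in range(100):
    det.observe("host-42", 100.0)        # establish a baseline
print(det.observe("host-42", 100.0))     # False: consistent with history
print(det.observe("host-42", 5000.0))    # True: a 50x spike stands out
```

The design point is constant memory per source and a single pass over the stream, which is what makes line-rate screening of massive telemetry volumes even conceivable before deeper analysis is applied.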
Speed as a strategic imperative
The rhythm of hardware upgrades has shifted dramatically. In contrast to the incremental rack-by-rack evolution of traditional infrastructure, deploying ML supercomputers requires a fundamentally different approach. This is because ML compute does not run well on heterogeneous deployments; the computation code, algorithms and compilers must be specifically tailored to each new hardware generation to fully exploit its capabilities. The pace of innovation is also unprecedented, with new hardware often delivering a factor of two or more in performance year over year.
That is why, instead of incremental upgrades, massive and simultaneous rollouts of homogeneous hardware, often across entire data centers, are required. With annual hardware generations delivering integer-factor performance improvements, the ability to stand up these colossal AI engines quickly becomes decisive.
The goal must be to compress the timeline from design to fully operational deployments of more than 100,000 chips, capturing efficiency improvements while supporting algorithmic breakthroughs. This requires radical acceleration and automation of every phase, and demands a manufacturing-like model for these infrastructures. From architecture to monitoring and repair, every step must be streamlined and automated to bring each hardware generation online at unprecedented scale.
Meeting the moment: a collective effort for next-generation AI infrastructure
The rise of gen AI marks not merely an evolution but a revolution, one that demands a radical reinvention of our computing infrastructure. The challenges ahead, in specialized hardware, interconnected networks and sustainable operations, are considerable, but so is the transformative potential of the AI they will enable.
It is easy to see that the resulting compute infrastructure will be unrecognizable within a few years, which means we cannot simply iterate on the blueprints we have already drawn. Instead, spanning research and industry, we must launch an effort to re-examine the requirements of AI compute from first principles and build a new blueprint for the underlying global infrastructure. This, in turn, will unlock fundamentally new capabilities, from medicine to education to business, at unprecedented scale and efficiency.
Amin Vahdat is VP and GM for Machine Learning, Systems and Cloud AI at Google Cloud.




