Google has shared details about its supercomputing infrastructure, which uses optical interconnects between servers and chips to boost performance and energy efficiency. In a recently published research paper, the company explained that it had deployed its TPU v4 supercomputer with 4,096 of its Tensor Processing Units (TPUs), chips that power AI applications including the company's AI chatbot, Bard. The supercomputer, which Google says is the first with a circuit-switched optical interconnect, has 64 racks hosting the 4,096 TPUs and 48 optical circuit switches, and each rack can be deployed independently. The optical interconnect and its high bandwidth allow each rack to be connected as soon as its production is completed, offering cost savings and improved flexibility in deployment. The company said the optical components account for less than 5% of the system cost and less than 2% of the system's power consumption.
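The rack and chip counts quoted above imply a simple per-rack breakdown. The following is a minimal Python sketch of that arithmetic, assuming the chips are spread evenly across racks; the function and constant names are illustrative and do not come from Google's paper.

```python
# Figures quoted in the article.
TOTAL_TPUS = 4096
RACKS = 64


def tpus_per_rack(total_tpus: int, racks: int) -> int:
    """Chips per rack, assuming an even split across racks (an assumption)."""
    if racks <= 0 or total_tpus % racks != 0:
        raise ValueError("chips do not divide evenly across racks")
    return total_tpus // racks


def tpus_available(racks_deployed: int) -> int:
    """Chips usable once a given number of racks has been connected."""
    if not 0 <= racks_deployed <= RACKS:
        raise ValueError("racks_deployed must be between 0 and 64")
    return racks_deployed * tpus_per_rack(TOTAL_TPUS, RACKS)


if __name__ == "__main__":
    print(tpus_per_rack(TOTAL_TPUS, RACKS))  # 64 chips per rack
    print(tpus_available(16))                # 1024 chips with a quarter of the racks
```

Because each rack can be deployed on its own, the usable chip count in this sketch grows in steps of 64 as racks come online.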
Google's researchers described the optical circuit switching (OCS) used in the system as a next-generation interconnect compared with Nvidia's NVSwitch, which is based on electrical switching. The researchers said the optical switches are "fibers connected by mirrors, so any bandwidth running through a fiber can be switched between input and output fibers by the OCS across 4,096 chips today." The interconnect can be extended to more TPU cores and can carry several terabits per second.
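The idea of "fibers connected by mirrors" can be pictured as a reconfigurable one-to-one mapping between input and output fibers. The toy Python model below is only an illustration of circuit switching in general, not Google's implementation; the class name, its methods, and the port count are hypothetical.

```python
from typing import Optional


class ToyOpticalCircuitSwitch:
    """Toy model: each input fiber is steered to at most one output fiber."""

    def __init__(self, ports: int) -> None:
        self.ports = ports
        self._out_for_in = {}  # input port -> output port

    def connect(self, inp: int, out: int) -> None:
        """Steer input fiber `inp` to output fiber `out`, replacing any old circuit."""
        for port in (inp, out):
            if not 0 <= port < self.ports:
                raise ValueError(f"port {port} out of range")
        for other_in, other_out in self._out_for_in.items():
            if other_out == out and other_in != inp:
                raise ValueError(f"output fiber {out} already in use")
        self._out_for_in[inp] = out

    def disconnect(self, inp: int) -> None:
        """Tear down the circuit on input fiber `inp`, if any."""
        self._out_for_in.pop(inp, None)

    def route(self, inp: int) -> Optional[int]:
        """Return the output fiber currently reached from `inp`, or None."""
        return self._out_for_in.get(inp)


if __name__ == "__main__":
    switch = ToyOpticalCircuitSwitch(ports=4)  # hypothetical port count
    switch.connect(0, 2)
    print(switch.route(0))  # 2
    switch.connect(0, 3)    # reconfigure: re-aim the "mirror"
    print(switch.route(0))  # 3
```

In this sketch the switch never inspects the traffic; it only records which input is aimed at which output, which is the property that distinguishes a circuit switch from a packet switch.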
The TPU v4 chip outperforms its predecessor, the TPU v3, by 2.1 times and improves performance per watt by 2.7 times. Google stated that its chip could outperform Nvidia's A100 chip and an AI chip from Graphcore, as the TPU v4 made better use of computing resources in practice. The TPU v4 supercomputer includes SparseCores, an intermediary chip that sits close to high-bandwidth memory, where much of the AI crunching occurs. The idea behind SparseCores supports an emerging computing architecture being examined by AMD, Intel, and Qualcomm, which relies on moving computation closer to data and managing the movement of data in and out of memory.
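Taken together, the two ratios quoted for TPU v4 versus TPU v3 imply a relative power draw. The short Python sketch below works through that arithmetic, under the assumption that both ratios were measured on the same workloads; the function name is illustrative.

```python
def implied_power_ratio(speedup: float, perf_per_watt_gain: float) -> float:
    """Relative power of the new chip versus the old one.

    performance/watt = performance / power, so
    power_new / power_old = speedup / perf_per_watt_gain.
    """
    if speedup <= 0 or perf_per_watt_gain <= 0:
        raise ValueError("ratios must be positive")
    return speedup / perf_per_watt_gain


if __name__ == "__main__":
    ratio = implied_power_ratio(2.1, 2.7)
    print(f"{ratio:.2f}")  # 0.78
```

Under that assumption, a TPU v4 would draw roughly 78% of the power of a TPU v3 while delivering 2.1 times the performance.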
Optical connections have been used for long-distance communication over telecom networks for decades, but they are now considered “mature” enough for use over shorter distances in data centers. Companies such as Broadcom and Ayar Labs are building products for optical interconnects.
Summing up, optical interconnects are expected to play a significant role in achieving zettascale computing in an energy-efficient way, which is crucial for the future of artificial intelligence and high-performance computing.