Waferscale Network Switches
Published in the 51st Annual International Symposium on Computer Architecture (ISCA), 2024
In spite of being a key determinant of latency, cost, power, space, and capability of modern computer systems, network switch radix has not seen much growth over the years due to poor scaling of off-chip IO pitches and switch die sizes. We consider waferscale integration (WSI) as a way to increase the size of the switch substrate to be much bigger than a single die and ask the question: can we use WSI to enable network switches that have dramatically higher radix than today’s switches? We show that while a waferscale network switch can support up to 32x higher radix than state-of-the-art network switches when only area constraints are considered, the actual radix of a waferscale network switch is not area-limited. Rather, it is limited by a combination of internal bandwidth, external bandwidth, and power density. In fact, without optimizations, benefits of a waferscale network switch are minimal. To address the scalability bottlenecks, we propose a heterogeneous network switch design that reduces switch power by 30.8%-33.5% which, in turn, allows an increase in radix (by up to 4x) by increasing internal I/O bandwidth at the expense of energy efficiency. We also propose subswitch deradixing that increases the overall radix by 2x by decreasing the radix of the subswitches to alleviate the internal I/O bottleneck. We use Area I/O and Optical I/O schemes to alleviate the external I/O bandwidth bottlenecks of conventional SerDes-based external connectivity. In addition to scalability optimization, we present optimizations such as low latency buffering and proprietary routing that improve the performance of waferscale switches. Finally, we present a system architecture for a waferscale network switch that supports its port count, power delivery, and cooling requirements in a compact form factor. We show that the switch can be used to enable new computing systems such as single-switch datacenters and massive-scale singular GPUs. It can also lead to a dramatic reduction in datacenter network costs. Overall, this is the first work quantifying the benefits of waferscale switches and identifying and addressing the unique challenges and opportunities in building them.
Recommended citation: S. Chen, S. Pal and R. Kumar, "Waferscale Network Switches," 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), Buenos Aires, Argentina, 2024, pp. 215-229, doi: 10.1109/ISCA59077.2024.00025. https://davidchen.page/files/Waferscale-Network-Switches.pdf