Abstract
To alleviate the communication bottleneck of distributed deep learning training, several data compression algorithms have been proposed. However, these algorithms introduce computational overhead and resource allocation concerns on CPUs and GPUs. In this paper, we introduce SqueezeNIC, an FPGA-based Network Interface Card (NIC) that offloads communication compression from CPUs/GPUs, bridging a high-bandwidth intra-node network with a high-bandwidth inter-node network. It enables better overlap of gradient communication and computation to further reduce training time per iteration in distributed training. Our evaluation shows that SqueezeNIC achieves line-rate compression and can speed up training by up to a factor of 1.21× compared to baseline approaches.
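For context on the kind of gradient compression such a NIC might offload, here is a minimal sketch of top-k sparsification, a widely used scheme in distributed training. This is an illustrative example only; the function names and the choice of scheme are ours, not details taken from the SqueezeNIC paper:

```python
import numpy as np

def topk_compress(grad, ratio=0.01):
    """Keep only the largest-magnitude `ratio` fraction of gradient values.

    Returns (indices, values): the sparse representation that would be
    sent over the network instead of the full dense gradient.
    """
    flat = grad.ravel()
    k = max(1, int(flat.size * ratio))
    # argpartition selects the k largest-magnitude entries in O(n)
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

def topk_decompress(idx, vals, shape):
    """Rebuild a dense gradient on the receiver, zero-filling dropped entries."""
    flat = np.zeros(int(np.prod(shape)), dtype=vals.dtype)
    flat[idx] = vals
    return flat.reshape(shape)
```

The compression ratio trades bandwidth for accuracy: with `ratio=0.01`, only 1% of gradient values (plus their indices) cross the wire, which is what makes offloading the selection step to dedicated hardware attractive.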
| Original language | English |
|---|---|
| Title of host publication | NAIC 2024 - Proceedings of the 2024 SIGCOMM Workshop on Networks for AI Computing |
| Publisher | Association for Computing Machinery, Inc |
| Pages | 61-68 |
| Number of pages | 8 |
| ISBN (Electronic) | 9798400707131 |
| DOIs | |
| State | Published - 4 Aug 2024 |
| Externally published | Yes |
| Event | 1st Workshop on Networks for AI Computing, NAIC 2024, at ACM SIGCOMM 2024 - Sydney, Australia Duration: 4 Aug 2024 → 8 Aug 2024 |
Publication series
| Name | NAIC 2024 - Proceedings of the 2024 SIGCOMM Workshop on Networks for AI Computing |
|---|
Conference
| Conference | 1st Workshop on Networks for AI Computing, NAIC 2024, at ACM SIGCOMM 2024 |
|---|---|
| Country/Territory | Australia |
| City | Sydney |
| Period | 4/08/24 → 8/08/24 |
Bibliographical note
Publisher Copyright: © 2024 Owner/Author.
Keywords
- Distributed Training
- FPGA
- In-Network Compression
ASJC Scopus subject areas
- Artificial Intelligence
- Computer Networks and Communications
- Computer Science Applications
- Information Systems
- Signal Processing