SqueezeNIC: Low-Latency In-NIC Compression for Distributed Deep Learning

  • Achref Rebai
  • Mubarak Adetunji Ojewale
  • Anees Ullah
  • Marco Canini
  • Suhaib A. Fahmy

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

To alleviate the communication bottleneck of distributed deep learning training, several data compression algorithms have been proposed. However, these algorithms introduce computational overhead and resource allocation concerns on CPUs and GPUs. In this paper, we introduce SqueezeNIC, an FPGA-based Network Interface Card (NIC) that offloads communication compression from CPUs/GPUs, bridging a high-bandwidth intra-node network with a high-bandwidth inter-node network. It enables better overlap of gradient communication and computation, further reducing training time per iteration in distributed training. Our evaluation shows that SqueezeNIC achieves line-rate compression and can speed up training by up to 1.21× compared to baseline approaches.
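To illustrate the kind of gradient compression such a NIC could offload (the abstract does not name the specific algorithm SqueezeNIC implements, so top-k sparsification is used here purely as a common, representative example), a minimal sketch in Python:

```python
import numpy as np

def topk_compress(grad, ratio=0.01):
    """Keep only the largest-magnitude `ratio` fraction of gradient values.

    Returns (indices, values): the sparse payload that would be sent over
    the network instead of the dense gradient tensor.
    """
    k = max(1, int(grad.size * ratio))
    flat = grad.ravel()
    # argpartition finds the k largest-magnitude entries without a full sort.
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

def topk_decompress(idx, vals, shape):
    """Reconstruct a dense gradient from the sparse (indices, values) pair."""
    flat = np.zeros(int(np.prod(shape)), dtype=vals.dtype)
    flat[idx] = vals
    return flat.reshape(shape)

grad = np.random.randn(1024, 1024).astype(np.float32)
idx, vals = topk_compress(grad, ratio=0.01)
restored = topk_decompress(idx, vals, grad.shape)
# The compressed payload carries roughly 1% of the original values.
```

Performing this selection and serialization on a CPU or GPU competes with training computation for cycles and memory bandwidth, which is the overhead the paper proposes moving into the NIC datapath.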

Original language: English
Title of host publication: NAIC 2024 - Proceedings of the 2024 SIGCOMM Workshop on Networks for AI Computing
Publisher: Association for Computing Machinery, Inc
Pages: 61-68
Number of pages: 8
ISBN (Electronic): 9798400707131
DOIs
State: Published - 4 Aug 2024
Externally published: Yes
Event: 1st Workshop on Networks for AI Computing, NAIC 2024, at ACM SIGCOMM 2024 - Sydney, Australia
Duration: 4 Aug 2024 - 8 Aug 2024

Publication series

Name: NAIC 2024 - Proceedings of the 2024 SIGCOMM Workshop on Networks for AI Computing

Conference

Conference: 1st Workshop on Networks for AI Computing, NAIC 2024, at ACM SIGCOMM 2024
Country/Territory: Australia
City: Sydney
Period: 4/08/24 - 8/08/24

Bibliographical note

Publisher Copyright:
© 2024 Owner/Author.

Keywords

  • Distributed Training
  • FPGA
  • In-Network Compression

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems
  • Signal Processing
