Abstract
Recently, accelerator-based compression/decompression was proposed to hide the storage latency of high-performance computing (HPC) applications that generate or ingest data too large to fit in a node's memory. In this work, such a scheme has been implemented using a novel FPGA-based lossy compression/decompression scheme that has very low latency. The proposed scheme completely overlaps the movement of the application's data with its compute kernels on the CPU, with minimal impact on these kernels. Experiments showed that it can yield performance levels on par with utilizing memory-only storage buffers, even though the data is actually stored on disk. Experiments also showed that, compared to CPU- and GPU-based compression frameworks, it achieves better performance levels at a fraction of the power consumption.
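The overlap of data movement with compute described in the abstract can be illustrated with a small software-only sketch. This is not the paper's FPGA design: a background thread stands in for the accelerator, uniform quantization followed by `zlib` stands in for the lossy compressor, and all names (`lossy_compress`, `lossy_decompress`, `run_pipeline`, `sink`, `step`, `depth`) are hypothetical.

```python
import queue
import struct
import threading
import zlib


def lossy_compress(values, step=0.01):
    """Quantize floats to multiples of `step` (the lossy part), then deflate.

    Assumes every quantized value fits in a signed 32-bit integer.
    """
    quantized = [round(v / step) for v in values]
    raw = struct.pack(f"<{len(quantized)}i", *quantized)
    return zlib.compress(raw)


def lossy_decompress(blob, step=0.01):
    """Invert lossy_compress; each value is recovered to within about step / 2."""
    raw = zlib.decompress(blob)
    count = len(raw) // 4
    return [q * step for q in struct.unpack(f"<{count}i", raw)]


def run_pipeline(num_steps, compute, sink, step=0.01, depth=2):
    """Call compute(i) for each step while a worker compresses and stores results.

    `sink(index, blob)` is a stand-in for the storage layer (for example a
    file write). The bounded queue plays the role of a small staging buffer.
    """
    pending = queue.Queue(maxsize=depth)
    errors = []

    def offload():
        while True:
            item = pending.get()
            if item is None:  # sentinel: the producer has finished
                break
            index, values = item
            try:
                sink(index, lossy_compress(values, step))
            except Exception as exc:  # keep draining so the producer never blocks
                errors.append(exc)

    worker = threading.Thread(target=offload)
    worker.start()
    try:
        for i in range(num_steps):
            # The main thread goes on to step i + 1 while the worker handles step i.
            pending.put((i, compute(i)))
    finally:
        pending.put(None)
        worker.join()
    if errors:
        raise errors[0]
```

A caller might pass a dictionary's `__setitem__` as `sink` to keep compressed blocks in memory, or a function that writes each block to disk. How much the two threads actually overlap in Python depends on the workload, so the sketch shows the structure of the pipeline rather than the performance behavior reported in the paper.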
| Original language | English |
|---|---|
| Article number | 104955 |
| Journal | Journal of Parallel and Distributed Computing |
| Volume | 193 |
| State | Published - Nov 2024 |
Bibliographical note
Publisher Copyright: © 2024 Elsevier Inc.
Keywords
- Data compression
- FPGA accelerators
- Hardware co-design
- High performance computing
- Memory intensive applications
- Reconfigurable computing
ASJC Scopus subject areas
- Software
- Theoretical Computer Science
- Hardware and Architecture
- Computer Networks and Communications
- Artificial Intelligence
Fingerprint
Dive into the research topics of 'Accelerating memory and I/O intensive HPC applications using hardware compression'. Together they form a unique fingerprint.