Modern applications of high-performance computing point to an exponential increase in the number of computing nodes and a decrease in memory bandwidth. Such shortages are already observed for I/O and internode communication. This trend indicates that the upper bound on future computational performance will be dictated to a great extent by the amount of data movement. One approach to alleviating this is data compression. For scientific applications, which mostly work with enormous arrays of floating-point numbers, lossy compression has been widely accepted as a good solution, as it already has been in computer graphics. ZFP is a versatile lossy compression/decompression algorithm that offers an efficient and accurate compression scheme for multidimensional floating-point arrays by partitioning the data into blocks of 4^d values (e.g. 4 × 4 × 4 values in 3D). ZFP uses an efficient orthogonal block transform to achieve fast compression and decompression of high-precision numerical data. In this work, we propose a hardware implementation of the ZFP compression/decompression algorithm. ZFP offers several modes to suit different requirements, namely fixed-rate, fixed-precision and fixed-accuracy; these will be tailored to exploit hardware flexibility and efficiency. Field-programmable gate arrays (FPGAs) will be used for the implementation. FPGAs have been widely used to accelerate compute-intensive applications, delivering speedups of up to 100x over CPU-based systems. ZFP's reliance on bit manipulation of the data leaves significant room for improvement when it is implemented in hardware. ZFP is scalable because it operates on blocks of data independently, enabling multiple hardware engines to work concurrently on different blocks. In addition, the algorithm proceeds in distinct steps, allowing it to be fully pipelined for high throughput. Moreover, offloading compression to hardware accelerators increases overall system throughput.
The ZFP compression/decompression engines implemented on the FPGA will be interfaced with host CPU computing nodes using Peripheral Component Interconnect Express (PCIe). The design will be evaluated in terms of area, speed, throughput and compression rate. In addition, its performance will be compared with that of the software implementation of ZFP.
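The block partitioning and fixed-rate mode described above can be illustrated with a little arithmetic. The following is a minimal sketch, not the project's implementation: it assumes partial blocks at the array boundary are padded to full 4^d blocks, that fixed-rate mode gives every block the same bit budget (rate × 4^d bits), and it ignores any stream header or alignment overhead. The function names and the round-robin assignment of blocks to engines are hypothetical and used only for illustration.

```python
from math import ceil

BLOCK_EDGE = 4  # ZFP blocks hold 4 values along each dimension


def block_grid(shape):
    """Blocks needed along each dimension; a partial block counts as a whole one."""
    return tuple(ceil(n / BLOCK_EDGE) for n in shape)


def block_count(shape):
    """Total number of 4^d blocks covering an array of the given shape."""
    count = 1
    for blocks in block_grid(shape):
        count *= blocks
    return count


def fixed_rate_bits(shape, rate_bits_per_value):
    """Payload size in bits when every block gets the same bit budget.

    Assumption: no header or alignment overhead is included.
    """
    values_per_block = BLOCK_EDGE ** len(shape)
    return block_count(shape) * values_per_block * rate_bits_per_value


def assign_blocks(shape, engines):
    """Hypothetical round-robin mapping of block indices to parallel engines.

    Blocks are independent, so any assignment is valid; round-robin is
    just one simple choice.
    """
    assignment = [[] for _ in range(engines)]
    for index in range(block_count(shape)):
        assignment[index % engines].append(index)
    return assignment


if __name__ == "__main__":
    shape = (10, 8, 5)  # a small 3D array of doubles
    print(block_grid(shape))          # blocks per dimension
    print(block_count(shape))         # total blocks
    print(fixed_rate_bits(shape, 8))  # compressed payload at 8 bits/value
    print(assign_blocks(shape, 4))    # block indices handled by each engine
```

For the 10 × 8 × 5 example, the array is covered by 3 × 2 × 2 = 12 blocks of 64 values each. At 8 bits per value the payload is 6144 bits, compared with 25600 bits for the 400 uncompressed 64-bit values; the padded boundary blocks are the reason the ratio is slightly worse than 8:1.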
Effective start/end date: 1/04/20 → 1/10/21