Abstract
Vector accelerators can efficiently execute regular data-parallel workloads, but they require expensive multi-ported register files to feed large vector ALUs. Recent work on in-situ processing-in-SRAM shows promise in enabling area-efficient vector acceleration. This work explores two different approaches to leveraging in-situ processing-in-SRAM: BS-VRAM, which uses bit-serial execution, and BP-VRAM, which uses bit-parallel execution. The two approaches have very different latency vs. throughput trade-offs. BS-VRAM requires more cycles per operation, but is able to execute thousands of operations in parallel, while BP-VRAM requires fewer cycles per operation, but can only execute hundreds of operations in parallel. This paper is the first work to perform a rigorous evaluation of bit-serial vs. bit-parallel in-situ processing-in-SRAM. Our results show that both approaches have similar area overheads. For 32-bit arithmetic operations, BS-VRAM improves throughput by 1.3-5.0× compared to BP-VRAM, while BP-VRAM improves latency by 3.0-23.0× compared to BS-VRAM.
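The bit-serial side of this trade-off can be illustrated in software. The sketch below is a minimal illustration of the general bit-serial idea, not the paper's BS-VRAM design: it assumes operands are stored transposed as bit-planes (one Python integer per bit position, one bit per lane) and adds all lanes with one full-adder step per bit position. The function names (`to_bit_planes`, `from_bit_planes`, `bit_serial_add`, `vector_add`) are hypothetical, and the step counter counts loop iterations only; it does not model hardware cycles.

```python
def to_bit_planes(values, width):
    """Transpose unsigned ints into `width` bit-planes.

    Plane i is a Python int whose bit j holds bit i of values[j].
    """
    planes = []
    for i in range(width):
        plane = 0
        for lane, v in enumerate(values):
            plane |= ((v >> i) & 1) << lane
        planes.append(plane)
    return planes


def from_bit_planes(planes, lanes):
    """Inverse of to_bit_planes: rebuild one unsigned int per lane."""
    values = []
    for lane in range(lanes):
        v = 0
        for i, plane in enumerate(planes):
            v |= ((plane >> lane) & 1) << i
        values.append(v)
    return values


def bit_serial_add(a_planes, b_planes):
    """Lane-wise add, one bit position per step (wraps modulo 2**width)."""
    carry = 0          # one carry bit per lane, packed into an int
    out = []
    steps = 0
    for a, b in zip(a_planes, b_planes):
        out.append(a ^ b ^ carry)                # sum bit for every lane
        carry = (a & b) | (carry & (a ^ b))      # carry-out for every lane
        steps += 1
    return out, steps


def vector_add(xs, ys, width=32):
    """Add two equal-length vectors of unsigned ints bit-serially."""
    a = to_bit_planes(xs, width)
    b = to_bit_planes(ys, width)
    s, steps = bit_serial_add(a, b)
    return from_bit_planes(s, len(xs)), steps
```

In this sketch the number of steps equals the bit width (32 by default) whether the vectors hold three elements or several thousand, which mirrors the abstract's point that bit-serial execution needs more steps per operation but applies each step to many elements at once. A bit-parallel organization would instead process all bits of an element together, across fewer elements.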
| Original language | English |
|---|---|
| Title of host publication | 2020 IEEE International Symposium on Circuits and Systems, ISCAS 2020 - Proceedings |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| ISBN (Electronic) | 9781728133201 |
| State | Published - 2020 |
| Externally published | Yes |
Publication series
| Name | Proceedings - IEEE International Symposium on Circuits and Systems |
|---|---|
| Volume | 2020-October |
| ISSN (Print) | 0271-4310 |
Bibliographical note
Publisher Copyright: © 2020 IEEE
ASJC Scopus subject areas
- Electrical and Electronic Engineering
Title
Towards a reconfigurable bit-serial/bit-parallel vector accelerator using in-situ processing-in-SRAM