Abstract
Task-based parallel programming frameworks offer compelling productivity and performance benefits for modern chip multi-processors (CMPs). At the same time, CMPs also provide packed-SIMD units to exploit fine-grain data parallelism. Two fundamental challenges make using packed-SIMD units with task-parallel programs particularly difficult: (1) the intra-core parallel abstraction gap; and (2) inefficient execution of irregular tasks. To address these challenges, we propose augmenting CMPs with intra-core loop-task accelerators (LTAs). We introduce a lightweight hint in the instruction set to elegantly encode loop-task execution and an LTA microarchitectural template that can be configured at design time for different amounts of spatial/temporal decoupling to efficiently execute both regular and irregular loop tasks. Compared to an in-order CMP baseline, CMP+LTA results in an average speedup of 4.2× (1.8× area normalized) and similar energy efficiency. Compared to an out-of-order CMP baseline, CMP+LTA results in an average speedup of 2.3× (1.5× area normalized) and also improves energy efficiency by 3.2×. Our work suggests augmenting CMPs with lightweight LTAs can improve performance and efficiency on both regular and irregular loop-task parallel programs with minimal software changes.
| Original language | English |
|---|---|
| Title of host publication | MICRO 2017 - 50th Annual IEEE/ACM International Symposium on Microarchitecture Proceedings |
| Publisher | IEEE Computer Society |
| Pages | 759-773 |
| Number of pages | 15 |
| ISBN (Electronic) | 9781450349529 |
| DOIs | |
| State | Published - 14 Oct 2017 |
| Externally published | Yes |
| Event | 50th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2017 - Cambridge, United States; Duration: 14 Oct 2017 → 18 Oct 2017 |
Publication series
| Name | Proceedings of the Annual International Symposium on Microarchitecture, MICRO |
|---|---|
| Volume | Part F131207 |
| ISSN (Print) | 1072-4451 |
Conference
| Conference | 50th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2017 |
|---|---|
| Country/Territory | United States |
| City | Cambridge |
| Period | 14/10/17 → 18/10/17 |
Bibliographical note
Publisher Copyright: © 2017 Association for Computing Machinery.
Keywords
- Programmable accelerators
- Task-parallel programming frameworks
- Work-stealing run-times
ASJC Scopus subject areas
- Hardware and Architecture