Using Intra-core loop-task accelerators to improve the productivity and performance of task-based parallel programs

  • Ji Kim
  • Shunning Jiang
  • Christopher Torng
  • Moyang Wang
  • Shreesha Srinath
  • Berkin Ilbeyi
  • Khalid Al-Hawaj
  • Christopher Batten

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

Task-based parallel programming frameworks offer compelling productivity and performance benefits for modern chip multi-processors (CMPs). At the same time, CMPs also provide packed-SIMD units to exploit fine-grain data parallelism. Two fundamental challenges make it particularly difficult to use packed-SIMD units with task-parallel programs: (1) the intra-core parallel abstraction gap; and (2) inefficient execution of irregular tasks. To address these challenges, we propose augmenting CMPs with intra-core loop-task accelerators (LTAs). We introduce a lightweight hint in the instruction set to elegantly encode loop-task execution, along with an LTA microarchitectural template that can be configured at design time for different amounts of spatial/temporal decoupling to efficiently execute both regular and irregular loop tasks. Compared to an in-order CMP baseline, CMP+LTA results in an average speedup of 4.2× (1.8× area normalized) and similar energy efficiency. Compared to an out-of-order CMP baseline, CMP+LTA results in an average speedup of 2.3× (1.5× area normalized) and also improves energy efficiency by 3.2×. Our work suggests that augmenting CMPs with lightweight LTAs can improve performance and efficiency on both regular and irregular loop-task parallel programs with minimal software changes.
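The abstract names inefficient execution of irregular tasks as one obstacle to using packed-SIMD units. The toy cost model below illustrates why lockstep execution suffers when per-iteration work varies. It is a sketch under stated assumptions and does not represent the paper's LTA microarchitecture or its evaluation methodology. It assumes the following:

  • Each loop iteration has a hypothetical integer cost in abstract "steps".
  • A lockstep SIMD unit packs `width` consecutive iterations together and waits for the slowest one.
  • The "decoupled" case is an idealized set of independent lanes, each of which takes the next iteration as soon as it is free.

The names `lockstep_steps`, `decoupled_steps`, and `utilization` are hypothetical.

```python
# Toy cost model (assumption): abstract per-iteration costs, idealized lanes.
# This is NOT the LTA microarchitecture or the paper's evaluation method.
import heapq
from typing import Sequence


def lockstep_steps(work: Sequence[int], width: int) -> int:
    """Steps for a toy `width`-lane lockstep SIMD unit.

    Consecutive iterations are packed `width` at a time; a group finishes
    only when its longest iteration finishes, so shorter lanes sit idle.
    """
    total = 0
    for start in range(0, len(work), width):
        total += max(work[start:start + width])
    return total


def decoupled_steps(work: Sequence[int], width: int) -> int:
    """Steps for `width` idealized independent lanes.

    Each iteration, in program order, goes to whichever lane frees up first
    (greedy list scheduling), so a long iteration stalls only its own lane.
    """
    free_at = [0] * width  # min-heap: time at which each lane becomes free
    for cost in work:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + cost)
    return max(free_at)


def utilization(work: Sequence[int], width: int, steps: int) -> float:
    """Fraction of lane-steps spent doing useful work."""
    return sum(work) / (width * steps) if steps else 1.0


regular = [2] * 8                     # every iteration costs the same
irregular = [8, 1, 1, 1, 1, 8, 1, 1]  # hypothetical data-dependent costs

for name, work in (("regular", regular), ("irregular", irregular)):
    lock, dec = lockstep_steps(work, 4), decoupled_steps(work, 4)
    print(f"{name}: lockstep {lock} steps ({utilization(work, 4, lock):.2f}), "
          f"decoupled {dec} steps ({utilization(work, 4, dec):.2f})")
```

With four lanes, the regular loop takes 4 steps under both models at full utilization. For the irregular loop, the lockstep model needs 16 steps (about 0.34 utilization) because each group waits for its 8-step iteration. The idealized decoupled lanes finish in 9 steps (about 0.61 utilization).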

Original language: English
Title of host publication: MICRO 2017 - 50th Annual IEEE/ACM International Symposium on Microarchitecture Proceedings
Publisher: IEEE Computer Society
Pages: 759-773
Number of pages: 15
ISBN (Electronic): 9781450349529
State: Published - 14 Oct 2017
Externally published: Yes
Event: 50th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2017 - Cambridge, United States
Duration: 14 Oct 2017 to 18 Oct 2017

Publication series

Name: Proceedings of the Annual International Symposium on Microarchitecture, MICRO
Volume: Part F131207
ISSN (Print): 1072-4451

Conference

Conference: 50th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2017
Country/Territory: United States
City: Cambridge
Period: 14/10/17 to 18/10/17

Bibliographical note

Publisher Copyright:
© 2017 Association for Computing Machinery.

Keywords

  • Programmable accelerators
  • Task-parallel programming frameworks
  • Work-stealing run-times

ASJC Scopus subject areas

  • Hardware and Architecture
