Exploration of automatic optimisation for CUDA programming

Mayez Al-Mouhamed, Ayaz Ul Hassan Khan*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Writing optimised compute unified device architecture (CUDA) programs for graphics processing units (GPUs) is complex even for experts. We present a design methodology for a restructuring tool that converts C-loops into optimised CUDA kernels using a three-step algorithm: loop tiling, coalesced memory access and resource optimisation. A method for finding possible loop tiling solutions with coalesced memory access is developed, and a simplified algorithm for restructuring C-loops into an efficient CUDA kernel is presented. In the evaluation, we implement matrix multiply (MM), matrix transpose (M-transpose), matrix scaling (M-scaling) and matrix vector multiply (MV) using the proposed algorithm. We present an analysis of the execution time and GPU throughput for these applications, which compare favourably with other proposals. The evaluation is carried out while scaling the problem size and running under a variety of kernel configurations. The obtained speedup is about 28-35% for M-transpose compared to the NVIDIA Software Development Kit, 33% for MV compared to the general-purpose computation on graphics processing units (GPGPU) compiler, and more than 80% for MM and M-scaling compared to CUDA-lite.
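To make the first step named in the abstract concrete, the following is a minimal sketch of loop tiling applied to a plain C matrix-transpose loop. It is not the paper's restructuring tool or its algorithm; the function names (`transpose_naive`, `transpose_tiled`) and the `tile` parameter are hypothetical and chosen only for illustration. The sketch runs on the CPU and shows only how a loop nest is split into tiles. In a CUDA kernel, each tile would typically be assigned to one thread block, with consecutive threads reading consecutive addresses so that global memory accesses can coalesce.

```c
#include <assert.h>
#include <stddef.h>

/* Untiled reference: out (cols x rows) receives the transpose of
 * in (rows x cols). Both matrices are stored row-major. */
void transpose_naive(const float *in, float *out, size_t rows, size_t cols)
{
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            out[j * rows + i] = in[i * cols + j];
}

/* Tiled version (illustrative assumption, not the paper's code):
 * the two outer loops walk over tile origins, the two inner loops
 * walk over the elements of one tile. Edge tiles are clipped so
 * that sizes not divisible by `tile` are still handled. */
void transpose_tiled(const float *in, float *out,
                     size_t rows, size_t cols, size_t tile)
{
    assert(tile > 0);
    for (size_t ii = 0; ii < rows; ii += tile) {
        size_t i_end = (ii + tile < rows) ? ii + tile : rows;
        for (size_t jj = 0; jj < cols; jj += tile) {
            size_t j_end = (jj + tile < cols) ? jj + tile : cols;
            /* One tile: the unit of work a thread block would own. */
            for (size_t i = ii; i < i_end; ++i)
                for (size_t j = jj; j < j_end; ++j)
                    out[j * rows + i] = in[i * cols + j];
        }
    }
}
```

Both functions produce the same result for any positive `tile`; only the order in which elements are visited changes, which is what makes the tiled form a natural starting point for mapping tiles to thread blocks.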

Original language: English
Pages (from-to): 309-324
Number of pages: 16
Journal: International Journal of Parallel, Emergent and Distributed Systems
Volume: 30
Issue number: 4
State: Published - 4 Jul 2015

Bibliographical note

Publisher Copyright:
© 2014 Taylor & Francis.

Keywords

  • CUDA
  • GPGPU
  • GPU
  • compiler transformations
  • directive-based language
  • parallel programming
  • source-to-source compiler

ASJC Scopus subject areas

  • Software
  • Computer Networks and Communications
