RT-CUDA: A Software Tool for CUDA Code Restructuring

Ayaz H. Khan*, Mayez Al-Mouhamed, Muhammed Al-Mulhem, Adel F. Ahmed

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Recent developments in graphics processing units (GPUs) have opened a new challenge in harnessing their computing power as a new general-purpose computing paradigm. However, porting applications to CUDA remains a challenge for average programmers, who have to package code in separate functions, explicitly manage data transfers between host and device memories, and manually optimize GPU memory utilization. In this paper, we propose a restructuring tool (RT-CUDA) that takes a C-like program and some user directives as compiler hints and produces optimized CUDA code. The tool's strategy is based on efficient management of the memory system to minimize data motion by managing transfers between host and device, maximizing bandwidth for device memory accesses, and enhancing data locality and reuse of cached data using shared memory and registers. Enhanced resource utilization is achieved by rewriting code as parametric kernels and by efficient auto-tuning. The tool enables calling numerical libraries (cuBLAS, cuSPARSE, etc.) to help implement scientific simulation applications such as iterative linear algebra solvers. For these applications, the tool implements an inter-block global synchronization that lets execution proceed across a few iterations, which helps balance load and avoid polling. RT-CUDA has been evaluated using a variety of basic linear algebra operators (Madd, MM, MV, VV, etc.) as well as iterative solvers for systems of linear equations, such as the Jacobi and Conjugate Gradient algorithms. Significant speedup has been achieved over other compilers, such as the PGI OpenACC and GPGPU compilers, for these applications. The evaluation shows that the generated kernels efficiently call math libraries and enable implementing complete iterative solvers. The tool helps scientists develop parallel simulators, such as reservoir and molecular dynamics simulators, without exposing them to the complexity of GPU and CUDA programming.
We have a partnership with a group of researchers at Saudi Aramco, a national company in Saudi Arabia. This group is currently exploring RT-CUDA as a potential development tool for applications involving linear algebra solvers. In addition, RT-CUDA is being used by senior and graduate students at King Fahd University of Petroleum and Minerals in their projects as part of RT-CUDA's continuous enhancement.

Original language: English
Pages (from-to): 551-594
Number of pages: 44
Journal: International Journal of Parallel Programming
Volume: 45
Issue number: 3
State: Published - 1 Jun 2017

Bibliographical note

Publisher Copyright:
© 2016, Springer Science+Business Media New York.

Keywords

  • CUDA
  • GPGPU
  • Kernel optimizations
  • Massively parallel programming
  • nVidia Kepler

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Information Systems
