
Optimizing Strassen matrix multiply on GPUs

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Many-core systems are designed primarily for applications with large data parallelism. Strassen Matrix Multiply (MM) can be formulated as a depth-first search (DFS) traversal of a recursion tree in which all cores work in parallel on computing each of the N×N sub-matrices; this reduces storage at the cost of large data motion to gather and aggregate the results. We propose Strassen and Winograd algorithms (S-MM and W-MM) based on three optimizations: a set of basic algebra functions to reduce overhead, invocation of an efficient library (CUBLAS 5.5), and parameter tuning of a parametric kernel to improve resource occupancy. On GPUs, W-MM and S-MM with one recursion level outperform the CUBLAS 5.5 library by up to a factor of two for large arrays satisfying N ≥ 2048 and N ≥ 3072, respectively. Compared with the NVIDIA SDK library, S-MM and W-MM achieved speedups between 20x and 80x for the same array sizes. The proposed approach can be used to enhance the performance of the CUBLAS and MKL libraries.
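To make "one recursion level" concrete, here is a minimal CPU sketch in Python. It is not the authors' GPU implementation. One level of recursion splits each N×N operand into four N/2×N/2 quadrants and replaces the eight quadrant products of the classical block algorithm with seven. The multiplication work therefore drops to roughly 7/8 of a classical product, at the cost of extra matrix additions and temporaries. The Strassen variant uses 18 quadrant additions/subtractions, and the Winograd variant uses 15. All helper names (`gemm`, `add`, `sub`, `split`, `join`, `strassen_one_level`, `winograd_one_level`) are hypothetical. `gemm` is a naive triple loop standing in for a tuned library call such as CUBLAS, and N is assumed to be even.

```python
# One-level Strassen (S-MM) and Strassen-Winograd (W-MM) sketch on the CPU.
# Assumption: `gemm` is a naive stand-in for a tuned library GEMM (e.g. CUBLAS);
# all helper names here are hypothetical, not taken from the paper.

Matrix = list[list[float]]


def gemm(a: Matrix, b: Matrix) -> Matrix:
    """Naive O(n^3) product; placeholder for a library GEMM call."""
    k, m = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(m)] for row in a]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split(a: Matrix):
    """Split an even-sized square matrix into quadrants (a11, a12, a21, a22)."""
    n = len(a)
    if n % 2:
        raise ValueError("this sketch assumes an even dimension N")
    h = n // 2
    return ([r[:h] for r in a[:h]], [r[h:] for r in a[:h]],
            [r[:h] for r in a[h:]], [r[h:] for r in a[h:]])


def join(c11: Matrix, c12: Matrix, c21: Matrix, c22: Matrix) -> Matrix:
    """Reassemble four quadrants into one matrix (row concatenation)."""
    return ([r1 + r2 for r1, r2 in zip(c11, c12)] +
            [r1 + r2 for r1, r2 in zip(c21, c22)])


def strassen_one_level(a: Matrix, b: Matrix) -> Matrix:
    """Classic Strassen: 7 half-size products, 18 additions/subtractions."""
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    m1 = gemm(add(a11, a22), add(b11, b22))
    m2 = gemm(add(a21, a22), b11)
    m3 = gemm(a11, sub(b12, b22))
    m4 = gemm(a22, sub(b21, b11))
    m5 = gemm(add(a11, a12), b22)
    m6 = gemm(sub(a21, a11), add(b11, b12))
    m7 = gemm(sub(a12, a22), add(b21, b22))
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    return join(c11, c12, c21, c22)


def winograd_one_level(a: Matrix, b: Matrix) -> Matrix:
    """Winograd variant: 7 half-size products, 15 additions/subtractions."""
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    # 8 pre-additions on the operands
    s1 = add(a21, a22)
    s2 = sub(s1, a11)
    s3 = sub(a11, a21)
    s4 = sub(a12, s2)
    t1 = sub(b12, b11)
    t2 = sub(b22, t1)
    t3 = sub(b22, b12)
    t4 = sub(t2, b21)
    # 7 half-size products
    p1 = gemm(a11, b11)
    p2 = gemm(a12, b21)
    p3 = gemm(s4, b22)
    p4 = gemm(a22, t4)
    p5 = gemm(s1, t1)
    p6 = gemm(s2, t2)
    p7 = gemm(s3, t3)
    # 7 post-additions that assemble the result quadrants
    u2 = add(p1, p6)
    u3 = add(u2, p7)
    u4 = add(u2, p5)
    c11 = add(p1, p2)
    c12 = add(u4, p3)
    c21 = sub(u3, p4)
    c22 = add(u3, p5)
    return join(c11, c12, c21, c22)
```

For any even-sized square input, both functions should return the same matrix as `gemm(a, b)`. In a GPU version, one natural mapping is for each `gemm` call to become a library GEMM on N/2×N/2 blocks and each `add`/`sub` call to become an element-wise kernel. The Winograd form trades three fewer additions for more temporaries that must stay live at the same time.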

Original language: English
Title of host publication: 2015 IEEE/ACIS 16th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, SNPD 2015 - Proceedings
Editors: Keizo Saisho
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781479986767
State: Published - 3 Aug 2015

Publication series

Name: 2015 IEEE/ACIS 16th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, SNPD 2015 - Proceedings

Bibliographical note

Publisher Copyright:
© 2015 IEEE.

Keywords

  • CUDA Programming
  • Fast Matrix Multiplication
  • Graphics Processing Unit (GPU)
  • Strassen

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Software
