Statistical Analysis of Clustering Performances of NMF, Spectral Clustering, and K-means: With Gene Selection

  • Andri Mirzal*
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

Statistical tests are still rarely used to determine whether performance differences between clustering algorithms are significant. Such tests matter because they can establish whether one algorithm is statistically better than another. Using statistical tests to determine the significance of performance gains or losses after applying processing steps to datasets, such as feature selection, is even less common. The first task has been addressed in our other work [1]; the second is the topic of this paper. In this study, nonnegative matrix factorization (NMF), spectral clustering, and k-means are used as clustering methods; LS (Laplacian Score), SPEC (SPECtral), and SPFS (Similarity Preserving Feature Selection) are used as feature selection mechanisms; and eleven microarray gene expression datasets are used to evaluate the performance of the clustering methods. The experimental results show that, on average, only LS yields a statistically significant improvement in the performance of the clustering methods, SPEC seems to offer no advantage, and SPFS instead lowers clustering performance. Because applying feature selection mechanisms is expensive, these results raise the question of whether it is worthwhile to use them for selecting genes in microarray datasets.
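As an illustration of the kind of comparison described above, the following minimal sketch tests whether a per-dataset performance gain after a processing step is statistically significant. The abstract does not name the statistical test used in the paper, so the exact paired sign-flip permutation test here is an assumption chosen for simplicity; the function name `paired_sign_flip_test` and all score values are hypothetical placeholders, not results from the study.

```python
from itertools import product


def paired_sign_flip_test(before, after):
    """Exact two-sided paired permutation (sign-flip) test on the mean difference.

    Returns (mean_difference, p_value). Under the null hypothesis that the
    processing step has no effect, each per-dataset difference is equally
    likely to be positive or negative.
    """
    if len(before) != len(after) or not before:
        raise ValueError("need two equally long, non-empty score lists")
    diffs = [a - b for a, b in zip(after, before)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    # Enumerate all 2^n sign assignments (2048 for eleven datasets).
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point rounding.
        if stat >= observed - 1e-12:
            extreme += 1
    return sum(diffs) / len(diffs), extreme / total


# Made-up placeholder scores (NOT results from the paper): one clustering
# accuracy per dataset, without and with a hypothetical gene selection step.
scores_without = [0.60, 0.55, 0.70, 0.65, 0.50, 0.62, 0.58, 0.71, 0.66, 0.54, 0.63]
scores_with = [0.64, 0.57, 0.73, 0.66, 0.55, 0.65, 0.60, 0.74, 0.69, 0.58, 0.64]

mean_gain, p_value = paired_sign_flip_test(scores_without, scores_with)
print(f"mean gain = {mean_gain:.3f}, p = {p_value:.4f}")
```

A paired test is used because each dataset is measured twice, once without and once with the processing step, so the per-dataset differences carry the signal. Exact enumeration is feasible only for a small number of datasets; with many more, random sampling of sign assignments would be the usual substitute.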

Original language: English
Title of host publication: 2020 2nd International Conference on Computer and Information Sciences, ICCIS 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728154671
State: Published - 13 Oct 2020

Publication series

Name: 2020 2nd International Conference on Computer and Information Sciences, ICCIS 2020

Bibliographical note

Publisher Copyright:
© 2020 IEEE.

Keywords

  • Clustering
  • K-means
  • Microarray data
  • NMF
  • Spectral method
  • Statistical analysis

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Information Systems and Management
  • Control and Optimization
  • Information Systems
