Abstract
Statistical tests are still rarely used to determine whether performance differences between clustering algorithms are significant. This is an important task because such a test can determine whether one algorithm is statistically better than another. Using statistical tests to determine the significance of performance gains or losses after applying a processing step, such as feature selection, to a dataset is even less common. The first task has been addressed in our other work [1], and the second task is the topic of this paper. In this study, nonnegative matrix factorization (NMF), spectral clustering, and k-means are used as clustering methods; LS (Laplacian Score), SPEC (SPECtral), and SPFS (Similarity Preserving Feature Selection) are used as feature selection mechanisms; and eleven microarray gene expression datasets are used to evaluate the performance of the clustering methods. The experimental results show that, on average, only LS yields a statistically significant improvement in the performance of the clustering methods; SPEC seems to offer no advantage, and SPFS lowers clustering performance. Because applying selection mechanisms is expensive, these results raise the question of whether it is worthwhile to use them for selecting genes in microarray datasets.
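As an illustration of the kind of comparison described above, the following is a minimal sketch of a paired significance test on clustering scores measured with and without feature selection. The abstract does not name the test used in the paper, so the exact sign-flip permutation test here is an assumption chosen because it needs only the standard library; the function name `paired_sign_flip_test` and all score values are hypothetical and are not results from the study.

```python
from itertools import product


def paired_sign_flip_test(before, after):
    """Exact two-sided sign-flip permutation test on paired differences.

    Returns (mean_difference, p_value). Under the null hypothesis that the
    processing step has no effect, each per-dataset difference is equally
    likely to be positive or negative.
    """
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    observed = sum(diffs) / n

    # Enumerate all 2^n sign assignments (2048 for eleven datasets).
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=n):
        stat = sum(s * d for s, d in zip(signs, diffs)) / n
        # Small tolerance guards against floating-point ties.
        if abs(stat) >= abs(observed) - 1e-12:
            extreme += 1
        total += 1
    return observed, extreme / total


# Hypothetical clustering scores on eleven datasets, invented for
# illustration only; they are NOT taken from the paper.
without_fs = [0.61, 0.55, 0.70, 0.48, 0.66, 0.59, 0.73, 0.52, 0.64, 0.57, 0.68]
with_fs = [0.65, 0.58, 0.72, 0.53, 0.69, 0.60, 0.77, 0.55, 0.66, 0.61, 0.70]

mean_gain, p_value = paired_sign_flip_test(without_fs, with_fs)
print(f"mean gain = {mean_gain:.3f}, p = {p_value:.4f}")
```

A small p-value would suggest that the gain is unlikely to be due to chance across the datasets. With only eleven paired observations, the smallest attainable two-sided p-value for this test is 2/2048, which limits how strong the evidence can be.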
| Original language | English |
|---|---|
| Title of host publication | 2020 2nd International Conference on Computer and Information Sciences, ICCIS 2020 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| ISBN (Electronic) | 9781728154671 |
| State | Published - 13 Oct 2020 |
Publication series
| Name | 2020 2nd International Conference on Computer and Information Sciences, ICCIS 2020 |
|---|
Bibliographical note
Publisher Copyright: © 2020 IEEE.
Keywords
- Clustering
- K-means
- Microarray data
- NMF
- Spectral method
- Statistical analysis
ASJC Scopus subject areas
- Artificial Intelligence
- Computer Networks and Communications
- Information Systems and Management
- Control and Optimization
- Information Systems