A New Approach to Improve the Topological Stability in Non-Linear Dimensionality Reduction

Mohammed Elhenawy, Mahmoud Masoud*, Sebastien Glaser, Andry Rakotonirainy

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

4 Scopus citations

Abstract

Dimensionality reduction in the machine learning field mitigates the undesired properties of high-dimensional spaces to facilitate classification, compression, and visualization of high-dimensional data. During the last decade, researchers proposed many new non-linear techniques for dimensionality reduction based on the assumption that data laying on or near a complex low-dimensional manifold is embedded in the high-dimensional space. On the other side, new techniques for dimensionality reduction aim to identify and extracting the manifold from the high-dimensional space. Isomap is one of widely-used low-dimensional embedding methods, where geodesic distances on a weighted graph are incorporated with the classical scaling (metric multidimensional scaling). The Isomap chooses the k-nearest neighbors based on the distance only which causes bridges and topological instability. In this paper, we propose a new algorithm to find the nearest neighbors to optimize the number of short-circuit errors and thus improve the topological stability. We assume that any point on the manifold and its nearest neighbors form a vector subspace and the orthogonal to that subspace is orthogonal to all vectors spans the vector subspace. Therefore, in the proposed algorithm, the point on the manifold and its nearest neighbors (i.e. the selected number of nearest neighbors based on the required subspace's dimension) are used to find the bases of the subspace and the orthogonal to that subspace which belongs to the orthogonal complementary subspace. Then, candidate neighbors points will be added to the nearest neighbors based on the distance and the angle between each candidate point and the orthogonal to the subspace. The new algorithm is tested using low dimensional (3D) synthesized datasets and high dimensional real datasets. The superior performance of the new algorithm in choosing the nearest neighbors is confirmed through visually inspecting its capability find to the correct D$ representation of the synthesized datasets at different $k$. Moreover, In the case of the high dimensional real datasets, the new algorithm yields a lower residual variance than the standard Isomap. Finally, we investigated using the new algorithm to find the nearest neighbors for the Locally Linear Embedding (LLE) algorithm.

Original languageEnglish
Article number8999521
Pages (from-to)33898-33908
Number of pages11
JournalIEEE Access
Volume8
DOIs
StatePublished - 2020
Externally publishedYes

Bibliographical note

Publisher Copyright:
© 2013 IEEE.

Keywords

  • Machine learning
  • dimension reduction
  • non-linear
  • topological stability
  • visualization

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering

Fingerprint

Dive into the research topics of 'A New Approach to Improve the Topological Stability in Non-Linear Dimensionality Reduction'. Together they form a unique fingerprint.

Cite this