The Effect of Points Dispersion on the k-nn Search in Random Projection Forests

Mashaan Alshammari*, John Stavrakakis, Adel F. Ahmed, Masahiro Takatsuka

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

2 Scopus citations

Abstract

Partitioning trees are efficient data structures for k -nearest neighbor search. Machine learning libraries commonly use a special type of partitioning trees called d-trees to perform k -nn search. Unfortunately, d-trees can be ineffective in high dimensions because they need more tree levels to decrease the vector quantization (VQ) error. Random projection trees rpTrees solve this scalability problem by using random directions to split the data. A collection of rpTrees is called rpForest. k -nn search in an rpForest is influenced by two factors: 1) the dispersion of points along the random direction and 2) the number of rpTrees in the rpForest. In this study, we investigate how these two factors affect the k -nn search with varying k values and different datasets. We found that with larger number of trees, the dispersion of points has a very limited effect on the k -nn search. One should use the original rpTree algorithm by picking a random direction regardless of the dispersion of points.

Original languageEnglish
Pages (from-to)80858-80868
Number of pages11
JournalIEEE Access
Volume10
DOIs
StatePublished - 2022

Bibliographical note

Publisher Copyright:
© 2013 IEEE.

Keywords

  • k-nearest neighbor search
  • random projection forests
  • random projection trees
  • unsupervised learning

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering

Fingerprint

Dive into the research topics of 'The Effect of Points Dispersion on the k-nn Search in Random Projection Forests'. Together they form a unique fingerprint.

Cite this