ck-NN: A Clustered k-Nearest Neighbours Approach for Large-Scale Classification

Rafi Ullah, Ayaz H. Khan, S. M. Emaduddin

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

k-Nearest Neighbor (k-NN) is a non-parametric algorithm widely used for the estimation and classification of data points, especially when the dataset is distributed across several classes. It is considered a lazy machine learning algorithm because most of the computation is done during the testing phase rather than during training. Hence it is inefficient, and in practice infeasible, for processing huge datasets, i.e. Big Data. Clustering techniques (unsupervised learning), on the other hand, produce results that depend strongly on whether normalization or standardization is applied, and the value of "k" is difficult to determine. In this paper, some novel techniques are proposed for use as a preliminary stage of the state-of-the-art k-NN classification algorithm. Our proposed mechanism applies an unsupervised clustering algorithm to a large dataset before applying the k-NN algorithm to the resulting clusters, which may run on a single machine, on multiple machines, or on different nodes of a cluster in a distributed environment. Initially the dataset, possibly multi-dimensional, is passed through a clustering technique (K-Means) at the master node or controller to find a number of clusters equal to the number of nodes in the distributed system or the number of cores in the machine. Each cluster is then assigned to exactly one node or core, which applies k-NN locally. Each node or core sends its best result, and a selector chooses the best and nearest possible class from all the options. We will use one of the gold-standard distributed frameworks. We believe that our proposed mechanism could be applied to big data. We also believe that the architecture could be implemented on multiple GPUs or FPGAs to bring k-NN to large or huge datasets where traditional k-NN is very slow.
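The pipeline described in the abstract (partition with K-Means, run k-NN inside each partition, let a selector pick among the per-partition answers) can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the authors' implementation: the function names (`kmeans`, `local_knn`, `cknn_predict`), the sequential loop standing in for separate nodes or cores, and the selector rule (choose the candidate whose neighbours have the smallest mean distance to the query) are all assumptions, since the abstract does not specify them.

```python
import math
import random
from collections import Counter


def kmeans(points, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, n_clusters)
    assignment = [0] * len(points)
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        assignment = [
            min(range(n_clusters), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members (keep it if empty).
        for c in range(n_clusters):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment


def local_knn(query, part_points, part_labels, k):
    """k-NN on one partition: returns (mean neighbour distance, majority label)."""
    scored = [(math.dist(query, p), lab) for p, lab in zip(part_points, part_labels)]
    neighbours = sorted(scored, key=lambda t: t[0])[:k]
    label = Counter(lab for _, lab in neighbours).most_common(1)[0][0]
    mean_dist = sum(d for d, _ in neighbours) / len(neighbours)
    return mean_dist, label


def cknn_predict(query, points, labels, n_workers, k=3):
    """Cluster once, classify locally per cluster, then select one answer."""
    assignment = kmeans(points, n_workers)
    candidates = []
    # Each loop iteration stands in for one node or core (assumption:
    # a real deployment would run these partitions in parallel).
    for c in range(n_workers):
        part = [(p, lab) for p, lab, a in zip(points, labels, assignment) if a == c]
        if part:
            pts, labs = zip(*part)
            candidates.append(local_knn(query, pts, labs, k))
    # Assumed selector rule: keep the candidate with the nearest neighbours.
    return min(candidates, key=lambda t: t[0])[1]


# Toy data: two well-separated groups, one per class.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]
print(cknn_predict((0.05, 0.1), points, labels, n_workers=2))
```

Because each partition only searches its own points, the per-query cost on any one worker shrinks roughly with the number of partitions. The trade-off is that a query near a cluster boundary may have true neighbours in another partition, which is why some selector step over all candidates is needed.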

Original language: English
Pages (from-to): 67-77
Number of pages: 11
Journal: Advances in Distributed Computing and Artificial Intelligence Journal
Volume: 8
Issue number: 3
State: Published - 2019
Externally published: Yes

Bibliographical note

Publisher Copyright:
© University of Salamanca. All Rights Reserved.

Keywords

  • Artificial Intelligence
  • Distributed Computing
  • Machine Learning
  • k-Means Clustering
  • k-Nearest Neighbor Classification

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computer Networks and Communications
  • Artificial Intelligence
