Improving k-nearest neighbour classification with distance functions based on receiver operating characteristics

Md Rafiul Hassan, M. Maruf Hossain, James Bailey, Kotagiri Ramamohanarao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

19 Scopus citations

Abstract

The k-nearest neighbour (k-NN) technique is a simple, interpretable, and intuitively appealing method for classification problems. However, choosing an appropriate distance function for k-NN can be challenging, and an inferior choice can make the classifier highly vulnerable to noise in the data. In this paper, we propose a new method for determining a good distance function for k-NN. Our method is based on the area under the Receiver Operating Characteristics (ROC) curve, a well-known measure of the quality of binary classifiers. It computes weights for the distance function based on ROC properties within an appropriate neighbourhood of the instances whose distance is being computed. We experimentally compare our scheme with a number of other well-known k-NN distance metrics, as well as with a range of different classifiers. Experiments show that our method can substantially boost the classification performance of the k-NN algorithm. Furthermore, in a number of cases our technique even delivers better accuracy than state-of-the-art non-k-NN classifiers, such as support vector machines.
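The idea of weighting a k-NN distance function by ROC properties can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract describes weights computed within a neighbourhood of the instances being compared, whereas the simplified version below assumes a single global weight per feature, taken as 2 * |AUC - 0.5| when that feature alone is used to rank the training instances. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels are 0/1; a tie between a positive and a negative counts as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.5  # assumption: treat a single-class sample as uninformative
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_feature_weights(X, y):
    """Global per-feature weight: 0 for an uninformative feature, 1 for a perfect one."""
    n_features = len(X[0])
    return [2.0 * abs(auc([row[j] for row in X], y) - 0.5)
            for j in range(n_features)]


def weighted_distance(a, b, weights):
    """Euclidean distance with each squared difference scaled by its feature weight."""
    return math.sqrt(sum(w * (ai - bi) ** 2 for w, ai, bi in zip(weights, a, b)))


def knn_predict(X, y, query, weights, k=3):
    """Majority vote among the k training instances closest to the query."""
    ranked = sorted(zip(X, y),
                    key=lambda pair: weighted_distance(pair[0], query, weights))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data: feature 0 separates the classes; feature 1 is large-scale noise.
X = [[0.1, 1.0], [0.2, 4.0], [0.3, 6.0], [0.4, 7.0],
     [0.9, 2.0], [1.0, 3.0], [1.1, 5.0], [1.2, 8.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
query = [1.0, 6.5]

weights = auc_feature_weights(X, y)
weighted_prediction = knn_predict(X, y, query, weights, k=3)
unweighted_prediction = knn_predict(X, y, query, [1.0, 1.0], k=3)
```

On this toy data the noisy feature receives weight 0, so the weighted classifier assigns the query to class 1, while plain Euclidean distance is pulled towards class 0 by the noise. A neighbourhood-based variant, closer to what the abstract describes, would recompute the weights from the instances near each query rather than from the whole training set.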

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2008, Proceedings
Pages: 489-504
Number of pages: 16
Edition: PART 1
State: Published - 2008
Externally published: Yes
Event: European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2008 - Antwerp, Belgium
Duration: 15 Sep 2008 to 19 Sep 2008

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 1
Volume: 5211 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2008
Country/Territory: Belgium
City: Antwerp
Period: 15/09/08 to 19/09/08

Keywords

  • Classification
  • Feature weighting
  • Gene expression
  • K-Nearest neighbour
  • Receiver Operating Characteristics (ROC)

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
