TY - GEN
T1 - Improving k-nearest neighbour classification with distance functions based on receiver operating characteristics
AU - Hassan, Md Rafiul
AU - Hossain, M. Maruf
AU - Bailey, James
AU - Ramamohanarao, Kotagiri
PY - 2008
Y1 - 2008
N2 - The k-nearest neighbour (k-NN) technique, due to its interpretable nature, is a simple and very intuitively appealing method to address classification problems. However, choosing an appropriate distance function for k-NN can be challenging and an inferior choice can make the classifier highly vulnerable to noise in the data. In this paper, we propose a new method for determining a good distance function for k-NN. Our method is based on consideration of the area under the Receiver Operating Characteristics (ROC) curve, which is a well-known method to measure the quality of binary classifiers. It computes weights for the distance function, based on ROC properties within an appropriate neighbourhood for the instances whose distance is being computed. We experimentally compare the effect of our scheme with a number of other well-known k-NN distance metrics, as well as with a range of different classifiers. Experiments show that our method can substantially boost the classification performance of the k-NN algorithm. Furthermore, in a number of cases our technique is even able to deliver better accuracy than state-of-the-art non-k-NN classifiers, such as support vector machines.
KW - Classification
KW - Feature weighting
KW - Gene expression
KW - K-Nearest neighbour
KW - Receiver Operating Characteristics (ROC)
UR - http://www.scopus.com/inward/record.url?scp=56049105206&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-87479-9_50
DO - 10.1007/978-3-540-87479-9_50
M3 - Conference contribution
AN - SCOPUS:56049105206
SN - 354087478X
SN - 9783540874782
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 489
EP - 504
BT - Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2008, Proceedings
T2 - European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2008
Y2 - 15 September 2008 through 19 September 2008
ER -