ROC-tree: A novel decision tree induction algorithm based on receiver operating characteristics to classify gene expression data

M. Maruf Hossain, Md Rafiul Hassan, James Bailey

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

17 Scopus citations

Abstract

Gene expression information from microarray experiments is a primary form of data for biological analysis and can offer insights into disease processes and cellular behaviour. Such datasets are particularly challenging to build classifiers for, due to their very high dimensional nature and small sample size. Decision trees are a seemingly attractive technique for this domain, due to their easily interpretable white box nature and noise resistance. However, existing decision tree methods tend to perform rather poorly for classifying gene expression data. To address this gap, we introduce a new technique for building decision trees that is better suited to this scenario. Our method is based on consideration of the area under the Receiver Operating Characteristics (ROC) curve, to help determine decision tree characteristics, such as node selection and stopping criteria. We experimentally compare our algorithm, called ROC-tree, against other well known decision tree techniques, on a number of gene expression datasets. The experimental results clearly demonstrate that ROC-tree can deliver better classification accuracy in a range of challenging situations.
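The abstract does not spell out how the area under the ROC curve drives node selection, so the following is only a minimal sketch of one plausible reading, not the authors' ROC-tree algorithm. It scores each gene by the AUC obtained when its raw expression value is used to rank samples, and picks the gene that separates the two classes best as a candidate split attribute. The function names `auc_single_feature` and `best_feature_by_auc`, the binary 0/1 labels, and the use of `max(AUC, 1 - AUC)` as a direction-free score are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def auc_single_feature(values: Sequence[float], labels: Sequence[int]) -> float:
    """AUC of one feature used as a score for the positive class (label 1).

    Computed as the fraction of (positive, negative) sample pairs in which
    the positive sample has the larger value; ties count as half a win.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_feature_by_auc(
    X: Sequence[Sequence[float]], y: Sequence[int]
) -> Tuple[int, float]:
    """Return (column index, score) of the most discriminative feature.

    The score is max(AUC, 1 - AUC), an illustrative choice so that a gene
    is rated equally whether high or low expression marks the positive class.
    """
    best_idx, best_score = -1, -1.0
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        auc = auc_single_feature(column, y)
        score = max(auc, 1.0 - auc)
        if score > best_score:
            best_idx, best_score = j, score
    return best_idx, best_score


# Toy data: 4 samples, 2 hypothetical genes; labels 0 = control, 1 = disease.
X = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
y = [0, 0, 1, 1]
idx, score = best_feature_by_auc(X, y)
print(idx, score)
```

On this toy data the first gene ranks every positive sample above every negative one, so it would be chosen. A stopping rule in the same spirit could halt splitting when no remaining gene reaches a chosen AUC threshold, though the abstract gives no such threshold.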

Original language: English
Title of host publication: Society for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 455-465
Number of pages: 11
ISBN (Print): 9781605603179
State: Published - 2008
Externally published: Yes
Event: 8th SIAM International Conference on Data Mining 2008, Applied Mathematics 130 - Atlanta, GA, United States
Duration: 24 Apr 2008 to 26 Apr 2008

Publication series

Name: Society for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130
Volume: 2

Conference

Conference: 8th SIAM International Conference on Data Mining 2008, Applied Mathematics 130
Country/Territory: United States
City: Atlanta, GA
Period: 24/04/08 to 26/04/08

Keywords

  • Classification
  • Decision tree
  • DNA microarray
  • Gene expression
  • High dimensional dataset
  • Receiver Operating Characteristics (ROC)

ASJC Scopus subject areas

  • Information Systems
  • Software
  • Signal Processing
  • Theoretical Computer Science
