Abstract
Recently, Bayesian Optimization (BO) provides an efficient technique for selecting the hyperparameters of machine learning models. The BO strategy maintains a surrogate model and an acquisition function to efficiently optimize the computation-intensive functions with a few iterations. In this paper, we demonstrate the utility of the BO to fine-tune the hyperparameters of a Random Forest (RF) model for a problem related to the recognition of splice-junction genetic sequences. Locating these splice-junctions prompts further understanding of the DNA splicing process. Specifically, the BO algorithm optimizes four RF hyperparameters: number of trees, number of splitting features, splitting criterion, and leaf size. The optimized RF model automatically selects the most predictive features of the training data. The dataset is obtained from the UCI machine learning repository where half of the records represent two different types of splice-junctions and the other half does not represent any splice-junction. Experimental results proved the advantage of the BO-RF with 99.96% and 97.34% training and test classification accuracies respectively. The results also demonstrated the ability of the RF model to select the most important features, ensuring the best possible results using Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and decision tree (DT) models. Some practical procedures in model development and evaluation such as out-of-bag error and cross-validation approaches are also referred to.
Original language | English |
---|---|
Pages (from-to) | 30505-30522 |
Number of pages | 18 |
Journal | Multimedia Tools and Applications |
Volume | 80 |
Issue number | 20 |
DOIs | |
State | Published - Aug 2021 |
Bibliographical note
Publisher Copyright:© 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
Keywords
- Bayesian optimization, feature selection, support vector machine
- Decision tree
- K-nearest neighbor
- Random forest
- Splice junction recognition
ASJC Scopus subject areas
- Software
- Media Technology
- Hardware and Architecture
- Computer Networks and Communications