Abstract
Filters are the fastest among the different types of feature selection methods. They employ metrics from information theory, such as mutual information (MI), joint mutual information (JMI), and minimal-redundancy-maximal-relevance (mRMR). Determining the optimal feature subset is an NP-hard problem. This work proposes an engineered Genetic Algorithm (GA) in which the fitness of a solution consists of two terms: the first is a feature selection metric such as MI, JMI, or mRMR, and the second is an overlapping coefficient that promotes diversity in the GA population. Experimental results show that the proposed algorithm can return multiple good-quality solutions that also have minimal overlap with each other. Having multiple solutions is a significant benefit when the test data contains noisy or missing values. Experiments were conducted on two publicly available time-series datasets. The selected feature sets were also used for forecasting with a simple Long Short-Term Memory (LSTM) model, and the forecasting quality obtained with the different feature sets is analyzed. The proposed algorithm was compared with a popular optimization tool, 'Basic Open-source Nonlinear Mixed INteger programming' (BONMIN), and a recent feature selection algorithm, 'Conditional Mutual Information Considering Feature Interaction' (CMFSI). The experiments show that the multiple solutions found by the proposed method have good quality and minimal overlap.
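The abstract describes a two-term fitness: a filter metric (e.g. MI) rewarding relevance, plus an overlapping coefficient penalizing similarity to other solutions in the population. The paper's exact formulation is not given here, so the following is a minimal Python sketch under assumptions: relevance is taken as the summed empirical MI between each selected feature and the target, the overlap penalty is the average overlap coefficient with the other subsets, and the weighting `alpha` is a hypothetical parameter.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Empirical MI (in nats) between two discrete sequences of equal length."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log( p(a,b) / (p(a) p(b)) ), with counts c, px[a], py[b]
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi

def overlap_coefficient(s1, s2):
    """Szymkiewicz-Simpson overlap: |A ∩ B| / min(|A|, |B|)."""
    if not s1 or not s2:
        return 0.0
    return len(s1 & s2) / min(len(s1), len(s2))

def fitness(subset, columns, target, population, alpha=0.5):
    """Two-term fitness sketch: relevance minus alpha * average overlap
    with the other feature subsets currently in the GA population.
    `columns` maps feature name -> list of discrete values; `alpha` is assumed."""
    relevance = sum(mutual_information(columns[f], target) for f in subset)
    others = [s for s in population if s != subset]
    penalty = (sum(overlap_coefficient(subset, s) for s in others) / len(others)
               if others else 0.0)
    return relevance - alpha * penalty
```

A subset that scores well on the filter metric but duplicates another population member is penalized, which is how the sketch models the diversity pressure the abstract attributes to the overlapping coefficient.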
| Original language | English |
|---|---|
| Article number | 8952613 |
| Pages (from-to) | 9597-9609 |
| Number of pages | 13 |
| Journal | IEEE Access |
| Volume | 8 |
| DOIs | |
| State | Published - 2020 |
Bibliographical note
Publisher Copyright: © 2013 IEEE.
Keywords
- Feature selection
- deep learning
- forecasting
- genetic algorithm
- machine learning
- optimization methods
ASJC Scopus subject areas
- General Computer Science
- General Materials Science
- General Engineering