Prediction enhancement for surface water sodium adsorption ratio using limited inputs: Implementation of hybridized stacked ensemble model with feature selection algorithm

Meysam Salarijazi*, Iman Ahmadianfar, Zaher Mundher Yaseen

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

The Sodium Adsorption Ratio (SAR) is a widely used variable in water quality research, particularly in agricultural and environmental studies. In many cases, the ions required to calculate SAR, namely Na+, Mg2+, and Ca2+, are not available. Consequently, the ability to estimate SAR from a limited number of other water quality variables is critically important. This study used the Multilayer Perceptron Neural Network (MLPNN), Support Vector Regression (SVR), and K-Nearest Neighbors (KNN) models as level-0 predictors, together with the Boruta algorithm for variable selection. A level-1 stacked ensemble learning model was then used to improve prediction accuracy. Discharge and water quality data from the Zarrin-Gol River in northern Iran were used to implement the modeling procedure. The Boruta variable selection results showed that SAR can be predicted effectively from a limited number of water quality variables, even without the principal variables. Further investigation of input combinations for the level-0 models showed that the best predictions were obtained with 4, 3, and 1 input variables for the MLPNN, KNN, and SVR models, respectively. Among the level-0 models, the MLPNN model was the most accurate, with RMSE = 0.54, MBE = 0.26, MAE = 0.44, R = 0.84, IA = 0.67, and KGE = 0.79. The level-1 stacked ensemble learning model significantly improved SAR prediction compared with the level-0 models. The Ensemble-NN model performed best in estimating SAR within the range of recorded data, with RMSE = 0.53, MBE = 0.29, MAE = 0.41, R = 0.87, IA = 0.70, and KGE = 0.82. Residual analysis further confirmed the superior predictive capability of the level-1 models over the level-0 models. The generalized logistic probability distribution was used to estimate extreme values.
The Ensemble-KNN model gave the best predictions of extreme values, with RMSE = 0.69, MBE = −0.61, MAE = 0.61, R = 0.61, IA = 0.26, and KGE = 0.37. The findings highlight the improvements achieved by stacked ensemble methods in modeling SAR across the full dataset, extreme values, and model residuals.
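For reference, the sketch below shows the standard SAR definition, SAR = Na+ / sqrt((Ca2+ + Mg2+) / 2) with all concentrations in meq/L, and common textbook forms of the six scores quoted in the abstract. It is an illustration under stated assumptions, not the authors' code: the helper names (`mg_per_l_to_meq_per_l`, `sar`, `metrics`) are hypothetical, MBE is assumed to be mean(predicted − observed), IA is assumed to be Willmott's original index of agreement, and KGE is assumed to be the Gupta et al. (2009) form. The abstract does not state which variants the paper used. The Boruta selection and stacked ensemble steps are not reproduced here.

```python
import math
from statistics import mean, pstdev

# Equivalent weights (g/eq) = molar mass / ionic charge, used for mg/L -> meq/L.
EQ_WEIGHT = {"Na": 22.990, "Ca": 40.078 / 2, "Mg": 24.305 / 2}


def mg_per_l_to_meq_per_l(conc_mg_l, ion):
    """Hypothetical helper: convert an ion concentration from mg/L to meq/L."""
    return conc_mg_l / EQ_WEIGHT[ion]


def sar(na_meq, ca_meq, mg_meq):
    """Sodium adsorption ratio; all inputs must be in meq/L."""
    return na_meq / math.sqrt((ca_meq + mg_meq) / 2.0)


def metrics(obs, sim):
    """Hypothetical helper: goodness-of-fit of predicted (sim) vs observed (obs).

    Assumes equal-length, non-constant series and a nonzero observed mean.
    """
    n = len(obs)
    err = [s - o for o, s in zip(obs, sim)]  # assumed sign: predicted - observed
    mo, ms = mean(obs), mean(sim)
    so, ss = pstdev(obs), pstdev(sim)
    # Pearson correlation with population (1/n) normalization throughout
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / n
    r = cov / (so * ss)
    # Willmott's original index of agreement (assumed IA variant)
    ia = 1.0 - sum(e * e for e in err) / sum(
        (abs(s - mo) + abs(o - mo)) ** 2 for o, s in zip(obs, sim)
    )
    # Kling-Gupta efficiency: correlation, variability ratio, bias ratio
    alpha, beta = ss / so, ms / mo
    kge = 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return {
        "RMSE": math.sqrt(sum(e * e for e in err) / n),
        "MBE": sum(err) / n,
        "MAE": sum(abs(e) for e in err) / n,
        "R": r,
        "IA": ia,
        "KGE": kge,
    }


if __name__ == "__main__":
    print(sar(4.0, 1.0, 1.0))  # 4 / sqrt((1 + 1) / 2) = 4.0
    print(metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]))
```

In the usage example, a constant bias of +1 gives RMSE = MAE = MBE = 1 while R stays at 1. This is why bias-sensitive scores such as KGE and IA are reported alongside correlation.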

Original language: English
Article number: 103561
Journal: Physics and Chemistry of the Earth
Volume: 134
State: Published - Jun 2024

Bibliographical note

Publisher Copyright:
© 2024 Elsevier Ltd

Keywords

  • Data-driven
  • Extreme value analysis
  • Residual analysis
  • Water quality

ASJC Scopus subject areas

  • Geophysics
  • Geochemistry and Petrology
