Abstract
Study region: The Al-Qatif coastal aquifer, Saudi Arabia.
Study focus: Groundwater salinity is a concern in coastal aquifers, where limited data restrict predictive modeling. Machine learning (ML) models have shown promise for salinity assessment; however, their performance is constrained by small sample sizes. To address this limitation, groundwater salinity was modeled using an integrated ML framework. A dataset of thirty-nine groundwater samples was augmented with 700 synthetic samples. Salinity (mg/L) was used as the target variable, and the synthetic data quality was evaluated using Jensen–Shannon Divergence (JSD), Maximum Mean Discrepancy (MMD), and Charge Balance Error (CBE). Four ML models were trained on both datasets.
New hydrologic insights: The results showed that Gaussian Mixture Models (GMMs) can preserve both the statistical structure and hydrochemical behavior of groundwater salinity data under data scarcity. The generated synthetic samples exhibited low marginal divergence from the observed data, with an average JSD of 0.0797, and no detectable difference in joint multivariate structure, as shown by an MMD value of 0.0. The CBE confirmed that ionic balance characteristics were maintained rather than artificially enforced. Among the tested models, the Gradient Boosting Machine (GBM) demonstrated the most consistent generalization to real-only test data (Relative Root Mean Squared Error: rRMSE = 7.87 % and Relative Mean Absolute Error: rMAE = 6.73 %). Explainability analysis identified bromide, sodium, and chloride as the dominant factors controlling groundwater salinity in the study area.
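The workflow summarized in the abstract (GMM-based augmentation, a marginal divergence check, an ionic balance check, and relative error metrics) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the number of mixture components, the histogram bin count, and the choice to normalize rRMSE and rMAE by the mean of the observed values are all hypothetical, and the demo data are random placeholders rather than Al-Qatif measurements.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from sklearn.mixture import GaussianMixture


def augment_with_gmm(X, n_synthetic=700, n_components=2, seed=0):
    """Fit a GMM to the observed samples and draw synthetic rows from it.

    n_components and reg_covar are illustrative choices, not values from the study.
    """
    gmm = GaussianMixture(
        n_components=n_components,
        covariance_type="full",
        reg_covar=1e-3,  # small ridge to keep covariances stable on tiny datasets
        random_state=seed,
    )
    gmm.fit(X)
    X_syn, _ = gmm.sample(n_synthetic)
    return X_syn


def mean_jsd(X_real, X_syn, bins=10):
    """Average per-feature Jensen-Shannon divergence (base 2, so within [0, 1])."""
    values = []
    for j in range(X_real.shape[1]):
        lo = min(X_real[:, j].min(), X_syn[:, j].min())
        hi = max(X_real[:, j].max(), X_syn[:, j].max())
        p, _ = np.histogram(X_real[:, j], bins=bins, range=(lo, hi))
        q, _ = np.histogram(X_syn[:, j], bins=bins, range=(lo, hi))
        # SciPy returns the JS *distance*; squaring it gives the divergence.
        values.append(jensenshannon(p, q, base=2) ** 2)
    return float(np.mean(values))


def charge_balance_error(cations_meq, anions_meq):
    """Per-sample CBE in percent; inputs are 2-D arrays already in meq/L."""
    cations = np.sum(cations_meq, axis=1)
    anions = np.sum(anions_meq, axis=1)
    return 100.0 * (cations - anions) / (cations + anions)


def relative_rmse(y_true, y_pred):
    """RMSE as a percentage of the mean observed value (assumed normalization)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return 100.0 * rmse / np.mean(y_true)


def relative_mae(y_true, y_pred):
    """MAE as a percentage of the mean observed value (assumed normalization)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs(y_true - y_pred)) / np.mean(y_true)


if __name__ == "__main__":
    # Random placeholder data standing in for 39 samples with 4 features.
    rng = np.random.default_rng(42)
    X_real = rng.lognormal(mean=3.0, sigma=0.4, size=(39, 4))
    X_syn = augment_with_gmm(X_real, n_synthetic=700)
    print("synthetic shape:", X_syn.shape)
    print("mean JSD:", mean_jsd(X_real, X_syn))
```

In a setup like this, the CBE would be computed separately for the observed and synthetic samples and the two distributions compared, which matches the abstract's point that ionic balance is checked rather than imposed on the generator.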
| Original language | English |
|---|---|
| Article number | 103258 |
| Journal | Journal of Hydrology: Regional Studies |
| Volume | 64 |
| DOIs | |
| State | Published - Apr 2026 |
Bibliographical note
Publisher Copyright: © 2026 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license. http://creativecommons.org/licenses/by/4.0/
Keywords
- Explainable artificial intelligence
- Gaussian Mixture Models
- Groundwater salinity
- Machine learning
- Synthetic data
ASJC Scopus subject areas
- Water Science and Technology
- Earth and Planetary Sciences (miscellaneous)
Fingerprint
Dive into the research topics of 'Synthetic data-driven explainable machine learning for groundwater salinity prediction in the Al-Qatif coastal aquifer of Saudi Arabia'. Together they form a unique fingerprint.