Synthetic data-driven explainable machine learning for groundwater salinity prediction in the Al-Qatif coastal aquifer of Saudi Arabia

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Study region: The Al-Qatif coastal aquifer, Saudi Arabia.

Study focus: Groundwater salinity is a concern in coastal aquifers, where limited data restrict predictive modeling. Machine learning (ML) models have shown promise for salinity assessment, but their performance is constrained by small sample sizes. To address this limitation, groundwater salinity was modeled using an integrated ML framework. A dataset of thirty-nine groundwater samples was augmented with 700 synthetic samples. Salinity (mg/L) was used as the target variable, and the quality of the synthetic data was evaluated using the Jensen–Shannon divergence (JSD), the maximum mean discrepancy (MMD), and the charge balance error (CBE). Four ML models were trained on both the original and the augmented datasets.

New hydrologic insights: The results showed that Gaussian Mixture Models (GMMs) can preserve both the statistical structure and the hydrochemical behavior of groundwater salinity data under data scarcity. The generated synthetic samples showed low marginal divergence from the observed data (average JSD of 0.0797) and no detectable difference in joint multivariate structure (MMD of 0.0). The CBE confirmed that ionic balance characteristics were maintained rather than artificially enforced. Among the tested models, the Gradient Boosting Machine (GBM) generalized most consistently to the real-only test data (relative root mean squared error, rRMSE = 7.87%; relative mean absolute error, rMAE = 6.73%). Explainability analysis identified bromide, sodium, and chloride as the dominant factors controlling groundwater salinity in the study area.
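
The abstract does not give the generator or evaluation settings, so the following is only a minimal plain-Python sketch of how GMM-based augmentation and the two distributional checks (marginal JSD, joint MMD) fit together. It assumes a GMM whose parameters have already been estimated (for example by expectation-maximization) and, for brevity, uses diagonal covariances. The function names, the component parameters, the 10-bin histogram used for the marginal JSD, and the RBF kernel bandwidth `gamma` are all hypothetical choices, not the authors' settings. JSD is computed in bits, so it lies between 0 and 1; the MMD function returns the biased estimate of its square, which is 0 when the two samples coincide.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def sample_diagonal_gmm(weights: Sequence[float], means: Sequence[Vector],
                        stds: Sequence[Vector], n: int, seed: int = 0) -> List[Vector]:
    """Draw n synthetic rows from an already-fitted GMM.

    Hypothetical simplification: each component has a diagonal covariance,
    given here as per-feature standard deviations.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Pick a component in proportion to its mixing weight, then sample from it.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        rows.append([rng.gauss(mu, sd) for mu, sd in zip(means[k], stds[k])])
    return rows


def _histogram(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Normalized histogram over fixed edges; the last bin is closed on the right."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (last and v == edges[i + 1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]


def jensen_shannon(p: Sequence[float], q: Sequence[float]) -> float:
    """JSD in bits (log base 2), so 0 <= JSD <= 1."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        # Terms with a_i = 0 contribute nothing; m_i > 0 whenever a_i > 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def marginal_jsd(real: Sequence[float], synth: Sequence[float], bins: int = 10) -> float:
    """JSD between histograms of one feature, using shared bins over the pooled range."""
    pooled = list(real) + list(synth)
    lo, hi = min(pooled), max(pooled)
    if hi == lo:
        return 0.0
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins)] + [hi]
    return jensen_shannon(_histogram(real, edges), _histogram(synth, edges))


def _rbf(x: Vector, y: Vector, gamma: float) -> float:
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd_squared(X: Sequence[Vector], Y: Sequence[Vector], gamma: float = 1.0) -> float:
    """Biased (V-statistic) estimate of squared MMD with an RBF kernel."""
    kxx = sum(_rbf(a, b, gamma) for a in X for b in X) / (len(X) * len(X))
    kyy = sum(_rbf(a, b, gamma) for a in Y for b in Y) / (len(Y) * len(Y))
    kxy = sum(_rbf(a, b, gamma) for a in X for b in Y) / (len(X) * len(Y))
    return kxx + kyy - 2 * kxy
```

With a real feature matrix `real` (one row per sample) and `synth = sample_diagonal_gmm(...)`, averaging `marginal_jsd` over the feature columns gives a summary similar in spirit to the reported mean JSD, while `mmd_squared(real, synth)` checks the joint structure. Ion concentrations span very different ranges, so features would usually be standardized before the kernel is applied.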

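The charge balance error and the relative error metrics named in the abstract reduce to short formulas. The sketch below assumes the common CBE definition, CBE (%) = 100 × (Σcations − Σanions) / (Σcations + Σanions), with concentrations converted from mg/L to meq/L, and normalizes rRMSE and rMAE by the mean of the observed values. The ion list, the molar masses used for conversion, and that normalization convention are assumptions, because the abstract does not specify them.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical ion set: (molar mass in g/mol, absolute charge) for mg/L -> meq/L.
CATIONS = {"Ca": (40.078, 2), "Mg": (24.305, 2), "Na": (22.990, 1), "K": (39.098, 1)}
ANIONS = {"Cl": (35.453, 1), "SO4": (96.06, 2), "HCO3": (61.017, 1),
          "NO3": (62.004, 1), "Br": (79.904, 1)}


def to_meq(mg_per_l: float, molar_mass: float, charge: int) -> float:
    """Convert a concentration from mg/L to meq/L."""
    return mg_per_l * charge / molar_mass


def charge_balance_error(sample: Dict[str, float]) -> float:
    """CBE (%) = 100 * (sum cations - sum anions) / (sum cations + sum anions), in meq/L.

    Ions missing from `sample` are treated as 0 mg/L (an assumption).
    """
    cat = sum(to_meq(sample.get(ion, 0.0), m, z) for ion, (m, z) in CATIONS.items())
    an = sum(to_meq(sample.get(ion, 0.0), m, z) for ion, (m, z) in ANIONS.items())
    if cat + an == 0:
        raise ValueError("sample contains no major ions")
    return 100.0 * (cat - an) / (cat + an)


def relative_errors(observed: Sequence[float],
                    predicted: Sequence[float]) -> Tuple[float, float]:
    """Return (rRMSE %, rMAE %), both normalized by the mean observed value (assumed convention)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    return 100.0 * rmse / mean_obs, 100.0 * mae / mean_obs
```

An absolute CBE within about 5% is a commonly used acceptance limit for hydrochemical analyses. Applying `charge_balance_error` to every synthetic row and comparing its spread with that of the observed samples is one way to check that ionic balance is preserved rather than forced.
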
Original language: English
Article number: 103258
Journal: Journal of Hydrology: Regional Studies
Volume: 64
State: Published, Apr 2026

Bibliographical note

Publisher Copyright:
© 2026 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license. http://creativecommons.org/licenses/by/4.0/

Keywords

  • Explainable artificial intelligence
  • Gaussian Mixture Models
  • Groundwater salinity
  • Machine learning
  • Synthetic data

ASJC Scopus subject areas

  • Water Science and Technology
  • Earth and Planetary Sciences (miscellaneous)
