Interpretable SHAP-based machine learning-assisted design for selecting ultrafiltration membranes in protein-laden phosphate wastewater

  • Lukka Thuyavan Yogarathinam
  • Sani I. Abba*
  • Jamilu Usman
  • Muthumareeswaran Ramamoorthy
  • Isam H. Aljundi

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Industrial wastewater contaminated with proteins and phosphates poses a significant challenge for clean-water production. This study employed regression-based machine learning (ML) algorithms to predict the rejection of proteins with different molecular weights from synthetic phosphate-laden wastewater by commercially available ultrafiltration membranes with different pore sizes. The ML tools evaluated were a bi-layered neural network (BNN), linear regression (LR), least squares support vector machine (LSSVM), and Gaussian process regression (GPR). Correlation analysis was used to select the most relevant input variables for building effective model input combinations while guarding against data leakage in the small dataset. Among the ML tools, the BNN and GPR algorithms showed effective predictive capability for protein rejection. Using all input variables together gave the highest predictive accuracy for protein rejection (R² = 0.99), with minimal errors for both the BNN and GPR algorithms. Interpretable SHapley Additive exPlanations (SHAP) analysis indicated that the molecular weight cutoff (MWCO), protein molecular weight (PMw), and isoelectric point (IEP) were the most influential factors affecting protein separation performance, with mean SHAP values of approximately 25, 12, and 15, respectively. The ML tools revealed that MWCO, PMw, and IEP had a more substantial impact than the hydrodynamic variables. This study provides insights for developing ML tools tailored to sparse datasets, particularly for accurately predicting protein separation from phosphate-laden wastewater.
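
The SHAP ranking above condenses per-sample Shapley attributions into one number per input. As a minimal, dependency-free sketch of that calculation (not the authors' pipeline), the Python snippet below computes exact interventional Shapley values by enumerating every feature coalition against a background set, then averages their absolute values per feature. The toy rejection model, its four features (MWCO, PMw, IEP and a hypothetical pressure term), and all data rows are hypothetical stand-ins for the trained BNN or GPR. The abstract does not say whether its "mean SHAP" is a mean of absolute values; the sketch assumes it is, as in common SHAP summary bar plots.

```python
from itertools import combinations
from math import factorial
from statistics import mean

# HYPOTHETICAL feature order for the toy model below (not the paper's schema).
FEATURES = ["MWCO_kDa", "PMw_kDa", "IEP", "pressure_bar"]


def toy_rejection(z):
    """HYPOTHETICAL protein-rejection model (%), a stand-in for a fitted BNN/GPR."""
    mwco, pmw, iep, pressure = z
    sieving = 100.0 * pmw / (pmw + mwco)  # larger protein vs. pore size -> higher rejection
    return sieving + 1.5 * iep - 0.5 * pressure


def coalition_value(model, x, background, subset):
    """v(S): mean model output with features in `subset` fixed to x and the
    remaining features taken from each background row (interventional SHAP)."""
    return mean(
        model([x[j] if j in subset else b[j] for j in range(len(x))])
        for b in background
    )


def exact_shapley(model, x, background):
    """Exact Shapley values for one sample by enumerating all coalitions.

    Cost grows as 2**M, so this is only practical for a few features."""
    m = len(x)
    phi = [0.0] * m
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):
            # Shapley weight |S|! (M - |S| - 1)! / M!
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for combo in combinations(others, size):
                s = set(combo)
                gain = (coalition_value(model, x, background, s | {i})
                        - coalition_value(model, x, background, s))
                phi[i] += weight * gain
    return phi


def mean_abs_shap(model, samples, background):
    """Global importance: mean |Shapley value| of each feature over `samples`."""
    all_phi = [exact_shapley(model, x, background) for x in samples]
    return [mean(abs(p[i]) for p in all_phi) for i in range(len(samples[0]))]


# HYPOTHETICAL rows: [MWCO (kDa), protein MW (kDa), IEP, pressure (bar)]
background = [
    [10.0, 14.3, 11.0, 1.0],
    [30.0, 66.5, 4.7, 2.0],
    [50.0, 150.0, 6.8, 3.0],
    [100.0, 66.5, 4.7, 1.5],
]

for name, score in zip(FEATURES, mean_abs_shap(toy_rejection, background, background)):
    print(f"{name:>12}: {score:.2f}")
```

Exact enumeration needs roughly 2^M coalition evaluations per feature, so it only suits a handful of inputs; for larger models, libraries such as the `shap` package approximate the same quantities. Because the toy model is made up, its printed scores say nothing about the values of approximately 25, 12 and 15 reported above.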

Original language: English
Article number: 100187
Journal: Cleaner Chemical Engineering
Volume: 11
State: Published - Dec 2025

Bibliographical note

Publisher Copyright:
© 2025 The Authors.

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 6 - Clean Water and Sanitation

Keywords

  • Bi-layered neural network (BNN)
  • Gaussian process regression (GPR)
  • Machine learning (ML)
  • Protein separation
  • SHapley Additive exPlanations (SHAP)
  • Ultrafiltration

ASJC Scopus subject areas

  • Environmental Science (miscellaneous)
  • Management, Monitoring, Policy and Law
