Abstract
Industrial wastewater contaminated with proteins and phosphates poses a significant challenge for producing clean water. This study used regression-based machine learning (ML) algorithms to predict how well commercially available membranes with different pore sizes separate proteins of varying molecular weights from synthetic phosphate-laden wastewater. The ML tools were a bi-layered neural network (BNN), linear regression (LR), a least squares support vector machine (LSSVM), and Gaussian process regression (GPR). Correlation analysis was used to select the most pertinent input variables for each model combination while guarding against data leakage in the small dataset. Among the ML tools, the BNN and GPR algorithms demonstrated effective predictive capability for protein rejection. Combining all input variables gave the highest predictive accuracy (R² = 0.99) for protein rejection, with minimal error for both the BNN and GPR algorithms. SHapley Additive exPlanations (SHAP) analysis indicated that the molecular weight cutoff (MWCO), protein molecular weight (PMw), and isoelectric point (IEP) were the most influential factors affecting protein separation performance, with mean SHAP values of approximately 25, 12, and 15, respectively. These three variables had a more substantial impact than the hydrodynamic variables. This study provides insight into developing ML tools tailored to sparse datasets, particularly for accurately predicting protein separation from phosphate-laden wastewater.
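The correlation-based variable selection and the R² metric mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic placeholders, the value ranges and the toy response are arbitrary assumptions, and the helper names (`pearson`, `rank_features`, `r_squared`) are hypothetical. The one design point it tries to show is that the ranking is computed on the training rows only, which is one common way to avoid leaking held-out information into feature selection.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def rank_features(rows, target, names):
    """Rank features by |Pearson r| against the target, strongest first."""
    scores = {
        name: abs(pearson([row[i] for row in rows], target))
        for i, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic placeholder data; NOT the study's measurements.
random.seed(0)
names = ["MWCO", "PMw", "IEP", "pressure"]
rows, rejection = [], []
for _ in range(40):
    mwco = random.uniform(10, 100)   # hypothetical range
    pmw = random.uniform(10, 150)    # hypothetical range
    iep = random.uniform(4, 11)      # hypothetical range
    pressure = random.uniform(1, 4)  # hypothetical hydrodynamic variable
    rows.append([mwco, pmw, iep, pressure])
    # Arbitrary toy response, chosen only so the ranking has something to find.
    rejection.append(80 - 0.6 * mwco + 0.1 * pmw + random.gauss(0, 2))

# Split first, then rank using the training rows only.
train_rows, train_y = rows[:30], rejection[:30]
ranking = rank_features(train_rows, train_y, names)

for name, score in ranking:
    print(f"{name}: |r| = {score:.2f}")
```

Pearson correlation captures only linear association, so a screening step like this would normally be paired with a model-based attribution such as SHAP, as the abstract describes.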
| Original language | English |
|---|---|
| Article number | 100187 |
| Journal | Cleaner Chemical Engineering |
| Volume | 11 |
| DOIs | |
| State | Published - Dec 2025 |
Bibliographical note
Publisher Copyright: © 2025 The Authors.
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 6 Clean Water and Sanitation
Keywords
- Bi-layered neural network (BNN)
- Gaussian process regression (GPR)
- Machine learning (ML)
- Protein separation
- SHapley additive exPlanations (SHAP)
- Ultrafiltration
ASJC Scopus subject areas
- Environmental Science (miscellaneous)
- Management, Monitoring, Policy and Law
Fingerprint
Dive into the research topics of 'Interpretable SHAP-based machine learning-assisted design for selecting ultrafiltration membranes in protein-laden phosphate wastewater'. Together they form a unique fingerprint.