Abstract
This work introduces a feed-aware modeling framework that combines ensemble anomaly detection, segmentation based on principal component analysis (PCA) and informed by feed chemistry, and Bayesian-optimized regression to improve predictive modeling of a Fluid Catalytic Cracking (FCC) unit from pilot-plant experimental data. Three regression models were trained with 5-fold cross-validation and tuned by Bayesian optimization: Gaussian Process Regression (GPR), Boosted Trees (BT), and Robust Linear Regression (RLR). Modeling on the full, unsegmented data set showed that conversion was a highly predictable target. Both GPR and BT achieved strong performance (mean R² of approximately 0.72 to 0.73 after optimization), while RLR remained unstable because of its linear assumptions. For gasoline yield, optimized BT performed best (mean R² of approximately 0.67), while optimized GPR and RLR showed limited predictive skill (R² of approximately 0.25 and −0.51, respectively). These results reflect the nonlinear and distributed nature of yield formation. PCA revealed a clear latent structure that aligned strongly with feedstock type, and K-means clustering confirmed this separation by distinguishing naphtha from VGO samples in a chemically meaningful way. The data set was therefore segmented into feedstock groups, which yielded substantial improvements, especially for GPR. In the naphtha subset, optimized GPR achieved mean R² values of 0.956 for conversion and 0.902 for gasoline yield, with very low associated errors. Performance also improved for VGO, although to a lesser extent, while RLR continued to underperform. Overall, segmentation based on PCA and clustering was essential for improving model reliability in FCC prediction tasks. Among all algorithms evaluated, GPR consistently delivered the highest accuracy and generalizability across feed types and targets, supporting its suitability for data-driven optimization of catalytic cracking performance.
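The abstract reports mean R² over 5-fold cross-validation, including a negative value, and attributes the main gains to fitting each feedstock group separately. The sketch below illustrates both ideas in plain Python. It is not the authors' implementation: the data are synthetic, an ordinary least-squares line stands in for GPR, BT, and RLR, the group labels are given directly instead of being found by PCA and K-means, and no Bayesian optimization is performed. The names `r2_score`, `fit_line`, and `cv_mean_r2` are hypothetical helpers introduced here for illustration.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    The value is negative whenever the predictions are worse than
    always predicting the mean of y_true.
    """
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b (stand-in model)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def cv_mean_r2(xs, ys, k=5, seed=0):
    """Mean held-out R^2 over k shuffled folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train_idx],
                        [ys[i] for i in train_idx])
        preds = [a * xs[i] + b for i in test_idx]
        scores.append(r2_score([ys[i] for i in test_idx], preds))
    return sum(scores) / k


# R^2 can be negative: reversed predictions are worse than the mean.
print(r2_score([1, 2, 3], [3, 2, 1]))  # -3.0

# Synthetic data (an assumption, not the paper's measurements):
# two "feedstock" groups that respond to the same input in opposite ways.
rng = random.Random(42)
xs = [10.0 * i / 59 for i in range(60)]
naphtha_y = [2.0 * x + 10.0 + rng.gauss(0.0, 1.0) for x in xs]
vgo_y = [-1.5 * x + 60.0 + rng.gauss(0.0, 1.0) for x in xs]

pooled_r2 = cv_mean_r2(xs + xs, naphtha_y + vgo_y)  # one model, all data
naphtha_r2 = cv_mean_r2(xs, naphtha_y)              # one model per group
vgo_r2 = cv_mean_r2(xs, vgo_y)

print(f"pooled: {pooled_r2:.3f}")
print(f"naphtha: {naphtha_r2:.3f}  VGO: {vgo_r2:.3f}")
```

In this toy setting the pooled score is low because a single model cannot represent two groups with different input-output behavior, while the per-group scores are high. That mirrors the qualitative argument for segmentation only; it says nothing about the magnitudes reported for the FCC data.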
| Original language | English |
|---|---|
| Pages (from-to) | 271-298 |
| Number of pages | 28 |
| Journal | Journal of Chemical Information and Modeling |
| Volume | 66 |
| Issue number | 1 |
| State | Published - 12 Jan 2026 |
Bibliographical note
Publisher Copyright: © 2025 American Chemical Society
ASJC Scopus subject areas
- General Chemistry
- General Chemical Engineering
- Computer Science Applications
- Library and Information Sciences