Data-Driven Approach for Predicting Gasoline Yield in an FCC Unit Charged with Light and Heavy Feedstocks: A PCA-Guided Grouping for Enhanced Modeling Experience

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

This work introduces a feed-aware modeling framework that combines ensemble anomaly detection, PCA-based segmentation informed by feed chemistry, and Bayesian-optimized regression to improve predictive model performance for a Fluid Catalytic Cracking (FCC) unit using pilot-plant experimental data. Three regression models were trained with 5-fold cross-validation and tuned with Bayesian Optimization: Gaussian Process Regression (GPR), Boosted Trees (BT), and Robust Linear Regression (RLR). Modeling on the full unsegmented data set showed that conversion was a highly predictable target. Both GPR and BT achieved strong performance (mean R² values of approximately 0.72 to 0.73 after optimization), while RLR remained unstable due to its linear assumptions. For the gasoline yield, BT with optimization performed best (mean R² of approximately 0.67), while GPR and RLR with optimization showed limited predictive skill (R² values of approximately 0.25 and −0.51, respectively). These results reflect the nonlinear and distributed nature of yield formation. PCA revealed a clear latent structure that aligned strongly with feedstock type, and K-means clustering confirmed this separation by distinguishing naphtha from vacuum gas oil (VGO) samples in a chemically meaningful way. The data set was therefore segmented into feedstock groups, which yielded substantial improvements, especially for GPR. In the naphtha subset, GPR with optimization achieved mean R² values of 0.956 for conversion and 0.902 for gasoline yield, with very low associated error values. Performance also improved for VGO, although to a lesser extent, while RLR continued to underperform. Overall, segmentation based on PCA and clustering was essential for improving model reliability in FCC prediction tasks. Among all algorithms evaluated, GPR consistently delivered the highest accuracy and generalizability across feed types and targets, supporting its suitability for data-driven optimization of catalytic cracking performance.
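The effect of segmentation on cross-validated R² described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: ordinary least squares on a single synthetic feature stands in for GPR, BT, and RLR, known group labels stand in for the PCA and K-means step, and the helper names (`fit_line`, `cv_mean_r2`, `make_group`) and all coefficients are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; a simple stand-in regressor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def r2_score(y_true, y_pred):
    """Coefficient of determination; negative when worse than predicting the mean."""
    my = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def cv_mean_r2(xs, ys, k=5):
    """Mean held-out R^2 over k interleaved folds (sample i goes to fold i % k)."""
    scores = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [a + b * xs[i] for i in test]
        scores.append(r2_score([ys[i] for i in test], preds))
    return mean(scores)

def make_group(n, intercept, slope):
    """Synthetic (feature, target) pairs with a small deterministic wobble."""
    xs = [0.5 * i for i in range(n)]
    ys = [intercept + slope * x + 0.3 * ((i % 3) - 1) for i, x in enumerate(xs)]
    return xs, ys

# Hypothetical groups: labels and coefficients are illustrative, not from the paper.
groups = {
    "naphtha": make_group(20, intercept=10.0, slope=2.0),
    "VGO": make_group(20, intercept=60.0, slope=-3.0),
}

# Pooled model: one fit across both feed types.
pooled_x = [x for xs, _ in groups.values() for x in xs]
pooled_y = [y for _, ys in groups.values() for y in ys]
pooled_r2 = cv_mean_r2(pooled_x, pooled_y)

# Segmented models: one fit per feed type.
segmented_r2 = {name: cv_mean_r2(xs, ys) for name, (xs, ys) in groups.items()}

print(f"pooled mean R^2: {pooled_r2:.3f}")
for name, score in segmented_r2.items():
    print(f"{name} mean R^2: {score:.3f}")
```

With these synthetic groups, the pooled fit explains almost none of the held-out variance, while each per-group fit explains nearly all of it. R² falls below zero whenever held-out predictions are worse than simply predicting the held-out mean, which is how a value such as −0.51 can arise.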

Original language: English
Pages (from-to): 271-298
Number of pages: 28
Journal: Journal of Chemical Information and Modeling
Volume: 66
Issue number: 1
State: Published - 12 Jan 2026

Bibliographical note

Publisher Copyright:
© 2025 American Chemical Society

ASJC Scopus subject areas

  • General Chemistry
  • General Chemical Engineering
  • Computer Science Applications
  • Library and Information Sciences
