Data augmentation using SMOTE technique: Application for prediction of burst pressure of hydrocarbons pipeline using supervised machine learning models

  • Afzal Ahmed Soomro*
  • Ainul Akmar Mokhtar
  • Masdi B. Muhammad
  • Mohamad Hanif Md Saad
  • Najeebullah Lashari
  • Muhammad Hussain
  • Abdul Sattar Palli
  • *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

13 Scopus citations

Abstract

Accurate burst pressure prediction is critical for ensuring oil and gas pipeline safety, guiding maintenance decisions, and lowering costs and risks. Traditional methods have limitations, including high experimental costs, conservative empirical models, and computationally expensive numerical algorithms. In recent years, machine learning (ML) models have supplanted these traditional methods. However, small and imbalanced datasets remain a major obstacle to building ML models that produce accurate results. Moreover, ML models trained on data from pipelines of specific material grades generalize poorly, which prevents them from performing well on other pipeline types. First, finite element analysis (FEA) was used to generate a dataset. Then, a new approach to improving ML model generalization for burst pressure prediction is proposed: combining publicly available datasets covering different pipeline specifications. In this combined dataset, some pipeline types have many data samples and others have few, which causes a class imbalance. The Synthetic Minority Oversampling Technique (SMOTE) was applied to address this imbalance. The performance of several ML models, namely Extra Trees (ET), Extreme Gradient Boosting (XGBR), Random Forest (RF), Light Gradient Boosting Machine (LGBM), and Decision Tree (DT), was evaluated to validate their prediction accuracy and generalization on pipelines of various material grades. Results show that all the selected ML models achieved a high R-squared (R² > 0.95) on the balanced data, compared with their results on the imbalanced dataset. These results indicate that SMOTE-based augmentation is an effective way to correct dataset imbalance and improve the burst pressure predictions of ML models for oil and gas pipelines.
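The abstract does not describe how SMOTE was configured, so the following is only a minimal sketch of the core SMOTE idea under stated assumptions: the pipeline material grade is treated as the class label, and every numeric column (including burst pressure) is interpolated. The function name `smote_balance` and the toy data are hypothetical. For each synthetic sample, a minority-class row is picked, one of its k nearest same-class neighbours is chosen at random, and a new row is placed at a random point on the line segment between them, until every class matches the size of the largest one.

```python
import math
import random
from collections import defaultdict


def smote_balance(rows, labels, k=5, seed=0):
    """Oversample each minority class with SMOTE-style interpolation (sketch).

    rows   : sequence of numeric feature vectors (assumption: burst pressure
             is included as one of the columns)
    labels : class label per row (assumption: the pipeline material grade)
    Returns new (rows, labels) lists: originals first, synthetic rows appended.
    """
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append([float(v) for v in row])
    target = max(len(members) for members in by_class.values())

    out_rows = [[float(v) for v in row] for row in rows]
    out_labels = list(labels)
    for label, members in by_class.items():
        deficit = target - len(members)
        if deficit <= 0 or len(members) < 2:
            continue  # largest class already, or no neighbour to interpolate with
        kk = min(k, len(members) - 1)
        # Indices of the kk nearest same-class neighbours (Euclidean) of each member.
        neighbours = []
        for i, x in enumerate(members):
            others = [j for j in range(len(members)) if j != i]
            others.sort(key=lambda j, x=x: math.dist(x, members[j]))
            neighbours.append(others[:kk])
        for _ in range(deficit):
            i = rng.randrange(len(members))
            j = rng.choice(neighbours[i])
            gap = rng.random()  # uniform in [0, 1)
            x, nn = members[i], members[j]
            # Synthetic row lies on the segment between x and its neighbour.
            out_rows.append([a + gap * (b - a) for a, b in zip(x, nn)])
            out_labels.append(label)
    return out_rows, out_labels
```

Because each synthetic row is a convex combination of two real rows of the same grade, it stays within that grade's observed feature range. For real work, the `SMOTE` class in the imbalanced-learn library (`fit_resample`) is a tested implementation of the technique; it is generally advisable to oversample only the training split so that synthetic rows do not leak into the test set.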
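The abstract reports model quality as R-squared, the coefficient of determination: R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the observed values from their mean. A minimal sketch of that calculation follows; the function name `r_squared` and the burst pressure values in the usage note are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two or more paired observations")
    mean = sum(y_true) / len(y_true)
    # Residual sum of squares: squared prediction errors.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared deviations from the observed mean.
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot
```

For hypothetical observed burst pressures of 10, 20, and 30 MPa predicted as 11, 19, and 31 MPa, SS_res = 3 and SS_tot = 200, so R² is about 0.985. scikit-learn's `sklearn.metrics.r2_score` computes the same quantity.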

Original language: English
Article number: 103233
Journal: Results in Engineering
Volume: 24
State: Published - Dec 2024
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2024 The Author(s)

Keywords

  • Burst pressure prediction
  • Data augmentation
  • Machine learning
  • Oil and gas pipelines
  • SMOTE
  • Safety

ASJC Scopus subject areas

  • General Engineering
