Decision Tree Ensembles for Automatic Identification of Lithology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

Lithology identification is one of the processes geoscientists rely on to understand subsurface formations and to evaluate the quality of reservoirs and aquifers. However, direct lithological identification usually requires considerable effort and time. Researchers have therefore developed several machine learning models based on well-logging data to avoid the challenges associated with direct identification and to increase identification accuracy. Nevertheless, high uncertainty and low accuracy are common issues because of the heterogeneous nature of lithology types. This work employs decision tree ensemble techniques to predict lithologies more accurately, in a time-saving and cost-efficient manner, while accounting for uncertainty. This study used a real-world well-log dataset from the public Athabasca Oil Sands Database to identify and extract the relevant features. We then tuned the hyperparameters of the ensemble decision tree models using grid search. This paper evaluates two ensemble techniques: random forest (RF) and extreme gradient boosting (XGB). We assessed the developed models' performance with metrics such as accuracy, precision, and recall using 5-fold cross-validation. Finally, we performed a chi-squared test of the hypothesis that the two models perform identically. The XGB and RF models achieved accuracies of 94% and 93%, respectively. The XGB model's weighted-average recall and precision, both 93%, are only 5% and 4% higher than those of the RF model. In addition, the chi-squared test yielded a p-value of 0.013, suggesting a low probability of a difference in the two models' performance. Sand and coal formations are easier to classify than sandy shale and cemented sand. The low representation of sandy shale and cemented sand in the dataset may explain their prediction errors.
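The tuning and evaluation workflow described above can be sketched as follows, assuming scikit-learn is available. This is a minimal illustration, not the study's code: the synthetic dataset from `make_classification`, its four imbalanced classes, and the parameter grid are hypothetical stand-ins for the Athabasca well-log features and the grid actually searched. Only the random forest is shown.

```python
# Minimal sketch: grid-search tuning of a random forest, then 5-fold
# cross-validated accuracy and weighted-average precision/recall.
# ASSUMPTION: synthetic data and an illustrative grid stand in for the
# Athabasca well-log features and the paper's actual search space.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_validate

# Hypothetical stand-in for log features with four imbalanced lithology
# classes; the two small classes mimic under-represented lithologies.
X, y = make_classification(
    n_samples=600,
    n_features=6,
    n_informative=4,
    n_redundant=0,
    n_classes=4,
    n_clusters_per_class=1,
    weights=[0.45, 0.35, 0.12, 0.08],
    random_state=0,
)

# Stratified folds keep each class's share roughly constant across folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Illustrative (hypothetical) hyperparameter grid.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 10],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="accuracy",
    cv=cv,
)
search.fit(X, y)

# Score the tuned configuration with the metrics named in the abstract.
metrics = ["accuracy", "precision_weighted", "recall_weighted"]
scores = cross_validate(search.best_estimator_, X, y, cv=cv, scoring=metrics)
summary = {name: float(np.mean(scores[f"test_{name}"])) for name in metrics}

print(search.best_params_)
print(summary)
```

`xgboost.XGBClassifier` follows the scikit-learn estimator interface, so it can be passed to `GridSearchCV` in place of `RandomForestClassifier`, with a grid over parameters such as `max_depth`, `learning_rate` and `n_estimators`. Because this sketch reuses the same folds for tuning and evaluation, its scores are optimistically biased; nested cross-validation avoids that. Note also that weighted-average recall is mathematically equal to accuracy, since each class's recall is weighted by that class's share of the samples.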
The developed models classify the studied field's lithologies with an overall accuracy of up to 94%. In addition, there is no statistically significant evidence of a difference in prediction performance between XGB and RF.
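The abstract does not name the specific chi-squared test used to compare the two models. One common choice for comparing two classifiers on the same samples is McNemar's test, sketched below in plain Python as an assumption rather than the study's exact procedure; the function name `mcnemar_test` and the toy predictions are hypothetical. The test counts the samples on which exactly one model is correct and compares those two counts with a chi-squared statistic (1 degree of freedom, with continuity correction).

```python
# Minimal sketch of McNemar's test: a chi-squared test of whether two
# classifiers evaluated on the same samples have the same error rate.
# ASSUMPTION: shown as one common choice; the paper's exact test is not stated.
import math
from typing import Sequence, Tuple


def mcnemar_test(y_true: Sequence[int],
                 pred_a: Sequence[int],
                 pred_b: Sequence[int]) -> Tuple[float, float]:
    """Return (chi-squared statistic, p-value) with continuity correction."""
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("inputs must have equal length")
    triples = list(zip(y_true, pred_a, pred_b))
    # b: model A correct, model B wrong; c: model A wrong, model B correct.
    b = sum(1 for t, a, m in triples if a == t and m != t)
    c = sum(1 for t, a, m in triples if a != t and m == t)
    if b + c == 0:
        # Both models are right (or wrong) on exactly the same samples.
        return 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical toy example: 20 samples of class 0.
y_true = [0] * 20
pred_a = [0] * 10 + [1] * 2 + [0] * 8  # A wrong on 2 samples
pred_b = [1] * 10 + [0] * 2 + [0] * 8  # B wrong on 10 samples
stat, p = mcnemar_test(y_true, pred_a, pred_b)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

A large p-value is consistent with the two models having the same error rate, while a small one (for example, below 0.05) is evidence that their error rates differ.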

Original language: English
Title of host publication: Society of Petroleum Engineers - SPE Symposium
Subtitle of host publication: Leveraging Artificial Intelligence to Shape the Future of the Energy Industry, AIS 2023
Publisher: Society of Petroleum Engineers
ISBN (Electronic): 9781613999882
State: Published - 2023
Event: 2023 SPE Symposium: Leveraging Artificial Intelligence to Shape the Future of the Energy Industry, AIS 2023 - Al Khobar, Saudi Arabia
Duration: 17 Jan 2023 → 18 Jan 2023

Publication series

Name: Society of Petroleum Engineers - SPE Symposium: Leveraging Artificial Intelligence to Shape the Future of the Energy Industry, AIS 2023

Conference

Conference: 2023 SPE Symposium: Leveraging Artificial Intelligence to Shape the Future of the Energy Industry, AIS 2023
Country/Territory: Saudi Arabia
City: Al Khobar
Period: 17/01/23 → 18/01/23

Bibliographical note

Publisher Copyright:
Copyright © 2023, Society of Petroleum Engineers.

ASJC Scopus subject areas

  • Geotechnical Engineering and Engineering Geology
  • Artificial Intelligence
