Can large language models automate the HAZOP process without human intervention?

  • Junseo Lee
  • Sunhwa Park*
  • Sehyeon Oh
  • Byungchol Ma

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

This study investigates whether state-of-the-art multimodal large language models (LLMs) can autonomously perform the entire Hazard and Operability Study (HAZOP) process without human intervention. Four LLMs (GPT4o, GPT4o-mini, LLAMA, and Gemini) were used to automatically generate HAZOP worksheets spanning dozens of pages from an identical piping and instrumentation diagram (P&ID) using a standardized prompt. Their outputs were benchmarked against an expert-prepared reference worksheet and evaluated along two key aspects: (1) model performance, measured by similarity and computational cost, and (2) HAZOP performance, measured by the validity of the generated scenarios and the diversity of safeguards. The results showed that all four LLMs achieved high similarity to the reference (F1 scores > 86 %). LLAMA was the most cost-efficient ($0.011 per worksheet), while Gemini generated the greatest number of scenarios (34.3 per worksheet) and safeguards (1.79 per deviation). This study presents a structured framework for evaluating LLMs in HAZOP and highlights their potential as assistive tools in the process safety field. However, key limitations were observed. The proportion of semantically valid scenarios remained low (0.19 to 0.37), and safeguards were heavily biased toward procedural measures, indicating limited diversity in risk-mitigation strategies. To enhance the reliability and practicability of LLM-based HAZOP studies, future research should focus on advanced prompt engineering, domain-specific fine-tuning, and improved reasoning capabilities.

Original language: English
Article number: 107039
Journal: Safety Science
Volume: 194
State: Published - Feb 2026

Bibliographical note

Publisher Copyright:
© 2025 Elsevier Ltd.

Keywords

  • HAZOP
  • Large language model
  • Natural language processing
  • Process hazard analysis
  • Process safety

ASJC Scopus subject areas

  • Safety, Risk, Reliability and Quality
  • Safety Research
  • Public Health, Environmental and Occupational Health
