Red teaming large language models: A comprehensive review and critical analysis

Research output: Contribution to journal › Article › peer-review

Abstract

Securing large language models (LLMs) remains a critical challenge as their adoption across sectors rapidly grows. While advancements in LLM development have enhanced their capabilities, inherent vulnerabilities continue to pose significant risks, exposing these models to various forms of attack. This study provides a comprehensive review of red teaming for LLMs, distinguished by its broad coverage and intuitive organization. It systematically explores a range of red teaming attacks, including prompt-based attacks, data manipulation attacks, model exploitation attacks, information extraction attacks, and model degradation attacks. It also critically reviews and analyzes evaluation methods and benchmarks, focusing on the datasets, evaluation metrics, and benchmarking techniques used in LLM red teaming and risk assessment. Our review reflects the current state of LLM security and, by integrating recent and impactful research, offers new insights alongside established methods. The structured presentation of our findings provides a comprehensive and actionable resource, facilitating a deeper understanding of the complexities involved. By highlighting the proactive assessment of risk and exploitation potential, this review contributes to the development of more secure and responsible LLMs and serves as a valuable guide for researchers, practitioners, and policymakers.

Original language: English
Article number: 104239
Journal: Information Processing and Management
Volume: 62
Issue number: 6
DOIs
State: Published - Nov 2025

Bibliographical note

Publisher Copyright:
© 2025 Elsevier Ltd

Keywords

  • AI security
  • Large language models (LLMs)
  • Model exploitation
  • Red teaming attacks
  • Red teaming evaluation
  • Trustworthy AI

ASJC Scopus subject areas

  • Information Systems
  • Media Technology
  • Computer Science Applications
  • Management Science and Operations Research
  • Library and Information Sciences
