Prompting large language models for user simulation in task-oriented dialogue systems

Atheer Algherairy*, Moataz Ahmed

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Large Language Models (LLMs) have gained widespread popularity due to their instruction-following abilities. In this study, we evaluate their ability to simulate user interactions for task-oriented dialogue (TOD) systems. Our findings demonstrate that prompted LLMs show promising capabilities for training and testing dialogue policies, eliminating the need for domain expertise in crafting complex rules or for large annotated datasets, as required by traditional simulators. The results show that the dialogue system trained with the ChatGPT simulator achieves a success rate of 59%, comparable to the 62% success rate of the dialogue system trained with the manual-rules, agenda-based user simulator (ABUS). Furthermore, the dialogue system trained with the ChatGPT simulator demonstrates better generalization ability than the one trained with the ABUS: its success rate is higher by 4% on GenTUS, 5% on the ChatGPT simulator, and 3% on the Llama simulator. Moreover, LLM-based user simulators provide a challenging environment with lexically rich, diverse, and random responses. The Llama simulator outperforms the human reference on all lexical diversity metrics, with margins of 0.66 in SE, 0.39 in CE, 0.01 in MSTTR, 0.04 in HDD, and 0.55 in MTLD, while the ChatGPT simulator achieves comparable results. This ultimately enhances the system's ability to generalize more effectively.

Original language: English
Article number: 101697
Journal: Computer Speech and Language
Volume: 89
State: Published - Jan 2025

Bibliographical note

Publisher Copyright:
© 2024 Elsevier Ltd

Keywords

  • Agenda-based simulator
  • Large language models
  • Prompting
  • Task Oriented Dialogue
  • User simulator

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Human-Computer Interaction
