Abstract
Commit messages are vital for traceability, maintenance, and onboarding in modern software projects, yet their quality is frequently inconsistent. Recent large language models (LLMs) can transform code diffs into natural-language summaries, offering a path to more consistent and informative commit messages. This paper makes two contributions: (i) it provides a systematic survey of automated commit message generation with LLMs, critically comparing prompt-only, fine-tuned, and retrieval-augmented approaches; and (ii) it specifies a transparent, agent-based evaluation blueprint centered on CommitBench. Unlike prior reviews, it includes a detailed dataset audit, an analysis of preprocessing impacts, a comparison of evaluation metrics, and an error taxonomy. The protocol defines dataset usage and splits, prompting and context settings, scoring and selection rules, and reporting guidelines (results by project, language, and commit type), with the error taxonomy guiding qualitative analysis. Importantly, this work emphasizes methodology and design rather than presenting new empirical benchmarking results; the blueprint is intended to support reproducibility and comparability in future studies.
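The paper itself specifies the full protocol; as a rough illustration of the dimensions the abstract enumerates, the Python sketch below bundles them into a single configuration object and pairs it with a corpus-level BLEU scoring rule. All names here (`EvalProtocol`, the prompt template, the choice of BLEU via `sacrebleu`) are hypothetical stand-ins for exposition, not the authors' actual implementation.

```python
from dataclasses import dataclass

from sacrebleu import corpus_bleu  # pip install sacrebleu


@dataclass(frozen=True)
class EvalProtocol:
    """Hypothetical container for the blueprint's evaluation dimensions."""
    dataset_split: str = "test"            # dataset usage and splits
    prompt_template: str = (               # prompting and context settings
        "Summarize the following diff as a one-line commit message:\n{diff}"
    )
    max_context_tokens: int = 4096         # diff truncation budget
    n_candidates: int = 1                  # scoring and selection rules
    report_by: tuple[str, ...] = ("project", "language", "commit_type")


def score(references: list[str], candidates: list[str]) -> float:
    """Corpus-level BLEU, one common surface-similarity metric for this task."""
    return corpus_bleu(candidates, [references]).score


if __name__ == "__main__":
    protocol = EvalProtocol()
    refs = ["fix: handle empty diff in parser"]
    hyps = ["fix parser crash on empty diff"]
    print(f"split={protocol.dataset_split}  BLEU={score(refs, hyps):.2f}")
```

The `report_by` field mirrors the abstract's reporting guideline (results broken down by project, language, and commit type); a faithful implementation would follow the paper's own scoring and selection rules rather than the BLEU placeholder used here.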
| Original language | English |
|---|---|
| Article number | 427 |
| Journal | Computers |
| Volume | 14 |
| Issue number | 10 |
| DOIs | |
| State | Published - Oct 2025 |
Bibliographical note
Publisher Copyright: © 2025 by the authors.
Keywords
- CommitBench dataset
- automated documentation
- commit message generation
- large language models
- retrieval-augmented generation
- software engineering automation
- transformer architecture
ASJC Scopus subject areas
- Computer Science (miscellaneous)
- Human-Computer Interaction
- Computer Networks and Communications