Arabic machine-generated text detection: Stylometric analysis and cross-model evaluation

  • Maged S. Al-Shaibani
  • Moataz Ahmed*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Large Language Models (LLMs) have achieved unprecedented capabilities in generating human-like text, posing subtle yet significant challenges for information integrity across critical domains. This challenge is particularly severe in under-explored and low-resource languages such as Arabic. This paper presents a comprehensive investigation of Arabic machine-generated text detection, examining outputs from diverse LLMs across academic and social media domains using multiple generation methods. Through systematic stylometric analysis, we identify distinctive linguistic patterns that differentiate human-written from machine-generated Arabic text, revealing domain-specific characteristics that vary notably across contexts. These insights inform our development of BERT-based detection models, which achieve exceptional performance in formal contexts (up to 99% F1-score). However, cross-domain analysis reveals generalization challenges, consistent with prior literature. Robustness evaluation demonstrates that detectors maintain notable performance even when more than 90% of the text is randomly dropped. This work provides a comprehensive investigation of Arabic machine-generated text to date, uniquely integrating multiple generation methods, diverse LLMs, and in-depth stylometric analysis across varied domains to establish a foundation for robust, linguistically-informed detection systems essential for preserving information integrity in Arabic-language contexts.

Original language: English
Article number: 130644
Journal: Expert Systems with Applications
Volume: 305
DOIs
State: Published - 5 Apr 2026

Bibliographical note

Publisher Copyright:
© 2025 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.

Keywords

  • Arabic natural language processing
  • Cross-domain generalization
  • Large language models
  • Machine-generated text detection
  • Stylometric analysis
  • Text classification

ASJC Scopus subject areas

  • General Engineering
  • Computer Science Applications
  • Artificial Intelligence
