Abstract
Large Language Models (LLMs) have achieved unprecedented capabilities in generating human-like text, posing subtle yet significant challenges to information integrity across critical domains. The challenge is particularly severe for under-explored, low-resource languages such as Arabic. This paper presents a comprehensive investigation of Arabic machine-generated text detection, examining outputs from diverse LLMs across academic and social media domains using multiple generation methods. Through systematic stylometric analysis, we identify distinctive linguistic patterns that differentiate human-written from machine-generated Arabic text, revealing domain-specific characteristics that vary notably across contexts. These insights inform the development of BERT-based detection models, which achieve exceptional performance in formal contexts (up to 99% F1-score). However, cross-domain analysis reveals generalization challenges, consistent with prior literature. Robustness evaluation demonstrates that the detectors maintain notable performance even when more than 90% of the text is randomly dropped. By integrating multiple generation methods, diverse LLMs, and in-depth stylometric analysis across varied domains, this work establishes a foundation for robust, linguistically informed detection systems essential for preserving information integrity in Arabic-language contexts.
| Original language | English |
|---|---|
| Article number | 130644 |
| Journal | Expert Systems with Applications |
| Volume | 305 |
| DOIs | |
| State | Published - 5 Apr 2026 |
Bibliographical note
Publisher Copyright: © 2025 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
Keywords
- Arabic natural language processing
- Cross-domain generalization
- Large language models
- Machine-generated text detection
- Stylometric analysis
- Text classification
ASJC Scopus subject areas
- General Engineering
- Computer Science Applications
- Artificial Intelligence