Within-word pronunciation variation modeling for Arabic ASRs: A direct data-driven approach

Dia AbuZeina*, Wasfi Al-Khatib, Moustafa Elshafei, Husni Al-Muhtaseb

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

14 Scopus citations

Abstract

Pronunciation variation is a major obstacle to improving the performance of Arabic automatic continuous speech recognition systems. This phenomenon alters the pronunciation of words beyond their listed forms in the pronunciation dictionary, leading to a number of out-of-vocabulary word forms. This paper presents a direct data-driven approach to modeling within-word pronunciation variations, in which the pronunciation variants are distilled from the training speech corpus. The proposed method consists of performing phoneme recognition, followed by a sequence alignment between the observation phonemes generated by the phoneme recognizer and the reference phonemes obtained from the pronunciation dictionary. The unique collected variants are then added to the dictionary as well as to the language model. We started with a baseline Arabic speech recognition system based on the Sphinx3 engine. The baseline system is based on a 5.4-hour speech corpus of Modern Standard Arabic broadcast news, with a pronunciation dictionary of 14,234 canonical pronunciations. The baseline system achieves a word error rate of 13.39%. Our results show that while the expanded dictionary alone did not add appreciable improvements, the word error rate is significantly reduced by 2.22% when the variants are represented within the language model.
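The alignment and variant-collection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a standard Needleman-Wunsch global alignment with hypothetical scores (match +1, mismatch -1, gap -1), assumes the recognized phonemes have already been segmented per word, and uses a hypothetical `max_edits` threshold to discard observations that differ too much from the reference. The phoneme symbols and the names `align` and `collect_variants` are illustrative only.

```python
GAP = None  # marks an inserted or deleted phoneme in an aligned pair


def align(ref, obs, match=1, mismatch=-1, gap=-1):
    """Globally align reference and observed phoneme lists (Needleman-Wunsch).

    Returns a list of (ref_phoneme, obs_phoneme) pairs; GAP on either side
    marks a deletion (obs side) or an insertion (ref side).
    """
    n, m = len(ref), len(obs)
    # score[i][j] = best score aligning ref[:i] with obs[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if ref[i - 1] == obs[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # match or substitution
                score[i - 1][j] + gap,      # phoneme deleted in observation
                score[i][j - 1] + gap,      # phoneme inserted in observation
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if ref[i - 1] == obs[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                pairs.append((ref[i - 1], obs[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((ref[i - 1], GAP))
            i -= 1
        else:
            pairs.append((GAP, obs[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def collect_variants(samples, dictionary, max_edits=2):
    """Collect unique observed pronunciations that differ from the reference.

    samples: iterable of (word, observed_phonemes) from the training corpus.
    dictionary: maps each word to its canonical phoneme list.
    max_edits: hypothetical cutoff on the number of differing aligned pairs.
    """
    variants = {}
    for word, observed in samples:
        pairs = align(dictionary[word], observed)
        edits = sum(1 for r, o in pairs if r != o)
        if 0 < edits <= max_edits:
            variants.setdefault(word, set()).add(tuple(observed))
    return variants


if __name__ == "__main__":
    toy_dictionary = {"min": ["M", "IH", "N"]}
    toy_samples = [
        ("min", ["M", "IH", "M"]),  # substitution of the final phoneme
        ("min", ["M", "IH", "N"]),  # canonical form, not a variant
        ("min", ["M", "IH", "M"]),  # duplicate, kept only once
    ]
    print(collect_variants(toy_samples, toy_dictionary))
```

In this sketch the set-valued result keeps each variant once per word, mirroring the abstract's "unique collected variants"; how those variants are then written into the dictionary and language model is not covered here.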

Original language: English
Pages (from-to): 65-75
Number of pages: 11
Journal: International Journal of Speech Technology
Volume: 15
Issue number: 2
State: Published - Jun 2012

Bibliographical note

Funding Information:
Acknowledgements: This work is supported by Saudi Arabia Government research grant NSTP # (08-INF100-4). The authors would also like to thank King Fahd University of Petroleum and Minerals for its support of this research work.

Keywords

  • Data-driven approach
  • Language model
  • Modern standard Arabic
  • Pronunciation variation
  • Speech recognition

ASJC Scopus subject areas

  • Software
  • Language and Linguistics
  • Human-Computer Interaction
  • Linguistics and Language
  • Computer Vision and Pattern Recognition
