Generation of Arabic phonetic dictionaries for speech recognition

Mohamed Ali*, Moustafa Elshafei, Mansour Al-Ghamdi, Husni Al-Muhtaseb, Atef Al-Najjar

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

26 Scopus citations

Abstract

Phonetic dictionaries are essential components of large-vocabulary, natural-language, speaker-independent speech recognition systems. This paper presents a rule-based technique to generate Arabic phonetic dictionaries for a large-vocabulary speech recognition system. The system uses Classical Arabic pronunciation rules, common pronunciation rules of Modern Standard Arabic, and morphologically driven rules. The paper explains these rules in detail and gives their formal mathematical presentation. The rules were used to generate a dictionary for a 5.4-hour corpus of broadcast news. The phonetic dictionary contains 23,841 definitions corresponding to about 14,232 words. The generated dictionary was evaluated on an actual Arabic speech recognition system, and the pronunciation rules and the phone set were validated by test cases. The Arabic speech recognition system achieves a word error rate of 11.71% for fully diacritized transcription of about 1.1 hours of Arabic broadcast news.
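
The word error rate quoted above is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' scoring tool: the function name `word_error_rate` is hypothetical, and splitting on whitespace is an assumption about tokenization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: both transcripts are already normalized and words are
    separated by whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder tokens, not data from the paper.
    print(word_error_rate("w1 w2 w3 w4", "w1 wX w3"))
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.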

Original language: English
Title of host publication: 2008 International Conference on Innovations in Information Technology, IIT 2008
Pages: 59-63
Number of pages: 5
State: Published - 2008

Publication series

Name: 2008 International Conference on Innovations in Information Technology, IIT 2008

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
  • Software
