TY - GEN
T1 - Generation of Arabic phonetic dictionaries for speech recognition
AU - Ali, Mohamed
AU - Elshafei, Moustafa
AU - Al-Ghamdi, Mansour
AU - Al-Muhtaseb, Husni
AU - Al-Najjar, Atef
PY - 2008
Y1 - 2008
AB - Phonetic dictionaries are essential components of large-vocabulary, natural-language, speaker-independent speech recognition systems. This paper presents a rule-based technique to generate Arabic phonetic dictionaries for a large-vocabulary speech recognition system. The system used Classical Arabic pronunciation rules, common pronunciation rules of Modern Standard Arabic, and morphologically driven rules. The paper explains these rules in detail and gives their formal mathematical presentation. The rules were used to generate a dictionary for a 5.4-hour corpus of broadcast news. The phonetic dictionary contains 23,841 definitions corresponding to about 14,232 words. The generated dictionary was evaluated on an actual Arabic speech recognition system. The pronunciation rules and the phone set were validated by test cases. The Arabic speech recognition system achieves a word error rate of 11.71% for fully diacritized transcriptions of about 1.1 hours of Arabic broadcast news.
UR - http://www.scopus.com/inward/record.url?scp=67649460921&partnerID=8YFLogxK
U2 - 10.1109/INNOVATIONS.2008.4781716
DO - 10.1109/INNOVATIONS.2008.4781716
M3 - Conference contribution
AN - SCOPUS:67649460921
SN - 9781424433971
T3 - 2008 International Conference on Innovations in Information Technology, IIT 2008
SP - 59
EP - 63
BT - 2008 International Conference on Innovations in Information Technology, IIT 2008
ER -