TY - GEN
T1 - Phonetically rich and balanced speech corpus for Arabic speaker-independent continuous automatic speech recognition systems
AU - Abushariah, Mohammad A.M.
AU - Ainon, Raja N.
AU - Zainuddin, Roziati
AU - Elshafei, Moustafa
AU - Khalifa, Othman O.
PY - 2010
Y1 - 2010
N2 - This paper describes an efficient framework for designing and developing Arabic speaker-independent continuous automatic speech recognition systems based on a phonetically rich and balanced speech corpus. The speech corpus contains 415 sentences recorded by 42 native Arabic speakers (21 male and 21 female) from 11 Arab countries representing three major regions (Levant, Gulf, and Africa). The developed system is based on the Carnegie Mellon University (CMU) Sphinx tools. The Cambridge HTK tools were also used in some testing stages. The speech engine uses three-emitting-state hidden Markov models (HMMs) for triphone-based acoustic models. Based on experimental analysis of 4.07 hours of training speech data, the acoustic model uses continuous observation probability densities with 16 Gaussian mixture components, and the state distributions are tied to 400 senones. The language model contains both bigrams and trigrams. The system achieved word recognition correctness of 91.23% with diacritical marks and 92.54% without.
AB - This paper describes an efficient framework for designing and developing Arabic speaker-independent continuous automatic speech recognition systems based on a phonetically rich and balanced speech corpus. The speech corpus contains 415 sentences recorded by 42 native Arabic speakers (21 male and 21 female) from 11 Arab countries representing three major regions (Levant, Gulf, and Africa). The developed system is based on the Carnegie Mellon University (CMU) Sphinx tools. The Cambridge HTK tools were also used in some testing stages. The speech engine uses three-emitting-state hidden Markov models (HMMs) for triphone-based acoustic models. Based on experimental analysis of 4.07 hours of training speech data, the acoustic model uses continuous observation probability densities with 16 Gaussian mixture components, and the state distributions are tied to 400 senones. The language model contains both bigrams and trigrams. The system achieved word recognition correctness of 91.23% with diacritical marks and 92.54% without.
KW - Acoustic model
KW - Arabic Continuous Speech Recognition
KW - Arabic speech corpus
KW - Phonetically rich and balanced
KW - Statistical language model
UR - https://www.scopus.com/pages/publications/78650288728
U2 - 10.1109/ISSPA.2010.5605554
DO - 10.1109/ISSPA.2010.5605554
M3 - Conference contribution
AN - SCOPUS:78650288728
SN - 9781424471676
T3 - 10th International Conference on Information Sciences, Signal Processing and their Applications, ISSPA 2010
SP - 65
EP - 68
BT - 10th International Conference on Information Sciences, Signal Processing and their Applications, ISSPA 2010
ER -