TY - GEN
T1 - Natural speaker-independent Arabic speech recognition system based on Hidden Markov Models using Sphinx tools
AU - Abushariah, Mohammad A.M.
AU - Ainon, Raja N.
AU - Zainuddin, Roziati
AU - Elshafei, Moustafa
AU - Khalifa, Othman O.
PY - 2010
Y1 - 2010
N2 - This paper reports the design, implementation, and evaluation of a research work for developing a high performance natural speaker-independent Arabic continuous speech recognition system. It aims to explore the usefulness and success of a newly developed speech corpus, which is phonetically rich and balanced, presenting a competitive approach towards the development of an Arabic ASR system as compared to the state-of-the-art Arabic ASR researches. The developed Arabic ASR mainly used the Carnegie Mellon University (CMU) Sphinx tools together with the Cambridge HTK tools. To extract features from speech signals, Mel-Frequency Cepstral Coefficients (MFCC) technique was applied producing a set of feature vectors. Subsequently, the system uses five-state Hidden Markov Models (HMM) with three emitting states for tri-phone acoustic modeling. The emission probability distribution of the states was best using continuous density 16 Gaussian mixture distributions. The state distributions were tied to 500 senons. The language model contains uni-grams, bi-grams, and tri-grams. The system was trained on 7.0 hours of phonetically rich and balanced Arabic speech corpus and tested on another one hour. For similar speakers but different sentences, the system obtained a word recognition accuracy of 92.67% and 93.88% and a Word Error Rate (WER) of 11.27% and 10.07% with and without diacritical marks respectively. For different speakers but similar sentences, the system obtained a word recognition accuracy of 95.92% and 96.29% and a Word Error Rate (WER) of 5.78% and 5.45% with and without diacritical marks respectively. Whereas different speakers and different sentences, the system obtained a word recognition accuracy of 89.08% and 90.23% and a Word Error Rate (WER) of 15.59% and 14.44% with and without diacritical marks respectively.
AB - This paper reports the design, implementation, and evaluation of a research work for developing a high performance natural speaker-independent Arabic continuous speech recognition system. It aims to explore the usefulness and success of a newly developed speech corpus, which is phonetically rich and balanced, presenting a competitive approach towards the development of an Arabic ASR system as compared to the state-of-the-art Arabic ASR researches. The developed Arabic ASR mainly used the Carnegie Mellon University (CMU) Sphinx tools together with the Cambridge HTK tools. To extract features from speech signals, Mel-Frequency Cepstral Coefficients (MFCC) technique was applied producing a set of feature vectors. Subsequently, the system uses five-state Hidden Markov Models (HMM) with three emitting states for tri-phone acoustic modeling. The emission probability distribution of the states was best using continuous density 16 Gaussian mixture distributions. The state distributions were tied to 500 senons. The language model contains uni-grams, bi-grams, and tri-grams. The system was trained on 7.0 hours of phonetically rich and balanced Arabic speech corpus and tested on another one hour. For similar speakers but different sentences, the system obtained a word recognition accuracy of 92.67% and 93.88% and a Word Error Rate (WER) of 11.27% and 10.07% with and without diacritical marks respectively. For different speakers but similar sentences, the system obtained a word recognition accuracy of 95.92% and 96.29% and a Word Error Rate (WER) of 5.78% and 5.45% with and without diacritical marks respectively. Whereas different speakers and different sentences, the system obtained a word recognition accuracy of 89.08% and 90.23% and a Word Error Rate (WER) of 15.59% and 14.44% with and without diacritical marks respectively.
KW - Arabic automatic speech recognition
KW - Hidden Markov Models
KW - Mel- Frequency Cepstral Coefficients
KW - Speech corpus
KW - Sphinx tools
UR - https://www.scopus.com/pages/publications/77957798876
U2 - 10.1109/ICCCE.2010.5556829
DO - 10.1109/ICCCE.2010.5556829
M3 - Conference contribution
AN - SCOPUS:77957798876
SN - 9781424462346
T3 - International Conference on Computer and Communication Engineering, ICCCE'10
BT - International Conference on Computer and Communication Engineering, ICCCE'10
T2 - International Conference on Computer and Communication Engineering, ICCCE'10
Y2 - 11 May 2010 through 12 May 2010
ER -