Abstract
In this paper, we investigate the use of Learning Vector Quantization (LVQ) for phoneme transcription in Arabic speech recognition systems. We used an Arabic speech corpus of TV news clips. We employed feature vectors that embed the frame-neighboring correlation information between adjacent phonemes, replacing the traditional triphone models. Next, we generated the phoneme codebooks using the K-means splitting algorithm and then trained the generated codebooks using the LVQ algorithm. When the trained LVQ codebooks were used for utterance phoneme transcription of an open-vocabulary test corpus, the phoneme recognition rate was 72% without the use of any added phoneme bigrams or HMMs. If improved, the results of this research could serve Holy Quran text transcription without any phoneme bigrams (a phoneme language model). This would increase the speed of Quranic speech-to-text transcription and create the infrastructure for a high-speed automatic system for Quranic sound recognition and translation.
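The codebook training step can be illustrated with a minimal sketch of LVQ1, the basic LVQ variant. The abstract does not say which LVQ variant, learning rate, or codebook size was used, so everything here is an assumption made for illustration: the function names `lvq1_train` and `classify`, the linearly decaying learning rate, and the two-class toy data standing in for phoneme feature vectors. In the pipeline described above, the initial code vectors would come from K-means splitting; here they are simply given by hand.

```python
import numpy as np

def lvq1_train(codebook, code_labels, X, y, lr=0.05, epochs=10, seed=0):
    """LVQ1 sketch: move the nearest code vector toward a sample of the
    same class and away from a sample of a different class."""
    rng = np.random.default_rng(seed)
    codebook = np.array(codebook, dtype=float)  # work on a float copy
    for epoch in range(epochs):
        alpha = lr * (1.0 - epoch / epochs)  # assumed linear decay schedule
        for i in rng.permutation(len(X)):
            # Squared Euclidean distance from sample i to every code vector
            dists = np.sum((codebook - X[i]) ** 2, axis=1)
            w = int(np.argmin(dists))
            sign = 1.0 if code_labels[w] == y[i] else -1.0
            codebook[w] += sign * alpha * (X[i] - codebook[w])
    return codebook

def classify(codebook, code_labels, X):
    """Label each row of X with the class of its nearest code vector."""
    d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return code_labels[np.argmin(d, axis=1)]

# Toy data (hypothetical): two well-separated classes in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(5.0, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Hand-picked initial codebook, one code vector per class.
init_codebook = np.array([[1.0, 1.0], [4.0, 4.0]])
code_labels = np.array([0, 1])

trained = lvq1_train(init_codebook, code_labels, X, y)
accuracy = (classify(trained, code_labels, X) == y).mean()
print("toy accuracy:", accuracy)
```

In a phoneme setting, each class would typically hold several code vectors, and transcription would assign each frame (or frame window) the phoneme label of its nearest code vector, which is what `classify` does on the toy data.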
Original language | English |
---|---|
Title of host publication | Proceedings - 2013 Taibah University International Conference on Advances in Information Technology for the Holy Quran and Its Sciences, NOORIC 2013 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 407-412 |
Number of pages | 6 |
ISBN (Electronic) | 9781479928231 |
State | Published - 25 Sep 2015 |
Publication series
Name | Proceedings - 2013 Taibah University International Conference on Advances in Information Technology for the Holy Quran and Its Sciences, NOORIC 2013 |
---|
Bibliographical note
Publisher Copyright: © 2015 IEEE.
Keywords
- K-means
- LVQ
- Quranic Speech Recognition
- codebooks
- phoneme bigrams
ASJC Scopus subject areas
- Information Systems