IETE Journal of Research
Home | About us | Search | Current Issue | Past Issues | Guidelines | Subscribe | ContactLogin 
IETE Journal of Research
  Users Online: 22 Print this page  Email this page Small font size Default font size Increase font size
ARTICLE
Year : 2012  |  Volume : 58  |  Issue : 2  |  Page : 121-129

Investigation of Automatic Speech Recognition Performance and Mean Opinion Scores for Different Standard Speech and Audio Codecs


1 Department of ECE, Osmania University, Hyderabad, Andhra Pradesh, India
2 Research and Training Unit for Navigational Electronics, Osmania University, Hyderabad, Andhra Pradesh, India

Correspondence Address:
A V Ramana
Department of ECE, Osmania University, Hyderabad, Andhra Pradesh
India
Login to access the Email id

DOI: 10.4103/0377-2063.96179

Get Permissions

Usage of Automatic Speech Recognition (ASR) systems is increasing day-by-day for voice centric applications in mobile handheld and Voice over Internet Protocol (VoIP) devices. The necessity is also increasing to find out the ASR performance under different network impediments. Among them, speech and audio coding standards is the one, which affects the ASR performance greatly, when, using them with different sampling and bit rates in the practical systems. Another common impediment which influences the ASR accuracy is the bit errors in the wireless networks and packet drop conditions in the VoIP networks. ASR performance with some of the speech coding standards under noise conditions for the wireless networks is reported in the literature. However, each study is reporting the ASR performance for few narrowband codecs with different speech databases and different ASR toolkits like RAPHEL, HTK, SPHINX, etc. In this paper, the analysis on ASR performance while using both narrowband and wideband speech and audio coding standards, which are currently accepted for GSM mobile and VoIP networks, using the common speech database-TIMIT, and using ASR toolkit-SPHINX, is presented. The Mean Opinion Score (MOS), which is the generally accepted speech quality measurement technique, is also analyzed for all the speech and audio coding standards, using the same speech database. The results of the studies carried out for the ASR word accuracies and MOS values for different narrowband and wideband speech and audio codecs under no-loss conditions are presented. Results for different rates of packet drop condition which is the common noise scenario in wired networks such as VoIP (which is also merging with wireless networks) are also presented. The observation is that though some of the codecs are showing poor MOS performance at lower bit rates, the corresponding ASR performance is comparable with other codecs at higher bit rates.


[FULL TEXT] [PDF]*
Print this article     Email this article
 Next article
 Previous article
 Table of Contents

 Similar in PUBMED
   Search Pubmed for
   Search in Google Scholar for
 Related articles
 Citation Manager
 Access Statistics
 Reader Comments
 Email Alert *
 Add to My List *
 * Requires registration (Free)
 

 Article Access Statistics
    Viewed582    
    Printed35    
    Emailed0    
    PDF Downloaded110    
    Comments [Add]    

Recommend this journal