
Research on Confusion Network and Side Information for Speech Recognition

Author: WangHuanLiang
Advisor: HanJiQing
School: Harbin Institute of Technology
Course: Applied Computer Technology
Keywords: speech recognition; confusion network; side information; multi-system fusion; tone modeling
CLC: TN912.34
Type: PhD thesis
Year: 2007


Communicating freely with computers through speech has been a dream for many years. Although decades of sustained effort have brought great progress in speech recognition, the technology is still far from practical application. Further improving performance and robustness has become the bottleneck of speech recognition.

It is well known that automatic speech recognition systems use very limited acoustic and linguistic knowledge, namely spectral features of the speech signal and N-gram statistical language models. This information is far from sufficient for a task as complicated as speech recognition, since humans implicitly draw on a large amount of additional information during speech perception.

Recognition performance can be improved by modeling and applying such side information more effectively. A confusion network is a more compact representation of multiple candidates, and word error rate can be minimized by performing second-pass decoding on it. More importantly for recognition performance, a confusion network can serve as a decoding platform on which various kinds of side information are integrated.

Accordingly, this thesis studies two subjects: confusion networks and side information. The aim is to reduce character error rate by decoding confusion networks with various kinds of side information. For confusion networks, efficient approaches to generation and decoding are studied. For side information, effective methods for modeling and applying it are investigated. The major original contributions are as follows:

1. Two approaches to efficiently generating confusion networks are proposed. In the first, the lattice scale is reduced by segmenting the original lattice into multiple sublattices, which improves generation speed at the cost of a slight decline in quality.
In the second, the construction of each confusion set is guided by the arc with maximum posterior probability, which reduces the complexity of the generation algorithm to linear. Moreover, K-L divergence is introduced to measure the similarity between two arcs, which improves the quality of the confusion network. Finally, for the Chinese speech recognition task, two new confusion network structures are introduced: the character-based confusion network and the logical confusion network.

2. Decoding methods that integrate two types of side information on the confusion network are studied. A trigger language model based on semantic class pairs is proposed to model long-span dependencies between words, and the model is integrated into the confusion network decoding process. Different speech recognition systems use different knowledge sources and modeling methods, so their error patterns also differ. A decoding method is therefore proposed to combine the results of multiple recognition systems on the confusion network. Experimental results show that the two methods achieve relative character error rate reductions of 7.9% and 10.7%, respectively.

3. The use of tone information to improve Chinese speech recognition is investigated. In the acoustic decoding stage, a multi-space probability distribution HMM (MSD-HMM) is adopted to model tone patterns, which resolves the problem that the tone feature is discontinuous over the utterance. Within a two-stream HMM framework, spectral and pitch features can be decoded synchronously. In the second pass, tone information over a horizontal, longer time span is used to build explicit tone models, which are applied to decoding on the confusion network generated in the first pass.
Experimental results show a 15.9% relative error reduction in character recognition in first-pass decoding, and an additional 8.0% relative error reduction from second-pass decoding.

4. A reliable speech input system with fast correction of input errors is developed. A character-based confusion network is used to decompose the sentence-level hypothesis into character-level hypotheses, which allows the user to correct about half of the recognition errors quickly and conveniently. To speed up the input of new characters, a speech recognition method assisted by handwriting information is proposed. It has a faster input rate than handwriting input alone and is more reliable than speech recognition alone.

In summary, this thesis investigates methods for generating confusion networks, methods for decoding them with integrated side information, and methods for modeling and applying side information, and it achieves improved speech recognition performance. Efficient construction of high-quality confusion networks is the basis of decoding; it matters not only for speech recognition but also for other tasks based on confusion networks, such as spoken document retrieval. The study of confusion network decoding methods that integrate the trigger language model based on semantic class pairs and the results of multi-system combination also provides a useful reference for exploiting other types of side information. The application of tone information markedly improves recognition performance and is a good starting point for making better use of other acoustic side information, such as stress and intonation. By using confusion networks and handwriting information, the speech input system becomes more reliable and its error correction more convenient and efficient. This is a successful application of side information and confusion networks in speech recognition.
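
To make the idea of decoding a confusion network with side information concrete, the following is a minimal sketch and not the thesis's implementation. It assumes a confusion network is a list of confusion sets mapping each candidate to a posterior probability, and that side information is a log-domain score added with a weight. The names `decode`, `trigger_score`, `TRIGGERS`, and `NETWORK`, the greedy left-to-right search, and all numbers are hypothetical and chosen only for illustration.

```python
import math

# Toy confusion network (hypothetical numbers): one confusion set per position,
# each mapping a candidate word to its posterior probability (must be > 0).
NETWORK = [
    {"stock": 0.9, "stalk": 0.1},
    {"prices": 0.8, "prizes": 0.2},
    {"fell": 0.45, "fill": 0.55},
]

# Hypothetical trigger pairs: an earlier word that makes a later word more likely.
TRIGGERS = {("stock", "fell")}


def trigger_score(history, candidate):
    """Toy side-information score: 1.0 if any earlier word triggers the candidate."""
    return 1.0 if any((w, candidate) in TRIGGERS for w in history) else 0.0


def decode(network, side_score=None, weight=0.0):
    """Greedily pick the best candidate in each confusion set, left to right.

    A candidate's score is its log posterior plus `weight` times the
    side-information score computed from the words already chosen.
    """
    hypothesis = []
    for confusion_set in network:
        def total(candidate):
            score = math.log(confusion_set[candidate])
            if side_score is not None:
                score += weight * side_score(hypothesis, candidate)
            return score

        hypothesis.append(max(confusion_set, key=total))
    return hypothesis


if __name__ == "__main__":
    print(decode(NETWORK))                                  # ['stock', 'prices', 'fill']
    print(decode(NETWORK, trigger_score, weight=1.0))       # ['stock', 'prices', 'fell']
```

Without side information the decoder simply takes the highest-posterior candidate in every confusion set; with the toy trigger score, the earlier word "stock" raises "fell" above "fill". A real system would tune the weight on held-out data and could search over whole paths rather than deciding greedily.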

