
Phoneme Category Based Short Utterance Speaker Recognition

Author: FangYuanXiang
Tutor: ZhengFang
School: Tsinghua University
Course: Computer Science and Technology
Keywords: Short Utterance Speaker Recognition; Phonemes; Vowel Categories; Consonant Categories; Syllable Categories
CLC: TN912.34
Type: PhD thesis
Year: 2012


Speaker recognition is the task of determining a person's identity from his or her voice. Conventionally, a large amount of audio data is required to perform speaker recognition. In real life, difficulties in acquiring speech data and variations in speech quality can degrade the speech that is available. In such situations it becomes crucial to use the available data, long or short, effectively. Recently, research in speaker recognition has turned towards Short Utterance Speaker Recognition (SUSR), which seeks new methods to improve recognition performance when utterances are short. However, most methods define a short utterance as around 10 seconds long; only recently has a short utterance been defined as around 3 seconds long. The shortest utterance reported in the literature is 2 seconds, reaching a minimum Equal Error Rate (EER) of 21.98%.

We strive to find an effective way to recognize a speaker from test utterances of 3 seconds or less, using Chinese as our reference language. To this end, we present the following research ideas:

1) We propose text-independent speaker recognition for short utterances. In short utterances, variations in speech can deteriorate the performance of speaker recognition. Although text-dependent speaker recognition can help to solve this problem, speech recognition is not feasible on segments as short as a few seconds. We therefore suggest using a rudimentary phoneme recognizer to exploit knowledge of speech units, keeping SUSR text-independent while still using the underlying speech information.

2) We propose to use phoneme sequences rather than continuous speech for speaker recognition on short utterances. Since phonemes are the smallest meaningful units of sound, phoneme sequences add useful knowledge to the recognition process while preserving the idiosyncrasies of a speaker.

3) To achieve the above goals, we suggest the use of phoneme categories. Phoneme categories make use of knowledge of speech by grouping similar sounds under one category. This not only solves the problem of sparse data in less frequent categories but also makes the distribution of phonemes across categories fairly even. On this basis we propose the Phoneme Category Based SUSR (PCBSUSR) method.

4) To design the phoneme categories, we propose to study the phonetic and phonological properties of phonemes. To confirm our idea of using phoneme categories for SUSR, we develop Vowel Categories (VCs) based on the articulation properties of vowels.

5) To measure the performance of combinations of phonemes (vowels and consonants), we propose designing Syllable Categories (SCs), since syllables are the most natural combination of vowels and consonants. We design Consonant Categories (CCs) and combine VCs and CCs to devise SCs by considering the syllable structure of Standard Chinese.

We test our method by training Universal Background VC, CC and SC Models and performing recognition on 3-second, 2-second and 1-second sequences of VCs, CCs and SCs obtained from test utterances. The results show that important speaker information is present in speech units as small as phonemes and syllables. We conclude that Syllable Categories are the best choice for speaker recognition. Vowel Categories also perform very well in our proposed SUSR. According to our results, however, consonants are not a feasible choice for SUSR. Compared with the minimum EER of existing SUSR systems for 2-second test utterances, our experimental results (based on the Gaussian Mixture Model–Universal Background Model (GMM-UBM)) give a relative EER reduction of 54.50% and an absolute EER reduction of 11.8% on one database, and a relative EER reduction of 6.73% and an absolute EER reduction of 1.48% on another database.
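
The abstract reports results as EER values and as relative and absolute EER reductions. As a rough illustration of how these quantities relate, the following is a minimal Python sketch, not the thesis's evaluation code. The function names and toy scores are hypothetical, and the worked numbers assume that the 21.98% EER cited for 2-second utterances is the baseline for the second-database comparison, which makes the implied new EER 20.50% (a derived value, not one stated in the abstract).

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    A trial is accepted when its score is >= the threshold. The EER is taken
    as the mean of the false-acceptance rate (FAR) and false-rejection rate
    (FRR) at the threshold where the two are closest.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, best_eer = None, None
    for t in thresholds:
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


def absolute_reduction(baseline_eer, new_eer):
    """Absolute EER reduction, in percentage points."""
    return baseline_eer - new_eer


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction, as a percentage of the baseline EER."""
    return (baseline_eer - new_eer) / baseline_eer * 100


# Toy scores (hypothetical): higher means "more likely the claimed speaker".
toy_targets = [0.9, 0.8, 0.7, 0.4]
toy_impostors = [0.6, 0.3, 0.2, 0.1]
toy_eer = equal_error_rate(toy_targets, toy_impostors)

# Assumption: baseline is the 21.98% EER from the literature; the new EER of
# 20.50% is inferred from the reported 1.48-point absolute reduction.
baseline = 21.98
implied_new = 20.50
abs_red = round(absolute_reduction(baseline, implied_new), 2)
rel_red = round(relative_reduction(baseline, implied_new), 2)
print(toy_eer, abs_red, rel_red)
```

Under that assumption, an absolute reduction of 1.48 percentage points on a 21.98% baseline corresponds to a relative reduction of about 6.73%, which matches the figures quoted for the second database.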

