Research on Chinese Spoken Term Detection Technology for News Corpus

Author: WangKeWei
Tutor: HanJiQing
School: Harbin Institute of Technology
Course: Computer Science and Technology
Keywords: spoken term detection; news corpus; lattice; n-gram model; long-distance bigram model; automatic corpus splicing
CLC: TN912.34
Type: Master's thesis
Year: 2012
Spoken term detection (STD) returns the segments of a speech corpus that are relevant to a user's text query. STD is an important area of speech recognition with broad application prospects. An STD system is usually built in two stages: offline indexing and online searching. The accuracy of the system therefore depends heavily on the quality of the index.
Indexing is usually based on the output of an automatic speech recognition (ASR) system. Most STD systems build their indices from the lattice produced by the recognizer, which has a well-defined structure and carries a great deal of information. The probability of a local path through the lattice can be computed from the acoustic likelihood and the language model score, both of which are kept in the lattice, and taking this probability as the confidence measure during indexing is simple and effective. Because the traditional N-gram model (e.g. the bigram model) does not consider syntactic and semantic constraints from more distant words, it misses some information. The long-distance bigram model in this paper captures different aspects of the syntactic and semantic constraints between words, so an STD system based on the lattice and the long-distance bigram model, rather than the traditional N-gram model, will improve the quality of the indices and the performance of the system. Our experiments evaluate STD systems based on bigrams of different distances and demonstrate that integrating the results of these systems yields higher detection recall than a system based on the traditional N-gram model.
A news corpus is an ideal choice for building the speech recognition system of an STD system for news databases. At the front end of the STD system, the input speech must be converted into text by a speech recognition system. However, commercial news corpora currently lack detailed transcripts: the transcripts are at the paragraph level rather than the phrase level, so they cannot be used for the recognition task. This paper presents an automatic method, based on speech recognition, for segmenting paragraph-level speech. The method first constructs a linear recognition network, then inserts silence models between short speech utterances, and finally decodes the speech. Experiments show that the method performs well when splicing paragraph-level segments shorter than 11 minutes. We conclude that it is an effective method for splicing paragraph-level speech.
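As a rough illustration of the long-distance bigram idea mentioned in the abstract, the sketch below estimates P(w_i | w_{i-d}) from tokenized sentences, where d = 1 gives the conventional bigram. The class name `DistanceBigram`, the add-one smoothing, and the toy corpus are assumptions made for this example; the abstract does not say how the thesis estimates or smooths its models.

```python
from collections import Counter
from typing import Iterable, List


class DistanceBigram:
    """Hypothetical distance-d bigram: P(w_i | w_{i-d}) with add-one smoothing."""

    def __init__(self, sentences: Iterable[List[str]], distance: int = 1):
        if distance < 1:
            raise ValueError("distance must be >= 1")
        self.distance = distance
        self.pair_counts = Counter()     # counts of (w_{i-d}, w_i)
        self.context_counts = Counter()  # counts of w_{i-d} used as a history
        self.vocab = set()
        for sent in sentences:
            self.vocab.update(sent)
            # Pair each word with the word exactly `distance` positions earlier.
            for i in range(distance, len(sent)):
                history, word = sent[i - distance], sent[i]
                self.pair_counts[(history, word)] += 1
                self.context_counts[history] += 1

    def prob(self, history: str, word: str) -> float:
        # Add-one smoothing (an assumption of this sketch, not the thesis).
        vocab_size = len(self.vocab)
        return (self.pair_counts[(history, word)] + 1) / (
            self.context_counts[history] + vocab_size
        )


# Toy corpus, invented for illustration only.
corpus = [
    ["the", "news", "report", "starts"],
    ["the", "news", "anchor", "starts"],
]
bigram_d1 = DistanceBigram(corpus, distance=1)
bigram_d2 = DistanceBigram(corpus, distance=2)

# "starts" never directly follows "news", but always appears two words later,
# so the distance-2 model scores the pair higher than the distance-1 model.
print(bigram_d1.prob("news", "starts"))
print(bigram_d2.prob("news", "starts"))
```

In this toy corpus the distance-2 model picks up a dependency that the conventional bigram cannot see, which is the kind of complementary information the abstract attributes to combining systems built on different distances.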

