Dissertation

The Research on the Technology of Statistical-Based Chinese-English Machine Translation

Author: WeiZuo
Advisor: WangTing
School: National University of Defense Science and Technology
Course: Computer Science and Technology
Keywords: Statistical machine translation; Chinese-English machine translation; Translation model; Part-of-speech tagging; Alignment model; Language model; Decoder; Search algorithm; A* search algorithm; Dynamic programming; Beam search algorithm; Phrase alignment model; ISA; MI; Alignment template; Translation memory
CLC: TP391.2
Type: Master's thesis
Year: 2006


With the rapidly growing popularity of the Internet, machine translation has increasingly broad application prospects. Current statistical machine translation research mostly targets translation among English, French, German, and other Western languages. This thesis studies the principles and techniques of statistical machine translation and, on that basis, builds a prototype statistical Chinese-English machine translation system. The work consists of two parts. The first is Chinese-English statistical machine translation based on a word alignment model; it follows the source-channel model, the most widely used approach in statistical machine translation research. The second is statistical machine translation based on a phrase alignment model, which builds on the first part.

For the word-alignment-based work we adopted the IBM alignment models. Previous studies have shown that, of the five IBM models, Model 4 gives the best alignments, so our study is based on IBM Model 4. The main tasks were building the Chinese-English translation model, building the English language model, and building the decoder. In detail:

1) Construction of the Chinese-English translation model. Part-of-speech (POS) information is introduced when building the translation model. Experiments show that POS information improves word alignment quality, makes the model parameters more accurate, and improves translation quality.
2) Implementation of the A* and beam search algorithms. The two algorithms were compared on experimental data, and the results show that A* search performs better for Chinese-English statistical machine translation.
3) Improvement of the A* search algorithm. A* search expands only the best-scoring node. Because Chinese and English are very different languages, expanding only the best node in Chinese-English translation can send the search in the wrong direction and miss a better translation. We therefore improved the algorithm by introducing a breadth-oriented search strategy with a heuristic for choosing which nodes to expand. Experiments show that the improved algorithm produces noticeably better translations.
4) Modification of the empty-word translation model. In Chinese-English statistical machine translation, the empty (NULL) word carries very large weight in some translations. We modified the empty-word translation model for Chinese-English translation, and experiments show that this change alleviates the adverse effect of the empty word.
5) Parameter analysis. We analyzed experimentally several parameters that affect translation, including the range of candidate English words selected for each Chinese word and the length of the hypothesis queue in A* search, and set these parameters according to the experimental results.

A word alignment model does not take context into account, which is an obvious shortcoming, so statistical machine translation based on phrase alignment models has become an active research topic. Building on the work above, we studied phrase-alignment-based statistical machine translation. The main tasks were:

1) We designed a word alignment method that combines the Viterbi alignments obtained from IBM model training with the integrated segmentation and phrase alignment (ISA) algorithm. Experiments show that this method further improves the word alignment accuracy on the training corpus.
2) In applying the ISA algorithm, we ran experiments on the formula for pointwise mutual information (MI) and set the MI threshold according to the experimental results.
3) We designed a method for building alignment templates that uses POS information.
4) With word alignment accuracy improved, we extracted a large number of phrase instances from the training corpus, which makes it possible to use translation-memory-based methods during translation.
5) We extracted templates from the training corpus. During translation, templates are matched first, then IBM Model 4 is used as the basis for scoring the candidate translations, and the best translation is selected.
6) The experimental results show that the extracted phrase instances are of high quality, so introducing the translation memory method improves the quality of phrase translation. Alignment templates take contextual semantics into account and to some extent overcome the word alignment model's weakness in this respect, improving both translation efficiency and accuracy.
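
As a rough illustration of the mutual-information step mentioned for the ISA algorithm, pointwise mutual information between a Chinese word c and an English word e can be estimated from sentence-level co-occurrence counts as log(p(c, e) / (p(c) p(e))), and word pairs scoring at or below a threshold can be discarded. The sketch below is not the thesis's implementation: the function names, the toy corpus, and the threshold of 0.0 are assumptions made for illustration, and the abstract does not state which formula variant or threshold value was finally chosen.

```python
import math
from collections import Counter
from itertools import product


def pmi_table(sentence_pairs):
    """Estimate pointwise mutual information for every co-occurring word pair.

    sentence_pairs: list of (source_tokens, target_tokens) tuples.
    Probabilities are relative frequencies over sentence pairs.
    """
    n = len(sentence_pairs)
    src_count, tgt_count, joint_count = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        src_set, tgt_set = set(src), set(tgt)  # count each word once per pair
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        joint_count.update(product(src_set, tgt_set))

    table = {}
    for (c, e), joint in joint_count.items():
        p_joint = joint / n
        p_c = src_count[c] / n
        p_e = tgt_count[e] / n
        # PMI(c, e) = log( p(c, e) / (p(c) * p(e)) )
        table[(c, e)] = math.log(p_joint / (p_c * p_e))
    return table


def filter_by_threshold(table, threshold):
    """Keep only word pairs whose PMI is strictly above the threshold."""
    return {pair: score for pair, score in table.items() if score > threshold}


# Hypothetical toy corpus, already word-segmented.
corpus = [
    (["我", "喜欢", "书"], ["i", "like", "books"]),
    (["我", "喜欢", "猫"], ["i", "like", "cats"]),
    (["他", "看", "书"], ["he", "reads", "books"]),
]
scores = pmi_table(corpus)
kept = filter_by_threshold(scores, threshold=0.0)
```

On a corpus this small many spurious pairs still pass the filter; on real data the threshold would be tuned experimentally, as the abstract describes.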

