Research on Some Key Aspects of Statistical Machine Translation

Author: XueYongZeng
Supervisor: LiSheng
School: Harbin Institute of Technology
Major: Computer Application Technology
Keywords: machine translation; statistical approach; bilingual phrase pair; direct decoding algorithm; information extraction
CLC: TP391.2
Type: PhD thesis
Year: 2007

Abstract


Machine translation is the use of a computer to translate one natural language into another, and it can be viewed as a decision problem. The major research directions in machine translation include rule-based, interlingua-based, example-based and statistical methods. Statistical machine translation has shown clear benefits and currently receives much attention. Statistical translation models include word-based, phrase-based and syntax-based models. This thesis studies several key techniques of the phrase-based and syntax-based models. As a first step, three classical machine translation methods are systematically compared and their advantages and disadvantages are discussed in detail. On this basis, the efficient extraction of bilingual phrase translation pairs is studied. For the syntax-based method, the focus is on the decoding problem, which leads to direct decoding algorithms. A syntax-based reordering model is also presented for phrase-based statistical machine translation. Finally, a brief translation approach based on information extraction is proposed, which combines the strengths of statistical and rule-based methods.

This thesis is organized as follows:

1. The classical approaches to statistical machine translation are analyzed, and a new strategy that differs from them is tried. Analysis of the experimental results identifies the strengths and weaknesses of these approaches, with further analysis devoted to conventional syntax-based statistical machine translation. A framework for refinement is then proposed as preparation for the later studies. It presents strategies for incorporating syntax into the phrase-based method and for combining the statistical approach with the rule-based approach.

2. Methods for extracting phrase translation pairs from n-best alignments are studied. A loose phrase extraction method is proposed, and extraction constraints are applied to further improve the results. The proposed constraints include a constraint based on the intersection of alignment points and constraints based on word similarity. For the latter, three metrics are studied and compared: the Dice coefficient, the phi-square coefficient and the log-likelihood ratio. Experimental results show that loose phrase extraction is an efficient method for extracting bilingual phrase pairs from n-best alignments, and that translation quality improves further when these constraints are introduced. Compared with the conventional method, which extracts bilingual phrase pairs from the one-best alignment, loose phrase extraction with n-best alignments significantly improves translation quality.

3. The decoding problem of syntax-based statistical machine translation is studied. The reverse decoding method fails to make efficient use of the parse tree to guide the translation process, and this shortcoming motivates direct decoding. Two direct decoding algorithms are proposed, one based on beam search and one based on greedy search. Experimental results show that the direct decoding methods outperform the reverse decoding method, which indicates that direct decoding lets the structural information of the parse tree guide the translation process effectively. By introducing syntactic structure into the phrase-based statistical model, a syntax-based reordering model is also presented, which helps to solve the problem of long-distance reordering.

4. An information extraction (IE) based method for brief machine translation is presented to meet the needs of information browsing, given the current state of machine translation technology. First, information extraction identifies the key information of a sentence and drops the minor parts; skip translation is then performed on the extracted parts. The focus is on a hybrid strategy that combines the statistical approach with the rule-based approach. In this strategy, a language model selects the proper translation from the alternative results generated by different translation models. Experimental results show that this method helps to produce clear translation results and to avoid messy ones, with only a small loss of key information.
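
The three word-similarity metrics named in the second contribution (Dice coefficient, phi-square coefficient and log-likelihood ratio) are standard association measures over a 2x2 co-occurrence table. The sketch below shows their textbook definitions only; the abstract does not say how the thesis counts co-occurrences or applies thresholds, so the function names and the sentence-pair counting scheme here are illustrative assumptions.

```python
import math


def contingency(n_both, n_src, n_tgt, n_total):
    """Build a 2x2 table from sentence-pair counts (assumed counting scheme).

    n_both:  pairs containing both the source word and the target word
    n_src:   pairs containing the source word
    n_tgt:   pairs containing the target word
    n_total: all sentence pairs in the corpus
    """
    a = n_both                             # both words present
    b = n_src - n_both                     # source word only
    c = n_tgt - n_both                     # target word only
    d = n_total - n_src - n_tgt + n_both   # neither word present
    return a, b, c, d


def dice(a, b, c, d):
    """Dice coefficient: 2a / ((a + b) + (a + c))."""
    denom = 2 * a + b + c
    return 2 * a / denom if denom else 0.0


def phi_square(a, b, c, d):
    """Phi-square: (ad - bc)^2 / ((a+b)(a+c)(b+d)(c+d))."""
    denom = (a + b) * (a + c) * (b + d) * (c + d)
    return (a * d - b * c) ** 2 / denom if denom else 0.0


def log_likelihood_ratio(a, b, c, d):
    """G^2 statistic: 2 * sum(O * ln(O / E)) over the four cells."""
    n = a + b + c + d
    if n == 0:
        return 0.0
    observed = [a, b, c, d]
    expected = [
        (a + b) * (a + c) / n,
        (a + b) * (b + d) / n,
        (c + d) * (a + c) / n,
        (c + d) * (b + d) / n,
    ]
    # Cells with zero observations contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


if __name__ == "__main__":
    table = contingency(n_both=8, n_src=10, n_tgt=12, n_total=100)
    print("dice:", dice(*table))
    print("phi-square:", phi_square(*table))
    print("log-likelihood ratio:", log_likelihood_ratio(*table))
```

A word pair with a high score under any of these measures co-occurs more often than chance would predict, which is the kind of evidence a similarity-based extraction constraint can use to accept or reject a candidate phrase pair.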

