Chinese Named Entity Recognition Based on Statistical Machine Learning

Author: MengYing
Advisor: FengLiHui
School: Kunming University of Science and Technology
Course: Control Theory and Control Engineering
Keywords: Named Entity Recognition; Statistics; Machine Learning; Rule; Text preprocessing
CLC: TP182
Type: Master's thesis
Year: 2004
Downloads: 376
Citations: 7

Abstract


Computational linguistics studies named entities in order to determine the meaning of the nouns in a sentence. Named entities carry important information in a text, and named entity recognition is one of the most meaningful research topics in information extraction. In addition, the frequent occurrence of named entities in text is the most important factor limiting improvements in word segmentation accuracy. Recognition quality directly affects the accuracy of segmentation and of the subsequent part-of-speech tagging and parsing, so automatic named entity recognition is both a key problem and an active topic in Chinese word segmentation. The study of named entity recognition therefore has important theoretical and practical significance.

The named entities that usually receive attention fall into seven categories. Named entity recognition generally uses two kinds of methods: statistical methods and rule-based methods. The former automatically extract the regularities of named entity composition from real text and train a language model to recognize named entities automatically; the latter rely mainly on linguistic knowledge, using rules written by linguists to identify named entities. This thesis combines statistics and rules. On the one hand, a large-scale corpus is used to gather statistics on the characters and words from which named entities are formed. On the other hand, a large number of recognition rules are extracted from an annotated corpus. Together these recognize named entities with higher accuracy than either a purely statistical or a purely rule-based approach. Specifically, the work concentrates on the following areas:
1. Chinese character encoding conversion. Character encoding is the first step in bringing Chinese text into computer processing. Because Chinese exists in both Simplified and Traditional forms, its encoding is relatively complex and the encoding formats are not unified.
This thesis implements conversion between Chinese encodings, supporting conversion among all major Chinese encodings and the coexistence of multiple character sets. This provides a basis for the subsequent text preprocessing and named entity recognition.
2. Recognition of non-Chinese-character symbols and numbers. These are a relatively easy part of the text to identify and can be processed before named entity recognition. Texts in different formats are first split into clauses, and then the non-Chinese-character symbols in the text are recognized, such as percentages, monetary amounts, Arabic numerals, and Chinese numerals.
3. Place name recognition based on an evaluation function. A large-scale annotated corpus is used for training to gather statistics on the characters, words, and context features of place names. An evaluation function built on these statistics scores candidate place names, and dynamic programming is used to identify the likely positions of place names in the text.
4. Named entity recognition based on decision trees. A machine learning method is introduced: a decision-tree recognition model that combines the basic structural features of named entities with their context features. The method does not rely on a word segmentation system; it can also be applied to a segmented corpus and to the recognition of other named entities.
5. Organization name recognition based on template matching. A large number of organization names are collected from a real corpus and their composition is analyzed in depth. Templates for identifying organization names are summarized from this analysis, and Chinese organization names are recognized by template matching.
6. Integrated systems. Two systems that integrate this multi-strategy named entity recognition technology are introduced, and several examples show the role that named entity recognition plays in them.
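The preprocessing steps in items 1 and 2 can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names `convert_encoding` and `tag_numeric`, the choice of codecs, the category labels, and the regular expressions are all assumptions made for illustration, and the numeral pattern is deliberately naive (it would also match numeral characters inside ordinary words).

```python
import re


def convert_encoding(raw: bytes, src: str, dst: str = "utf-8") -> bytes:
    """Re-encode raw bytes from the src codec to the dst codec.

    The caller names the source codec (for example "gbk" or "big5");
    this sketch makes no attempt to detect it automatically.
    """
    return raw.decode(src).encode(dst)


# Hypothetical patterns: an Arabic number and a run of Chinese numeral characters.
_NUM = r"\d+(?:\.\d+)?"
_CN = r"[零〇一二两三四五六七八九十百千万亿]+"

# Alternatives are ordered so that the more specific categories win.
NUMERIC_PATTERN = re.compile(
    rf"(?P<percent>{_NUM}%|百分之{_CN})"
    rf"|(?P<money>(?:{_NUM}|{_CN})(?:美元|元))"
    rf"|(?P<arabic>{_NUM})"
    rf"|(?P<chinese>{_CN})"
)


def tag_numeric(text: str):
    """Return (category, matched_text) pairs for numeric expressions in text."""
    return [(m.lastgroup, m.group()) for m in NUMERIC_PATTERN.finditer(text)]
```

Under these assumptions, text would first be normalized with `convert_encoding(raw, "gbk")` or `convert_encoding(raw, "big5")`, decoded, and then passed to `tag_numeric` so that numeric expressions are marked before any named entity recognition runs.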
The experimental results show that the combined statistical and rule-based method used in this thesis achieves satisfactory recognition accuracy. The recognition covers all categories of named entities and also takes the preprocessing problems of Chinese text into account. The work is of some significance and practical value.
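The place name step in item 3 (score candidates with an evaluation function, then use dynamic programming to pick their positions) can be sketched as follows. The abstract does not give the actual evaluation function or statistics, so `toy_score`, its two weight tables, and the `max_len` limit are hypothetical stand-ins; only the overall shape, choosing non-overlapping candidate spans that maximize the total score, follows the description.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical statistics; a real system would estimate these from an annotated corpus.
TOY_CHAR_SCORE = {"昆": 1.0, "明": 0.8, "云": 0.9, "南": 0.7}
TOY_SUFFIX_BONUS = {"市": 1.5, "省": 1.5}


def toy_score(text: str, start: int, end: int) -> float:
    """Toy evaluation function for the candidate text[start:end]."""
    span = text[start:end]
    if len(span) < 2:
        return 0.0  # single characters are never candidates in this sketch
    # Characters before the last one: known place-name characters add, others subtract.
    total = sum(TOY_CHAR_SCORE.get(ch, -1.0) for ch in span[:-1])
    # The last character earns a bonus if it is a typical place-name suffix.
    last = span[-1]
    total += TOY_SUFFIX_BONUS.get(last, TOY_CHAR_SCORE.get(last, -1.0))
    return total


def best_spans(
    text: str,
    score: Callable[[str, int, int], float],
    max_len: int = 4,
) -> List[Tuple[int, int]]:
    """Pick non-overlapping spans with maximal total score by dynamic programming."""
    n = len(text)
    best = [0.0] * (n + 1)  # best[i]: best total score for text[:i]
    back: List[Optional[int]] = [None] * (n + 1)  # start of the span ending at i
    for i in range(1, n + 1):
        best[i] = best[i - 1]  # option: character i-1 belongs to no place name
        for j in range(max(0, i - max_len), i):
            s = score(text, j, i)
            if s > 0 and best[j] + s > best[i]:
                best[i] = best[j] + s
                back[i] = j
    # Walk the back-pointers from the end to recover the chosen spans.
    spans = []
    i = n
    while i > 0:
        j = back[i]
        if j is None:
            i -= 1
        else:
            spans.append((j, i))
            i = j
    spans.reverse()
    return spans
```

With these toy weights, `best_spans("我在云南省昆明市工作", toy_score)` selects the spans covering 云南省 and 昆明市. The dynamic program keeps the scoring function pluggable, so corpus-derived statistics could replace the toy tables without changing the search.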

CLC: Industrial Technology > Automation technology, computer technology > Automated basic theory > Artificial intelligence theory > Expert systems, knowledge engineering