
Research of Named Entity Recognition and Automatic Pattern Acquisition in Information Extraction

Author: WuXueJun
Advisor: ZhuJingBo
School: Northeastern University
Course: Computer Software and Theory
Keywords: information extraction; named entity recognition; Chinese name recognition; organization name recognition; Co-Training; automatic pattern acquisition; similarity computation
CLC: TP391.1
Type: Master's thesis
Year: 2005


With the arrival of the information era and the growth of the Internet, the explosion of information has become a bottleneck for information processing, and there is an urgent need to acquire this information quickly and accurately. Information extraction is one of the most powerful means of addressing this problem. However, named entity recognition, automatic pattern acquisition, and coreference resolution all remain open problems within it. This thesis studies named entity recognition and automatic pattern acquisition and presents a series of solutions.

For named entity recognition, the thesis focuses on recognizing Chinese person names and Chinese organization names. Based on statistics gathered from a large-scale corpus, it builds a knowledge base for Chinese person name identification and presents a person name recognition method that combines statistics and rules. The method balances recall and precision: in testing, it reached a recall of 91.35% and a precision of 92.23%.

For Chinese organization name recognition, the thesis uses the Co-Training machine learning method to build six knowledge bases. Drawing on the composition probability of organization names, the co-occurrence probability of organization name words and suffixes, information about the characters inside organization names, and the words that introduce an organization name on its left and right, it presents an algorithm for identifying Chinese organization names based on statistics and rules. The experiments achieved 90.2% precision and 81.7% recall in the closed test, and 88.5% precision and 75.5% recall in the open test.

The thesis also studies automatic pattern acquisition for information extraction and proposes a new automatic pattern acquisition method based on similarity computation (APAMBSC). Given a seed pattern, relevant patterns are learned automatically from a large unlabeled training corpus, and the generated patterns can be put to use after a small amount of manual correction. Compared with other algorithms, APAMBSC requires much less human intervention and avoids the need for a hand-tagged training corpus. Experimental results show that the patterns learned by APAMBSC achieve a precision of 79.45% and a recall of 66.51% in the open test.

Finally, the thesis makes an initial attempt at designing a Chinese information extraction system, drawing on the techniques studied here and on technology developed in the author's laboratory.
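
The abstract does not say how APAMBSC computes similarity, so the following is only a minimal sketch of the general idea of growing a pattern set from one seed by similarity. Everything in it is an assumption made for illustration: the representation of a pattern as a tuple of tokens, the Jaccard token-overlap measure, the `threshold` and `max_rounds` parameters, and the function names. It should not be read as the thesis's actual algorithm.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: a pattern is a tuple of words and slot tags.
Pattern = Tuple[str, ...]


def jaccard(a: Pattern, b: Pattern) -> float:
    """Token-set overlap of two patterns (an assumed similarity measure)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def acquire_patterns(
    seed: Pattern,
    candidates: Iterable[Pattern],
    threshold: float = 0.6,
    max_rounds: int = 5,
) -> List[Pattern]:
    """Grow a pattern set from one seed.

    In each round, a candidate is accepted if it is similar enough to any
    pattern accepted so far, so patterns learned earlier can pull in
    further patterns that the seed alone would not reach.
    """
    accepted: List[Pattern] = [seed]
    # Remove duplicates and the seed itself while keeping the input order.
    remaining = [c for c in dict.fromkeys(candidates) if c != seed]
    for _ in range(max_rounds):
        newly = [
            c for c in remaining
            if any(jaccard(c, p) >= threshold for p in accepted)
        ]
        if not newly:
            break
        accepted.extend(newly)
        remaining = [c for c in remaining if c not in newly]
    return accepted[1:]  # learned patterns only, without the seed


# Hypothetical candidate patterns, as if harvested from unlabeled text.
seed = ("<PERSON>", "was", "appointed", "<POSITION>", "of", "<ORG>")
candidates = [
    ("<PERSON>", "was", "named", "<POSITION>", "of", "<ORG>"),
    ("<PERSON>", "was", "named", "<POSITION>", "at", "<ORG>"),
    ("the", "weather", "in", "<LOC>", "was", "sunny"),
]
learned = acquire_patterns(seed, candidates)
for pattern in learned:
    print(" ".join(pattern))
```

In this toy run the second candidate is not similar enough to the seed on its own and is only accepted through the first one. That is the kind of drift a manual correction pass over the generated patterns, as the abstract describes, would be expected to catch.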

Related Dissertations

  1. Research on Domain Entity Attribute and Event Extraction Technology, TP391.1
  2. Research on Temporal Information Recognition and Normalization, TP391.1
  3. Study on Growth Monitoring Technique Based on Pixel Un-Mixing Method and HJ Remote Sensing Images in Paddy Rice, S511
  4. Land Desertification in Qinghai Lake Landscape Pattern Change, X171
  5. Active faults based radar image information extraction method applied research and demonstration, P542.3
  6. Based on high-resolution remote sensing data mining houses information extraction, TP751
  7. Web Page Attribute Extraction Method Research, TP391.1
  8. The Research for Named Entity Recognition and Relation Extraction in Text, TP391.1
  9. The key component vertical search engine technology research, TP391.3
  10. Reptiles theme for Education News Design and Implementation, TP391.3
  11. GPU-based image search Chinese Research on key technologies of the retrieval, TP391.1
  12. Home Academic Information Extraction System, TP393.092
  13. Engineering News reported information extraction and applied research, G212
  14. Ontology-based medicine named entity recognition technology research, TP391.1
  15. CRF-based named joint extraction of entities and relationships, TP391.4
  16. Hull section robotic welding path planning and offline programming, TP242
  17. Printers based on natural language HCI Research and implementation, TP11
  18. Multi-language support program comprehension understanding and information extraction technology research, TP311.52
  19. Object-oriented Information Extraction of woodland, P237
  20. Study on Extraction of Coniferous Forest Information in Southern China, TP79
  21. Comparison and Improvement of Two Methods Based on Semi-Supervised Learning, TP18

CLC: Industrial Technology > Automation technology, computer technology > Computing technology, computer technology > Computer applications > Information processing > Text Processing