
Research on Domain Entity Attribute and Event Extraction Technology

Author: FengErBo
Advisor: LiSheng
School: Harbin Institute of Technology
Major: Computer Science and Technology
Keywords: information extraction; entity-attribute extraction; event extraction; hidden Markov model; maximum entropy model
CLC: TP391.1
Type: Master's thesis
Year: 2008

Abstract


Information extraction (IE) is currently an active research topic in natural language processing. The information extracted by IE systems can be delivered directly to end users, and it is also the first step toward building intelligent query systems and data mining systems. Within IE, entity-attribute extraction and event extraction both serve as preliminary steps for specific applications: entity-attribute extraction can be applied to entity definition and data mining, while event extraction can be applied to event classification and tracking. This work introduces a self-learning method and the maximum entropy model. The dissertation addresses the following aspects:

1. Domain character recognition. Domain character extraction is the preparatory work for entity-attribute extraction. This dissertation adopts a self-learning method for domain character recognition. First, domain lexes are used as seed words to recognize domain character; second, rules are learned from the recognized domain character; third, domain character and domain lexes are recognized through the learned rules; lastly, the newly found domain lexes are set as new seed words to recognize further domain character. The iteration is repeated until no new domain lexes are found. The method obtained satisfactory experimental results.

2. Entity-attribute extraction. Entity-attribute extraction aims to extract attributes and their corresponding attribute values. In this dissertation, entity-attribute extraction is based on parsing and is realized by combining rule-based and statistical methods. First, the text is parsed after domain character has been recognized; second, the syntactic chunks that contain attributes and attribute values are extracted from the parse trees; lastly, attributes and their corresponding attribute values are extracted from those syntactic chunks.

3. Event extraction. In this dissertation, the maximum entropy model is used for event extraction. First, all event elements are recognized in the corpus, using a rule-based method and a statistical method respectively; second, a model trained with the maximum entropy algorithm judges whether each of these event elements is related to the event. The method achieved good results.
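
The self-learning loop described in the first aspect can be illustrated with a minimal sketch. This is not the dissertation's implementation: the abstract does not say what form the learned rules take, so the sketch assumes, purely for illustration, that a "rule" is the pair of words immediately to the left and right of a known domain lex. The function names (`learn_rules`, `apply_rules`, `bootstrap`) and the toy corpus are hypothetical.

```python
# Hypothetical toy corpus; the dissertation's data is not given in the abstract.
SENTENCES = [
    "the zoom of the camera is large",
    "the sensor of the camera is large",
    "a sensor with high resolution",
    "a battery with high capacity",
]


def learn_rules(sentences, lexicon):
    """Learn context rules from known domain lexes.

    Assumption: a rule is the (left word, right word) pair around a lex.
    """
    rules = set()
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(1, len(tokens) - 1):
            if tokens[i] in lexicon:
                rules.add((tokens[i - 1], tokens[i + 1]))
    return rules


def apply_rules(sentences, rules):
    """Return every word that occurs in a context matching a learned rule."""
    found = set()
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(1, len(tokens) - 1):
            if (tokens[i - 1], tokens[i + 1]) in rules:
                found.add(tokens[i])
    return found


def bootstrap(sentences, seeds):
    """Iterate seed -> rules -> new lexes until no new lexes appear."""
    lexicon = set(seeds)
    new_lexes = set(seeds)
    while new_lexes:
        rules = learn_rules(sentences, lexicon)
        candidates = apply_rules(sentences, rules)
        new_lexes = candidates - lexicon   # lexes not seen in earlier rounds
        lexicon |= new_lexes               # they become seeds for the next round
    return lexicon


if __name__ == "__main__":
    print(sorted(bootstrap(SENTENCES, {"zoom"})))
    # ['battery', 'sensor', 'zoom']
```

Starting from the single seed "zoom", the first round learns the context ("the", "of") and finds "sensor"; the second round learns ("a", "with") from "sensor" and finds "battery"; the third round finds nothing new, so the loop stops. A practical system would also need to score or filter the learned rules, since unfiltered context rules tend to drift away from the domain.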

CLC: Industrial Technology > Automation Technology, Computer Technology > Computing Technology, Computer Technology > Computer Applications > Information Processing > Text Processing