
SVM and TSVM Based Chinese Entity Relation Extraction

Author: XuFen
Supervisor: WangTing
School: National University of Defense Technology
Major: Computer Science and Technology
Keywords: Information extraction; Entity relation extraction; SVM; TSVM; Feature selection; Number of training examples; Multi-class classifier
CLC: TP391.1
Type: Master's thesis
Year: 2007
Downloads: 232
Citations: 6

Abstract


Information extraction technology automatically converts unstructured text into structured data. It meets strong demand in its own right and also underpins other important applications such as information retrieval, text classification, and automatic question answering. Entity relation extraction is a key step in information extraction and an increasingly popular research topic. Work on Chinese entity relation extraction is still in its infancy, and much remains to be done. Drawing on the characteristics of Chinese entity relations, this thesis designs a series of features, including words, part-of-speech (POS) tags, entity attributes, mention information, overlap between entities, and concept information provided by HowNet. These features form a context feature vector for each entity pair, which an SVM classifier uses to extract Chinese entity relations. With the ACE 2004 training corpus as experimental data, the approach achieves good recognition performance. Based on the classification experiments, the thesis examines in detail how the different feature sets and different numbers of training examples affect Chinese entity relation extraction. The results show that tasks of different granularity call for feature-set combinations of different degrees of abstraction: the POS feature set is better suited to the relation detection task, and the HowNet concept feature set is better suited to recognizing relation types and subtypes; the word feature set is a basic feature set, and the entity-overlap feature set contributes most to extraction performance.
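
The pipeline described above (a context feature vector per entity pair, classified by an SVM) can be illustrated with a minimal sketch. It assumes pre-segmented, POS-tagged sentences and (start, end) token spans for the two mentions; the function names, feature names (`types:`, `w_between:`, `pos_between:`, `overlap:`), and toy labels are hypothetical. HowNet concept features are omitted because they require the HowNet lexicon. The classifier is a linear SVM trained with Pegasos stochastic subgradient descent, swapped in to keep the example dependency-free; it is not necessarily the SVM implementation or kernel the thesis used.

```python
import random
from collections import defaultdict


def pair_features(tokens, pos_tags, e1, e2, e1_type, e2_type):
    """Hypothetical sparse feature map for one entity-mention pair.

    tokens, pos_tags: parallel lists for one segmented sentence.
    e1, e2: (start, end) token spans of the two mentions, end exclusive.
    """
    feats = {"bias": 1.0, f"types:{e1_type}-{e2_type}": 1.0}
    # Word and POS features for the tokens lying between the two mentions.
    left, right = min(e1[1], e2[1]), max(e1[0], e2[0])
    for i in range(left, right):
        feats[f"w_between:{tokens[i]}"] = 1.0
        feats[f"pos_between:{pos_tags[i]}"] = 1.0
    # Overlap feature: do the two mention spans share at least one token?
    overlap = e1[0] < e2[1] and e2[0] < e1[1]
    feats[f"overlap:{overlap}"] = 1.0
    return feats


def dot(w, x):
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(X, y, lam=0.1, epochs=200, seed=0):
    """Linear SVM trained with Pegasos (stochastic subgradient on the hinge loss).

    X: list of sparse dict vectors; y: labels in {+1, -1}.
    The bias is learned through the constant "bias" feature.
    """
    rng = random.Random(seed)
    w = defaultdict(float)
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            violated = y[i] * dot(w, X[i]) < 1.0
            # Shrink from the L2 regulariser: (1 - eta * lam) == (1 - 1 / t).
            for k in w:
                w[k] *= 1.0 - 1.0 / t
            # Hinge-loss subgradient step only when the margin is violated.
            if violated:
                for k, v in X[i].items():
                    w[k] += eta * y[i] * v
    return dict(w)


def predict(w, x):
    """Return +1 (relation present) or -1 (no relation)."""
    return 1 if dot(w, x) >= 0 else -1


# Toy training pairs (labels are illustrative): +1 = relation, -1 = none.
TRAIN = [
    (pair_features(["张三", "任职", "于", "华为"], ["NR", "VV", "P", "NR"],
                   (0, 1), (3, 4), "PER", "ORG"), 1),
    (pair_features(["北京", "大学"], ["NR", "NN"], (0, 1), (0, 2), "GPE", "ORG"), 1),
    (pair_features(["张三", "和", "华为"], ["NR", "CC", "NR"],
                   (0, 1), (2, 3), "PER", "ORG"), -1),
    (pair_features(["李四", "说", "北京", "下雨"], ["NR", "VV", "NR", "VV"],
                   (0, 1), (2, 3), "PER", "GPE"), -1),
]
W = train_linear_svm([x for x, _ in TRAIN], [y for _, y in TRAIN])
print([predict(W, x) for x, _ in TRAIN])
```

Sparse string-keyed dictionaries stand in for the one-hot feature vectors; the nested pair "北京" / "北京大学" shows how the overlap feature fires only when the two mention spans share tokens.
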
Enlarging the training corpus improves recognition performance, so building a large-scale training corpus is necessary when an SVM classifier is used. Once the corpus reaches a certain size, however, further growth brings diminishing gains, and the main effort should shift to constructing the feature set. Building on these findings, and to reduce the SVM's dependence on a large-scale training corpus, the thesis introduces TSVM, a semi-supervised learning method, to Chinese entity relation extraction. The experimental results show that TSVM far outperforms SVM when the number of training vectors is small, but falls behind SVM when the number of training vectors is large. With only a small amount of annotated corpus and a large amount of unannotated corpus, a TSVM classifier achieves good performance on a relatively simple problem such as relation detection, which lowers the cost of building an extraction system and improves its portability. On more complex problems such as relation type recognition, however, TSVM performance is still unsatisfactory, and other semi-supervised learning methods should be considered. The study also builds a multi-class TSVM classifier. Future work has two aspects: first, improving the existing feature set by adding features such as chunk information and HowNet concept structure, to raise relation extraction performance; second, selecting parameters more precisely and quantitatively studying how the amount of annotated data that SVM and TSVM require depends on data dimensionality and target performance.
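
The TSVM idea can be sketched roughly after Joachims' transductive SVM: train on the labeled pairs, guess labels for the unlabeled pairs while keeping the labeled positive ratio, then retrain with a growing cost `c_star` on the unlabeled examples, swapping a positive/negative pair of guessed labels whenever both have slack and their slacks sum to more than 2. This is a simplified sketch under stated assumptions, not the thesis's implementation: it uses a single unlabeled cost instead of separate costs for the two guessed classes, an approximate Pegasos-style solver, a `max_switches` guard (an assumption, since an approximate solver does not guarantee that switching terminates), and a one-vs-rest wrapper as one hypothetical way to build the multi-class TSVM classifier. All names and toy data are hypothetical.

```python
import random


def dot(w, x):
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_weighted_svm(X, y, costs, lam=0.1, epochs=200, seed=0):
    """Pegasos-style linear SVM; example i's hinge loss is scaled by costs[i]."""
    rng = random.Random(seed)
    w, order, t = {}, list(range(len(X))), 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            violated = y[i] * dot(w, X[i]) < 1.0
            for k in w:  # L2 shrink step: (1 - eta * lam) == (1 - 1 / t)
                w[k] *= 1.0 - 1.0 / t
            if violated:  # weighted hinge-loss subgradient step
                for k, v in X[i].items():
                    w[k] = w.get(k, 0.0) + costs[i] * y[i] * v / (lam * t)
    return w


def tsvm(X_lab, y_lab, X_unl, c_final=1.0, c_start=0.01, max_switches=100):
    """Simplified transductive SVM, loosely after Joachims (1999).

    Uses one cost c_star for all unlabeled examples (the original keeps
    separate costs for the two guessed classes). Returns (w, guessed labels).
    """
    n_l, n_u = len(X_lab), len(X_unl)
    w = train_weighted_svm(X_lab, y_lab, [1.0] * n_l)
    # Initial guess: the top-scoring share of unlabeled pairs becomes positive,
    # with the same positive ratio as the labeled data.
    n_pos = round(n_u * sum(1 for v in y_lab if v > 0) / n_l)
    ranked = sorted(range(n_u), key=lambda i: -dot(w, X_unl[i]))
    y_unl = [-1] * n_u
    for i in ranked[:n_pos]:
        y_unl[i] = 1
    c_star, switches = c_start, 0
    while True:
        while True:
            w = train_weighted_svm(X_lab + X_unl, y_lab + y_unl,
                                   [1.0] * n_l + [c_star] * n_u)
            slack = [max(0.0, 1.0 - y_unl[i] * dot(w, X_unl[i]))
                     for i in range(n_u)]
            # Look for a positive/negative pair whose slacks sum to more than 2.
            pair = next(((i, j) for i in range(n_u) for j in range(n_u)
                         if y_unl[i] > 0 > y_unl[j] and slack[i] > 0
                         and slack[j] > 0 and slack[i] + slack[j] > 2.0), None)
            if pair is None or switches >= max_switches:
                break
            i, j = pair
            y_unl[i], y_unl[j] = -1, 1  # swap keeps the positive count fixed
            switches += 1
        if c_star >= c_final:
            return w, y_unl
        c_star = min(2.0 * c_star, c_final)  # raise the unlabeled cost


def one_vs_rest_tsvm(X_lab, labels, X_unl):
    """Hypothetical multi-class TSVM: one binary TSVM per relation type."""
    models = {}
    for cls in sorted(set(labels)):
        y = [1 if lab == cls else -1 for lab in labels]
        models[cls], _ = tsvm(X_lab, y, X_unl)
    return models


def predict_class(models, x):
    """Pick the relation type whose binary model scores x highest."""
    return max(models, key=lambda cls: dot(models[cls], x))


def vec(*names):
    """Toy binary feature vector with a constant bias feature."""
    return {"bias": 1.0, **{n: 1.0 for n in names}}


# Toy data: "cue:A" marks a relation (+1), "cue:B" marks none (-1).
X_LAB, Y_LAB = [vec("cue:A", "w:l1"), vec("cue:B", "w:l2")], [1, -1]
X_UNL = [vec("cue:A", "w:u1"), vec("cue:A", "w:u2"),
         vec("cue:B", "w:u3"), vec("cue:B", "w:u4")]
W, Y_UNL = tsvm(X_LAB, Y_LAB, X_UNL)
print(Y_UNL)
```

One-vs-rest trains one binary TSVM per relation type and picks the type with the highest score; the abstract does not say which multi-class scheme the thesis used, so pairwise (one-vs-one) voting would be an equally plausible alternative.
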

Related Dissertations

  1. Soft Sensor of Naphtha Dry Point on Support Vector Machines Regression,TE622.1
  2. The Research of the Fault Diagnosis Algorithm for the Liquid Rocket Engine Testing Bed Based on PCA-SVM,V433.9
  3. ISAR Imaging Simulation of Space Targets and Target Recognition Based on ISAR Images,TN957.52
  4. Research on Automatic Music Structure Analysis,TN912.3
  5. Research on Feature Extraction and Classification of Pulse Waveform for Cholecystitis and Nephrotic Syndrome Diagnosis,TP391.41
  6. Research on Word Alignment Based on Statistics and Linguistics and Correlation Fusion Strategy,TP391.2
  7. Research on Domain Entity Attribute and Event Extraction Technology,TP391.1
  8. Research on Text Classification Based on Biomimetic Pattern Recognition,TP391.1
  9. Feature Extraction, Selection and Combination in Lipreading,TP391.41
  10. The Research on Paper Currency Classification Method Based on Haar-Like Feature and Minimal Ball Including Samples,TP391.41
  11. Research on Predicting Intrinsic Disorder Protein Structure Based on Supervision Manifold Learning Algorithm,Q51
  12. Study on the Road Condition Monitoring Based on Vehicular 3D Acceleration Sensor,TP274
  13. Research of Support Vector Machine Based Fault Diagnosis System,TH165.3
  14. Algorithm Research on Video-based Vehicle Detection, Tracking and Recognition for Intelligent Transportation,TP391.41
  15. The Research for Named Entity Recognition and Relation Extraction in Text,TP391.1
  16. Research on Feature-based Semantic Relation Extraction between Entities,TP391.1
  17. Ontology-based medicine named entity recognition technology research,TP391.1
  18. Research on Classification Method of Tongue Substance Color and Tongue Coating Color Based on SVM,TP391.41
  19. Research on Temporal Information Recognition and Normalization,TP391.1
  20. EEG-based emotional image retrieval,TP391.41
  21. FSVM-based data mining method and its application to intrusion detection research,TP393.08

© 2012 www.DissertationTopic.Net