
Recognizing Named Entities in Biomedical Literatures

Author: ZhouRongPeng
Tutor: LiLiShuang
School: Dalian University of Technology
Course: Applied Computer Technology
Keywords: Text Mining; Named Entity Recognition; Biological Named Entity Recognition; Machine Learning
CLC: TP391.4
Type: Master's thesis
Year: 2009

Abstract


Biological named entity recognition is a key step in biomedical text mining. More complex tasks, such as gene (protein) normalization and protein-protein interaction extraction, can only be carried out effectively once biological named entities have been correctly identified. However, because biological entity names are irregular and ambiguous, biological named entity recognition has remained a challenging task. This thesis studies named entity recognition in English biomedical literature, with experiments on two corpora: JNLPBA2004 and BioCreAtIvE 2 GM. Its main contributions are the following two:

(1) A two-stage biological named entity recognition method based on Conditional Random Fields (CRF) is proposed. The method divides the JNLPBA2004 task into two subtasks, identification and classification, and handles them in two stages. In the first stage (identification), a CRF model marks every potential biological named entity in the text without distinguishing categories. In the second stage (classification), a second CRF model assigns a category to each identified entity. To further improve recognition performance, four post-processing algorithms are applied before the classification stage. Experimental results show that the method both shortens model training time and improves recognition performance. It achieves an F1 score of 74.47% on the JNLPBA2004 corpus, 1.92 percentage points higher than the top-ranked system in the JNLPBA2004 shared task.

(2) For the BioCreAtIvE 2 GM task, a biological named entity recognition method based on combining multiple models is proposed. Six different machine learning models are first trained using different learning algorithms and feature sets. Their recognition results are then combined with two strategies: simple set operations (such as union and intersection) and voting. Experimental results show that combining the results of multiple models improves recognition performance. The method achieves an F1 score of 87.89% on the BioCreAtIvE 2 GM corpus, 0.68 percentage points higher than the top-ranked system in the BioCreAtIvE 2 GM challenge.
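
The task decomposition in contribution (1) can be illustrated at the data level. The following Python sketch is an illustration under stated assumptions, not the thesis's code. It takes a JNLPBA-style BIO-labelled sentence using the five JNLPBA types (protein, DNA, RNA, cell_line, cell_type). From this it derives the untyped B/I/O labels a stage-one CRF could be trained on, and extracts the typed entity spans that a stage-two classifier could learn from. All function names and the example sentence are hypothetical, and no CRF is actually trained here.

```python
from typing import List, Tuple

# (start token index, end token index exclusive, entity type)
Span = Tuple[int, int, str]

# The five JNLPBA2004 entity types; the full BIO label set has 2 * 5 + 1 = 11 labels.
JNLPBA_TYPES = ["protein", "DNA", "RNA", "cell_line", "cell_type"]
FULL_LABELS = ["O"] + [f"{prefix}-{t}" for t in JNLPBA_TYPES for prefix in ("B", "I")]
STAGE1_LABELS = ["O", "B", "I"]  # stage one ignores the entity type


def to_stage1_labels(labels: List[str]) -> List[str]:
    """Stage 1 (identification): strip the type, keeping only B/I/O."""
    return [label.split("-", 1)[0] for label in labels]


def extract_spans(labels: List[str]) -> List[Span]:
    """Collect entity spans from a BIO sequence (typed or untyped)."""
    spans: List[Span] = []
    start, etype = None, ""
    for i, label in enumerate(labels + ["O"]):  # sentinel "O" closes a final entity
        tag, _, typ = label.partition("-")
        # Close the open entity unless this token continues it with the same type.
        if start is not None and not (tag == "I" and typ == etype):
            spans.append((start, i, etype))
            start = None
        # "B" opens an entity; a stray "I" is leniently treated as "B".
        if tag == "B" or (tag == "I" and start is None):
            start, etype = i, typ
    return spans


def to_stage2_examples(tokens: List[str], labels: List[str]) -> List[Tuple[List[str], str]]:
    """Stage 2 (classification): one (entity tokens, type) example per entity."""
    return [(tokens[s:e], typ) for s, e, typ in extract_spans(labels)]


if __name__ == "__main__":
    # Hypothetical example sentence, not taken from the corpus.
    tokens = ["IL-2", "gene", "expression", "in", "T", "cells"]
    labels = ["B-DNA", "I-DNA", "O", "O", "B-cell_type", "I-cell_type"]
    print(to_stage1_labels(labels))
    print(to_stage2_examples(tokens, labels))
    print(len(FULL_LABELS), "labels vs", len(STAGE1_LABELS))
```

One plausible reason for the shorter training time is the label count. A linear-chain CRF's forward-backward pass costs roughly O(n·L²) for a sentence of n tokens and L labels. Stage one therefore works with 3 labels instead of the 11 in the full label set.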

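The result-combination step in contribution (2) can be sketched in the same spirit. This Python sketch assumes each model's output is a set of entity mentions keyed by a hypothetical (sentence id, start offset, end offset) triple. It scores results with strict exact-span precision, recall and F1, a simplification of the official BioCreAtIvE 2 GM evaluation. The thesis combines six models; the three toy models, the function names and the `min_votes` threshold below are illustrative assumptions.

```python
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical mention key: (sentence id, start offset, end offset).
Mention = Tuple[str, int, int]


def combine_union(predictions: List[Set[Mention]]) -> Set[Mention]:
    """Keep a mention if any model predicts it (tends to favour recall)."""
    return set().union(*predictions)


def combine_intersection(predictions: List[Set[Mention]]) -> Set[Mention]:
    """Keep a mention only if every model predicts it (tends to favour precision)."""
    return set.intersection(*predictions) if predictions else set()


def combine_vote(predictions: List[Set[Mention]], min_votes: int) -> Set[Mention]:
    """Keep a mention predicted by at least `min_votes` models."""
    counts = Counter(m for pred in predictions for m in pred)
    return {m for m, c in counts.items() if c >= min_votes}


def precision_recall_f1(predicted: Set[Mention], gold: Set[Mention]) -> Tuple[float, float, float]:
    """Strict exact-span precision, recall and F1."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1


if __name__ == "__main__":
    # Three hypothetical models (the thesis combines six).
    models = [
        {("s1", 0, 4), ("s1", 10, 15)},
        {("s1", 0, 4), ("s2", 3, 8)},
        {("s1", 0, 4), ("s1", 10, 15), ("s2", 20, 25)},
    ]
    gold = {("s1", 0, 4), ("s1", 10, 15), ("s2", 3, 8)}
    for name, combined in [
        ("union", combine_union(models)),
        ("intersection", combine_intersection(models)),
        ("vote>=2", combine_vote(models, 2)),
    ]:
        print(name, precision_recall_f1(combined, gold))
```

Union tends to raise recall at the cost of precision, and intersection does the opposite. A vote threshold between 1 and the number of models trades between the two extremes.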

© 2012 www.DissertationTopic.Net