
Named Entities Recognition and Normalization in Biomedical Literatures

Author: FanWenZuo
Tutor: LiLiShuang
School: Dalian University of Technology
Course: Applied Computer Technology
Keywords: Biomedical Named Entity Recognition and normalization; two-layer stacking method; multi-agent strategy; Hungarian algorithm
CLC: TP391.4
Type: Master's thesis
Year: 2013


As critical steps of text mining in biomedical literature, Biomedical Named Entity Recognition (Bio-NER) and Gene Normalization (GN) are currently among the NLP (Natural Language Processing) research problems that attract international attention. Only when biomedical entities are correctly identified and normalized can more complex tasks, such as protein-protein interaction extraction, text classification, and implicit knowledge discovery, be carried out effectively.

Contributions of this dissertation are as follows:

(1) This dissertation presents a two-phase Bio-NER model, based on a two-layer stacking method and a multi-agent strategy, targeted at the JNLPBA 2004 task. Our two-phase method divides the task into two subtasks: Named Entity Detection (NED) and Named Entity Classification (NEC). In the first phase, the NED subtask distinguishes named entities (NEs) from non-named entities (NNEs) in biomedical literature without identifying their types. Six classifiers are constructed with four toolkits (CRF++, YamCha, Maximum Entropy, Mallet) using different training methods, and are integrated by the two-layer stacking method. In the second phase, for the NEC subtask, the multi-agent strategy is introduced to determine the correct entity type for the entities identified in the first phase. Experimental results show that the presented approach achieves an F-score of 76.06%, which outperforms most of the state-of-the-art systems.

(2) This dissertation presents a multistage gene normalization system targeted at the BioCreative II GN task, which consists of four major subtasks: pre-processing, dictionary matching, ambiguity resolution, and filtering. For the first subtask, we apply the gene mention tagger developed in our earlier work, which achieves an F-score of 88.42% on the BioCreative II GM test set. In the dictionary matching stage, exact matching and approximate matching between gene names and the EntrezGene lexicon are combined. For the ambiguity resolution subtask, we propose a semantic similarity disambiguation method based on the Hungarian algorithm. In the last step, a Wikipedia-based filter removes false positives that represent gene family names rather than specific gene names. Experimental results show that the presented system achieves an F-score of 90.1%, which outperforms most of the state-of-the-art systems.

The approaches to named entity recognition and normalization in biomedical literature presented in this dissertation are efficient, and these methods can be applied to other fields of biomedical text mining.
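The abstract does not spell out how the Hungarian algorithm is applied in the ambiguity resolution step. As a minimal sketch, assume that ambiguous gene mentions and candidate EntrezGene identifiers form the two sides of an assignment problem, and that a semantic similarity score is already available for every pair. A one-to-one matching with maximal total similarity could then be computed as below. The function names, the example scores, and the mention/candidate framing are illustrative assumptions, not the dissertation's implementation.

```python
def hungarian_min_cost(cost):
    """Solve the assignment problem with the Hungarian (Kuhn-Munkres) algorithm.

    cost is an n x m matrix (list of lists) with n <= m.
    Returns a list where entry i is the column assigned to row i,
    chosen so that the total cost is minimal.
    """
    n, m = len(cost), len(cost[0])
    assert n <= m, "this sketch expects at least as many columns as rows"
    INF = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (m + 1)   # column potentials (1-indexed)
    p = [0] * (m + 1)     # p[j] = row currently matched to column j (0 = none)
    way = [0] * (m + 1)   # previous column on the augmenting path

    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta, j1 = INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so that one more reduced cost becomes zero.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:   # reached a free column: augmenting path found
                break
        # Flip the matching along the augmenting path.
        while True:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break

    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return assignment


def match_by_similarity(sim):
    """Maximize total similarity by minimizing the negated scores."""
    cost = [[-s for s in row] for row in sim]
    return hungarian_min_cost(cost)


# Hypothetical similarity scores: rows are two ambiguous mentions,
# columns are three candidate gene identifiers.
sim = [
    [0.90, 0.80, 0.10],
    [0.85, 0.20, 0.30],
]
assignment = match_by_similarity(sim)
print(assignment)  # [1, 0]
```

With these made-up scores, a greedy choice would pair the first mention with candidate 0 (0.90) and leave the second mention with 0.30, for a total of 1.20. The optimal assignment instead pairs the first mention with candidate 1 and the second with candidate 0, for a total of 0.80 + 0.85 = 1.65.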

Related Dissertations

  1. The Method of High Density Cells’ Tracking Based on Topological Constraint Combined with Hungarian Algorithm,Q25
  2. Research on Intelligent Scheduling Technology of Interbay System in Semiconductor Wafer Fabrication System,TH165.1
  3. Research and implementation of signaling protocol for automatic switched optical network,TN929.1
  4. Structural phase transition path selection alternating iterative algorithm,O174
  5. Assignment Problem’s Algorithm and Its Realization,O22
  6. Research on Contour Based Shape Matching,TP391.41
  7. Research on the Solution and Application of an Unbalance Assignment Problem,F224
  8. Research on Color Feature Based Image Retrieval,TP391.3
  9. Research on the Test Platform of CTCS3 Onboard Equipment,U284.482
  10. The two distribution algorithm,TP301.6
  11. Research on Multi-depot Vehicle Scheduling Problem with Full Car Load,U116
  12. Study of Netlist Generation Method for the VLSI Automatic Test Equipment Interface Board,TN407
  13. Research on Basic Algorithms of Digital Image Processing and Implementation with FPGA,TP391.41
  14. Research on Facial Feature Extraction and Matching Algorithms for Image Retrieval,TP391.41
  15. Research of High Speed Image Pre-processing System Based on FPGA,TP391.41
  16. Research on Algorithms of 2D Face Template Protection,TP391.41
  17. Research of Visualization Technology in the Virtual Test of Missile,TP391.9
  18. The Research and Implementation of Image Retrieval Based on User Interested Feature,TP391.41
  19. Research of Image Mosaic Technology,TP391.41
  20. Research and Implementation of Exact String Matching Algorithms,TP391.41
  21. Research of Question Answering System Based on the Analysis of Lexical and Semantic Meanings,TP391.1
