Research on Arabic Named Entity Recognition Using Hybrid Models

Author: HuSang
Tutor: LiuBingQuan
School: Harbin Institute of Technology
Course: Computer Science and Technology
Keywords: Arabic Named Entity Recognition; Natural Language Processing; Rule-based Approach; Machine Learning Approach; Hybrid Approach
CLC: TP391.1
Type: Master's thesis
Year: 2013

Abstract


Named Entity Recognition (NER) is an information extraction task that focuses on recognizing and classifying named entities in unstructured text, such as the names of persons, organizations, and locations. Most researchers use machine learning for NER tasks, while few use handcrafted rules. Named entities carry very important information in Natural Language Processing, especially for information retrieval, question answering systems, text classification, text summarization, and information extraction.

Our research focuses on NER for Arabic, an important language that poses many challenges. Arabic is the official language of the Arab world and is morphologically, syntactically, and phonologically based on Classical Arabic. It is now the sixth most widely spoken language in the world and the mother tongue of 300 million people. Arabic NER is still at a basic stage, and far less research has been done on it than on English NER, so we chose this topic in order to improve the quality of Arabic NER. We focus on the names of persons, organizations, and locations.

In this research we propose a simple combination of a rule-based method and a machine learning method as a hybrid method for Arabic named entity recognition: keywords and special verbs are employed as triggers to tag the named entities, and these tags are then used as features for a machine learning model.

The Arabic language poses many challenges for NER, including the lack of capitalization, rich inflection, morphological ambiguity, the fact that the shape of a character changes depending on its position in the word, and the lack of resources. Our proposed rule-based system employs keywords and special verbs to tag the named entities and uses gazetteer lists to match them. The rule-based system achieves an F-value of 0.397.

We implemented the second part of our hybrid system (machine learning) using a Maximum Entropy model.
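The rule-based step described above (trigger keywords, special verbs, and gazetteer matching) can be sketched as follows. This is a minimal sketch under assumptions: the word lists, the rule priority (gazetteer first, then triggers), the one-token window after a trigger, and the proclitic stripping are all hypothetical illustrations, not the thesis's actual rules or resources. The proclitic step targets one of the challenges named above: Arabic attaches particles such as و ("and") and ب ("in/with") directly to words, so an exact gazetteer lookup would miss a form like بالقاهرة ("in Cairo").

```python
# A minimal sketch of trigger- and gazetteer-based Arabic NER tagging.
# All word lists below are small hypothetical examples, not the thesis's resources.

# Hypothetical keywords whose following token is assumed to be an entity of the given class.
TRIGGER_KEYWORDS = {
    "الرئيس": "PER",  # "the president"
    "الدكتور": "PER",  # "the doctor"
    "شركة": "ORG",  # "company"
    "جامعة": "ORG",  # "university"
    "مدينة": "LOC",  # "city"
}

# Hypothetical verbs that, in verb-subject order, are often followed by a person name.
TRIGGER_VERBS = {
    "قال": "PER",  # "said"
    "صرح": "PER",  # "stated"
}

# Tiny hypothetical gazetteer mapping known names to entity classes.
GAZETTEER = {
    "القاهرة": "LOC",  # Cairo
    "بغداد": "LOC",  # Baghdad
    "محمد": "PER",  # Muhammad
}

# One-letter proclitics stripped before a second gazetteer lookup:
# "و" ("and") and "ب" ("in/with").
CLITIC_PREFIXES = ("و", "ب")


def gazetteer_lookup(token):
    """Return the gazetteer class of token, retrying once without a proclitic."""
    if token in GAZETTEER:
        return GAZETTEER[token]
    if len(token) > 1 and token[0] in CLITIC_PREFIXES and token[1:] in GAZETTEER:
        return GAZETTEER[token[1:]]
    return None


def rule_tag(tokens):
    """Tag each token as PER, ORG, LOC, or O using gazetteer and trigger rules."""
    tags = ["O"] * len(tokens)
    # Rule 1: exact (or proclitic-stripped) gazetteer matches.
    for i, token in enumerate(tokens):
        label = gazetteer_lookup(token)
        if label is not None:
            tags[i] = label
    # Rule 2: a trigger keyword or verb labels the next token,
    # unless the gazetteer already assigned that token a class.
    for i, token in enumerate(tokens[:-1]):
        label = TRIGGER_KEYWORDS.get(token) or TRIGGER_VERBS.get(token)
        if label is not None and tags[i + 1] == "O":
            tags[i + 1] = label
    return tags


if __name__ == "__main__":
    # "Ahmed said in Cairo"
    print(rule_tag(["قال", "أحمد", "في", "القاهرة"]))  # prints ['O', 'PER', 'O', 'LOC']
```

Because the gazetteer rule runs first in this sketch, a phrase such as جامعة بغداد ("University of Baghdad") comes out as O, LOC rather than as an organization; conflicts of this kind are one reason a statistical model layered on top of the rules can help.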
The tags produced by the rule-based system are used as features for the machine learning model. Besides the rule-based features, general features such as part-of-speech (POS) tags are used; we tagged the words with the Stanford Part-of-Speech Tagger. Our machine learning model achieves an F-value of 0.495.

The proposed hybrid system combines the two approaches (rule-based and machine learning) by feeding the output of the rule-based system to the machine learning model as features, complementing these with general features such as POS tags, and passing all of them to the classifier. The hybrid system performs better than either the rule-based or the machine learning approach used individually, achieving an F-value of 0.528.

We compared our hybrid approach with ARNE, an Arabic named entity recognition system published by Carolin Shihadeh and Günter Neumann in 2012. The comparison shows that our system outperforms theirs by 0.203 in F-value.
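The hybrid step, in which rule-based tags and POS tags become features of a Maximum Entropy classifier, can be sketched in pure Python as below. This is a minimal illustration under stated assumptions: the feature templates (`w=`, `pos=`, `rule=`, `prev_rule=`, `bias`), the toy sentences, the hand-written Penn-style POS tags standing in for a tagger's output, and the training settings are all hypothetical, not the thesis's configuration. A maximum entropy classifier is a multinomial log-linear model, so here it is trained by stochastic gradient ascent on the conditional log-likelihood, with regularization omitted for brevity.

```python
import math
from collections import defaultdict

LABELS = ["O", "PER", "ORG", "LOC"]


def token_features(tokens, pos_tags, rule_tags, i):
    """Hypothetical binary features for token i."""
    return [
        "w=" + tokens[i],
        "pos=" + pos_tags[i],
        "rule=" + rule_tags[i],  # output of the rule-based system used as a feature
        "prev_rule=" + (rule_tags[i - 1] if i > 0 else "<S>"),
        "bias",
    ]


class MaxEnt:
    """Multinomial log-linear (maximum entropy) classifier trained by SGD."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.weights = defaultdict(float)  # (feature, label) -> weight

    def scores(self, feats):
        return {y: sum(self.weights[(f, y)] for f in feats) for y in self.labels}

    def probs(self, feats):
        s = self.scores(feats)
        top = max(s.values())  # subtract the max score for numerical stability
        exps = {y: math.exp(v - top) for y, v in s.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def train(self, examples, epochs=100, lr=0.5):
        """Stochastic gradient ascent on the conditional log-likelihood."""
        for _ in range(epochs):
            for feats, gold in examples:
                p = self.probs(feats)
                for y in self.labels:
                    # d log P(gold | x) / d w[f, y] = [y == gold] - P(y | x) for each active f
                    grad = (1.0 if y == gold else 0.0) - p[y]
                    for f in feats:
                        self.weights[(f, y)] += lr * grad

    def predict(self, feats):
        s = self.scores(feats)
        return max(self.labels, key=lambda y: s[y])


def build_examples(sentences):
    """Turn (tokens, pos_tags, rule_tags, gold_tags) tuples into (features, label) pairs."""
    examples = []
    for tokens, pos_tags, rule_tags, gold_tags in sentences:
        for i, gold in enumerate(gold_tags):
            examples.append((token_features(tokens, pos_tags, rule_tags, i), gold))
    return examples


# Toy training data (hypothetical). In the third sentence the rule tags
# (O, LOC) for "University of Baghdad" disagree with the gold labels (ORG, ORG).
TOY_SENTENCES = [
    (["قال", "أحمد", "في", "القاهرة"],
     ["VBD", "NNP", "IN", "NNP"],
     ["O", "PER", "O", "LOC"],
     ["O", "PER", "O", "LOC"]),
    (["يعمل", "في", "شركة", "سامسونج", "بالقاهرة"],
     ["VBP", "IN", "NN", "NNP", "NNP"],
     ["O", "O", "O", "ORG", "LOC"],
     ["O", "O", "O", "ORG", "LOC"]),
    (["درس", "في", "جامعة", "بغداد"],
     ["VBD", "IN", "NN", "NNP"],
     ["O", "O", "O", "LOC"],
     ["O", "O", "ORG", "ORG"]),
]

if __name__ == "__main__":
    model = MaxEnt(LABELS)
    model.train(build_examples(TOY_SENTENCES))
    tokens = ["صرح", "سمير", "في", "القاهرة"]
    pos_tags = ["VBD", "NNP", "IN", "NNP"]
    rule_tags = ["O", "PER", "O", "LOC"]  # as a trigger/gazetteer step might produce
    print([model.predict(token_features(tokens, pos_tags, rule_tags, i))
           for i in range(len(tokens))])
```

Because the rule tag is only one feature among several, the classifier can learn to override it, as the third toy sentence is meant to show. In practice an off-the-shelf multinomial logistic regression (for example scikit-learn's `LogisticRegression` on features vectorized with `DictVectorizer`) fits the same model family with proper regularization.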
