Chinese New Word Discovery and Part-of-Speech Tagging

Author: YangHui
Advisor: ZhangJie
School: Fudan University
Course: Applied Computer Technology
Keywords: New Word Discovery; Part-of-Speech (POS) Tagging; Natural Language Processing (NLP); Support Vector Machine (SVM)
CLC: TP391.1
Type: Master's thesis
Year: 2008


With the rapid development of society and the economy, the Chinese language continues to grow, and new words keep emerging. This makes Chinese word segmentation more challenging: unrecognized new words tend to be split into long sequences of single characters, which markedly reduces segmentation precision. New word discovery has therefore become a bottleneck in Chinese word segmentation and an important research topic. Part of speech (POS) is an important attribute of a word and the main bridge between words and syntax, so POS tagging is expected to provide high-quality intermediate results for later stages of natural language processing (NLP). The emergence of new words, however, degrades POS tagging performance.

Many researchers are working on new word discovery and have proposed a variety of approaches. However, existing approaches either handle only new words from a specific domain or rely on features limited to word frequency. In this thesis, we first review previous work and then propose an SVM-based hybrid method for new word discovery that combines the advantages of statistics-based and rule-based methods to improve both new word discovery and POS tagging. In the statistical module, new word discovery is defined as a binary classification problem. We use features from previous work that describe the internal structure of a word, and we add context information as well as constraints that capture the relationships among new word candidates. Rules are then introduced to further improve performance. Finally, we assign POS tags to the discovered new words.

This thesis designs and builds a system that implements new word discovery and POS tagging. The key techniques are as follows.

1. For new word discovery, a support vector machine (SVM) is used to solve the classification problem. SVMs have been applied successfully to pattern recognition and classification; an SVM finds an optimal separating hyperplane between data points of different classes in a high-dimensional space. Within the SVM framework, rules are introduced to compensate for the shortcomings of the statistics-based method. The SVM-based hybrid method for new word discovery and a brief outline of its processing flow are described in this thesis.

2. For POS tagging of new words, we again treat the task as a classification problem and solve it with an SVM, using features that describe both the internal structure of a word and its external concatenation information. We transform the multi-class classification problem into a binary classification problem by constructing a new mapping function.

In experiments on one month of news text from the 1998 People's Daily, new word discovery achieved a precision of up to 60.81%, a recall of 68.94%, and an F-measure of 64.62%. The precision of POS tagging is up to 90%.
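The reported F-measure can be checked against the reported precision and recall. The sketch below assumes the abstract's F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly. The function name `f_measure` and its `beta` parameter are illustrative and are not taken from the thesis.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        # Avoid division by zero when nothing was found correctly.
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Figures reported in the abstract, expressed as percentages.
precision = 60.81
recall = 68.94

# Assumption: balanced F1 (beta = 1), not stated in the abstract.
f1 = f_measure(precision, recall)
print(f"F1 = {f1:.2f}%")  # prints "F1 = 64.62%"
```

Under that assumption the computed value agrees with the 64.62% given in the abstract, so the three reported figures are mutually consistent.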

Related Dissertations

  1. Object-oriented information extraction ontology Key Technology Research and Implementation,TP391.1
  2. Sleep EEG Signal Processing and Application in Sleep Staging,R318
  3. Land Use Information Remote Sensing Extraction and Effects of Different Land Use on Soil Quality in Greenhouse Vegetable Region,S626
  4. A Parallel Image Mosaic Method Based on Feature Matching,TP391.41
  5. Virtual Communities bad information filtering technology research,TP393.09
  6. Based on SVM-RFE potential biomarker selection algorithm,TP311.13
  7. Research on Automatic Music Structure Analysis,TN912.3
  8. Fault Diagnosis Method Based on Vector-HOS and Its Application,TH165.3
  9. Bioinformatics methods to study protein interactions,Q811.4
  10. The detection of the image invisible information,TP391.41
  11. XLPE Cable Partial Discharge Electromagnetic Coupled Detection and Its Pattern Recognition Research,TM247
  12. A Study on Track Correlation Method Based on Time Sequence SVM Information Fusion for Target,TN957.51
  13. Research on Application of SVM to Individual Housing Loan Credit Risk Evaluation,TP311.13
  14. Based on Gesture Recognition by Used Mobile Phone Universal Control,TP391.41
  15. Research on Support Vector Machine Based Weed Discrimination,TP391.41
  16. The Similarity Analysis over Ultrasound Speckle Image of Coagulation Necrosis Tissue and Experimental Research,R445.1
  17. Research on Fault Diagnosis of Nuclear Power Equipment Based on Support Vector Machine,TM623.7
  18. The Application of Support Vector Machine in Cable Fault Classification,TM755
  19. The Applied Research of the Statistical Pattern Recognition Classification of ECG Common Diseases,TP391.4
  20. Web-based English-Chinese bidirectional unknown word translation method research,TP391.2

CLC: Industrial Technology > Automation technology, computer technology > Computing technology, computer technology > Computer applications > Information processing > Text Processing