Research on Automatic Notation of Word for Tibetan Corpus Based on HMM

Author: SuJunFeng
Advisor: QiKunZuo
School: Northwest University for Nationalities
Major: Linguistics and Applied Linguistics
Keywords: Tibetan language; Corpus; Dictionary; Part-of-Speech Tagging; Hidden Markov Model
CLC: H214
Type: Master's thesis
Year: 2010

Abstract


Corpus linguistics has developed rapidly in recent years and has opened a new path for language study. Corpus research on English and Chinese, such as word frequency statistics carried out at different levels, has laid a reliable, solid foundation for quantitative research on minority-language corpora and offers experience to draw on. Advances in Tibetan information processing technology and the achievements of Tibetan studies have created the conditions for Tibetan corpus research and word frequency statistics. Part-of-speech tagging is a basic topic in Tibetan information processing. On the one hand, its results feed directly into practical applications such as information extraction, information retrieval, and machine translation; on the other hand, automatic part-of-speech tagging is an essential front-end tool for Tibetan chunk recognizers, parsers, and semantic analyzers. Research on Tibetan part-of-speech tagging therefore has both theoretical significance and practical value. Part-of-speech tagging methods fall into two categories: rule-based and statistics-based. Statistics-based methods do not require linguistic rules to be summarized by hand and have the advantage of a high correct identification rate, so they have gradually become a research focus.
Among statistical methods, the hidden Markov model (HMM) is one of the most widely used. This thesis implements a Tibetan part-of-speech tagging system based on statistical tagging techniques. The system trains a hidden Markov model on a corpus to obtain the required lexical probabilities and part-of-speech transition probabilities. Because the Tibetan training corpus is small and the data are sparse, a simple and efficient data smoothing algorithm is applied. The lexical probabilities and transition probabilities are then used to build a core dictionary and a bigram model dictionary. Finally, the Viterbi algorithm uses these two dictionaries to select the best tag sequence for the input. This experimental study is a first attempt at processing a Tibetan corpus automatically by computer. It shows that automatic part-of-speech tagging of a Tibetan corpus with HMM methods is achievable; in closed testing, the system reaches an accuracy of 88% to 90%.
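
The pipeline described above (count statistics, smooth them, then decode with Viterbi) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not name its smoothing algorithm, so add-one (Laplace) smoothing is assumed here. The function names (train, log_prob, viterbi), the "<s>" start marker, the toy tags "n" and "v", and the placeholder tokens are all hypothetical and stand in for a real segmented Tibetan corpus.

```python
import math
from collections import defaultdict


def train(tagged_sentences):
    """Count tag-bigram transitions and word emissions from a tagged corpus."""
    trans = defaultdict(lambda: defaultdict(int))  # trans[prev_tag][tag] -> count
    emit = defaultdict(lambda: defaultdict(int))   # emit[tag][word] -> count
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # assumed sentence-start marker
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab


def log_prob(counts, context, event, n_outcomes):
    """Add-one smoothed log P(event | context); the smoothing choice is an assumption."""
    row = counts.get(context, {})
    total = sum(row.values())
    return math.log((row.get(event, 0) + 1) / (total + n_outcomes))


def viterbi(words, trans, emit, tags, vocab):
    """Return the most probable tag sequence for words under the bigram HMM."""
    if not words:
        return []
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra outcome reserved for unknown words
    # best[i][t] = (log-probability of the best path ending in tag t, previous tag)
    best = [{t: (log_prob(trans, "<s>", t, n_tags)
                 + log_prob(emit, t, words[0], n_words), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(emit, t, words[i], n_words)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans, p, t, n_tags) + e, p)
                for p in tags
            )
        best.append(column)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    path.reverse()
    return path


# Toy corpus with placeholder tokens (not real Tibetan data).
corpus = [
    [("dog", "n"), ("runs", "v")],
    [("cat", "n"), ("sleeps", "v")],
    [("dog", "n"), ("sleeps", "v")],
]
trans, emit, tags, vocab = train(corpus)
print(viterbi(["cat", "runs"], trans, emit, tags, vocab))  # ['n', 'v']
```

In this sketch the emit table plays the role of the core dictionary and the trans table the role of the bigram model dictionary. Working in log space avoids numeric underflow on long sentences, and the extra smoothing slot lets the tagger assign a tag to words that never occurred in training.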

