
A Comparison Study of Chinese Text Categorization

Author: GaoZuo
Tutor: LiuDaZhong
School: Hebei University
Course: Applied Computer Technology
Keywords: Text categorization; Naive Bayes (NB); K-Nearest Neighbor (KNN); Support Vector Machines (SVM)
CLC: TP391.1
Type: Master's thesis
Year: 2008


The volume of text information is growing rapidly with the fast development of the Internet, and automatic text categorization has therefore become increasingly important. Text categorization is an essential problem in natural language processing and a focus of information technology research on how to exploit this rich data resource. This thesis consists of two parts.

First, we present a new text representation method based on the contextual matrix of classified core words. On the one hand, a core word can be a representative feature that appears in the title, abstract, keywords, or the opening and closing sections of a text. A classified core word represents the characteristics of a text better than other keywords do, but it cannot convey contextual relationships. To address this shortcoming, a contextual matrix of classified core words is proposed. Words are placed at different positions in the matrix according to the information they carry about the classified core words, so the relationship between the context and the classified core words is captured better. On the other hand, we recompute the term weights according to the positions of the words in the text and of the classified core words in the matrix. The text can then be represented more effectively.

Second, the basic process and principles of Chinese text categorization are presented, and three widely used methods, Naive Bayes (NB), K-Nearest Neighbor (KNN), and Support Vector Machines (SVM), are discussed and compared.
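To make the comparison concrete, the following is a minimal sketch of two of the three classifiers named in the abstract, multinomial Naive Bayes and K-Nearest Neighbor, applied to the same data. It is not the thesis's implementation. It assumes documents are already segmented into words, uses plain term counts rather than the contextual-matrix weighting described above, and omits SVM for brevity. The toy documents, labels, and function names (train_nb, predict_nb, predict_knn) are invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy, pre-segmented documents (invented for illustration; not thesis data).
TRAIN = [
    (["football", "match", "goal", "team"], "sports"),
    (["team", "coach", "match", "season"], "sports"),
    (["stock", "market", "price", "bank"], "finance"),
    (["bank", "loan", "interest", "market"], "finance"),
]


def train_nb(docs):
    """Collect per-class document counts and term counts for multinomial NB."""
    class_docs = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, term_counts, vocab


def predict_nb(model, tokens):
    """Return the class with the highest log posterior (Laplace smoothing)."""
    class_docs, term_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)
        denom = sum(term_counts[label].values()) + len(vocab)
        for term in tokens:
            if term in vocab:  # ignore terms never seen in training
                score += math.log((term_counts[label][term] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_knn(docs, tokens, k=3):
    """Majority vote among the k training documents most similar to the query."""
    query = Counter(tokens)
    scored = sorted(
        ((cosine(query, Counter(doc)), label) for doc, label in docs),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    nb_model = train_nb(TRAIN)
    for query in (["match", "goal", "coach"], ["bank", "price", "loan"]):
        print(query, predict_nb(nb_model, query), predict_knn(TRAIN, query))
```

Both classifiers consume the same bag-of-words representation here, so a different representation, such as position-dependent term weights, could be swapped in by replacing the Counter construction without changing the classifiers themselves.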

Related Dissertations

  1. Research on Predicting Intrinsic Disorder Protein Structure Based on Supervision Manifold Learning Algorithm,Q51
  2. Text Categorization Based on Rough Set Theory,TP18
  3. Multi-step-ahead Stock Price Index Forecasting Based on Hybrid Models,F224
  4. Network public opinion analysis to key technology research and,TP393.09
  5. The Implementation and Research of the Probabilistic Latent Semantic Analysis Model in the Search Engine’s Business Text Classification System,TP391.1
  6. Emotion Recognition Methods in Intelligent Teaching,TP391.41
  7. Content-based spam filtering technology research,TP393.098
  8. Internet News Hot Mining System Research and Implementation,TP393.09
  9. Study on Chinese Text Categorization,TP391.1
  10. Study of Image Feature Extraction and Texture Classification Algorithm,TP391.41
  11. Research and Application of News Automatic Classification Technology Based on Support Vector Machines,TP391.1
  12. The application of data mining in the quality management of the H08 small electronic transformer,TP311.13
  13. Research on Web Chinese Text Automatic Categorization Based on RS-SVM,TP391.1
  14. Research and Improvement to Text Classification Algorithm,TP391.1
  15. Design and Implementation of a Lucene Based Intra-site Information Retrieval System for a Journal Site,TP391.3
  16. A Technology of Text Categorization on Imbalanced Datasets,TP391.1
  17. Research and Realization on Correlation Techniques of Topic Search-Specific Engine,TP391.3
  18. Study on Graduation Thesis Pre-Inspection Management System,G311
  19. Studies on Feature Selection Method Based on Heuristic Attribute Reduction of Rough Set,TP18
  20. Research on Feature Selection and Classification Methods for Text Categorization,TP391.1
  21. Research on the Topical Search Engine Based on Semantic,TP391.3

CLC: Industrial Technology > Automation technology, computer technology > Computing technology, computer technology > Computer applications > Information processing > Text Processing