
Term Weight-Based Chinese Text Classification Algorithm

Author: HouYanZuo
Tutor: ShenXiTing
School: Hebei University of Technology
Course: Applied Computer Technology
Keywords: Information retrieval; text classification; vector space model; feature extraction; feature weighting
CLC: TP391.1
Type: Master's thesis
Year: 2011
Downloads: 22
Quote: 1

Abstract


With the rapid development and growing popularity of the Internet, the number of web pages has soared, and finding the information one needs quickly and efficiently among such vast resources has become a research focus. Since most web content is text, automatically categorizing the text of web pages has become an important research subject. Automatic text classification is an essential foundation of information retrieval: given a classification system, it automatically determines the category of a text from its content. Classification allows information to be organized and managed effectively, which makes it easier to locate the needed information quickly and accurately.

This thesis first surveys the state of automatic text categorization research in China and abroad, then studies the key technologies involved: information retrieval models, Chinese word segmentation, feature extraction, feature weighting methods, and classification algorithms. For feature weighting, it analyzes the shortcomings of traditional term weighting, in particular the commonly used TF-IDF method, and proposes an improved weight calculation. The improved method brings a feature evaluation function into the weight calculation, so that each term's contribution to its weight is adjusted according to how well the term discriminates between categories. Concretely, the weighting combines TF-IDF with the χ² (chi-square) statistic.
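The abstract does not give the exact formula for the improved weight. The minimal Python sketch below assumes one common way of combining the two measures: the TF-IDF weight tf(t,d) × log(N/df(t)) is multiplied by the largest χ² value of term t over all categories, and each document vector is then L2-normalized. Documents are assumed to be already segmented into word lists, and the function names `chi_square` and `tfidf_chi_weights` are hypothetical, not taken from the thesis.

```python
import math
from collections import Counter


def chi_square(term, category, docs, labels):
    """Chi-square statistic between a term and a category.

    Uses the 2x2 contingency table:
      a = docs in the category that contain the term
      b = docs outside the category that contain the term
      c = docs in the category without the term
      d = docs outside the category without the term
    chi2 = N * (a*d - b*c)^2 / ((a+c) * (b+d) * (a+b) * (c+d))
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_cat = label == category
        if has_term and in_cat:
            a += 1
        elif has_term:
            b += 1
        elif in_cat:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def tfidf_chi_weights(docs, labels):
    """Assumed weighting: w(t,d) = tf(t,d) * log(N/df(t)) * max_c chi2(t,c), L2-normalized per document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for tokens in docs for t in set(tokens))
    categories = sorted(set(labels))
    # Category-discriminating power of each term: its best chi-square over all categories.
    chi_max = {t: max(chi_square(t, c, docs, labels) for c in categories) for t in df}
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        w = {t: f * math.log(n_docs / df[t]) * chi_max[t] for t, f in tf.items()}
        norm = math.sqrt(sum(v * v for v in w.values())) or 1.0
        vectors.append({t: v / norm for t, v in w.items()})
    return vectors


if __name__ == "__main__":
    # Toy corpus: token lists as a Chinese word segmenter (not shown) might produce.
    docs = [["股票", "市场", "上涨"], ["股票", "基金"], ["足球", "比赛"], ["篮球", "比赛", "球员"]]
    labels = ["finance", "finance", "sports", "sports"]
    print(chi_square("股票", "finance", docs, labels))  # 4.0
    print(tfidf_chi_weights(docs, labels)[0])
```

In this toy corpus, 股票 occurs only in finance documents, so its χ² is high and its weight is boosted; a term spread evenly across the two categories would get χ² = 0 and drop out of the vector entirely, which is the intended effect of scaling weights by discriminating power.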
Experiments show that the improved weight calculation method increases classification accuracy. The thesis then presents a Chinese text categorization system based on the vector space model, describing its overall framework, processing flow, and functional modules. Finally, several feature extraction, weighting, and classification algorithms are implemented in this system and compared experimentally.
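The abstract does not say which classification algorithms were compared. As a hedged illustration of classification in the vector space model, the sketch below assumes the k-nearest-neighbour (kNN) method with cosine similarity over sparse {term: weight} vectors, such as those a TF-IDF-style weighting produces. The names `cosine` and `knn_classify` are hypothetical, and k = 3 is an arbitrary default.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_classify(query, train_vectors, train_labels, k=3):
    """Assign the label with the largest similarity-weighted vote among the k most similar training vectors."""
    # Score every training document against the query and keep the top k.
    scored = sorted(
        ((cosine(query, vec), label) for vec, label in zip(train_vectors, train_labels)),
        reverse=True,
    )[:k]
    votes = Counter()
    for sim, label in scored:
        votes[label] += sim
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    train = [{"股票": 0.8, "市场": 0.6}, {"股票": 0.6, "基金": 0.8},
             {"足球": 0.7, "比赛": 0.7}, {"比赛": 0.6, "球员": 0.8}]
    labels = ["finance", "finance", "sports", "sports"]
    print(knn_classify({"股票": 1.0}, train, labels))
```

Weighting each vote by similarity, rather than counting neighbours equally, keeps weakly related neighbours from outvoting a few very similar ones.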

Related Dissertations

  1. Research on Automatic Detection Algorithm for Substructure Distress of Highway Pavement Based on SVM, U418.6
  2. ISAR Imaging Simulation of Space Targets and Target Recognition Based on ISAR Images, TN957.52
  3. Research on Feature Extraction and Classification of Pulse Waveform for Cholecystitis and Nephrotic Syndrome Diagnosis, TP391.41
  4. Application of Q-Learning in the Content-Based Image Retrieval Technology, TP391.41
  5. Research on Transductive Support Vector Machine and Its Application in Image Retrieval, TP391.41
  6. Research on Feature Extraction and Classification of Tongue Shape and Tooth-Marked Tongue in TCM Tongue Diagnosis, TP391.41
  7. Establishment and Update of Similar Users’ Cluster in Personalized Information Retrieval, TP391.3
  8. Research on Text Classification Based on Biomimetic Pattern Recognition, TP391.1
  9. Tourism Comments on the Internet’s Semantic Analysis and Usefulness Research, TP391.1
  10. Research on Visual Measurement for Spacecraft Rendezvous and Approach, TP391.41
  11. Research on the Image Real-Time Acquisition, Storage and Image Processing System, TP391.41
  12. Feature Extraction, Selection and Combination in Lipreading, TP391.41
  13. Research on Query Expansion Technique of Retrieval System in Biomedical Field, TP391.3
  14. Multi-currency Notes Technology Research and Implementation, TP391.41
  15. The Research on Paper Currency Classification Method Based on Haar-Like Feature and Minimal Ball Including Samples, TP391.41
  16. Pavement Distress Recognition Based on Image, TP391.41
  17. Research on Visual Detection and Tracking of Mobile Robots, TP242.62
  18. Research on Fusion Algorithm of Hyper Spectral and High Spatial Resolution Remote Sensing Image, TP751
  19. An Approach for Identifying a Plant Resistance Gene Based on the Random Forest, Q943
  20. Tobacco Diseases Auto-Recognition Research Based on Image Processing Technology, S435.72
  21. Research on Nondestructive Detection Technology for External Qualities of Papayas Based on Vision, S667.9

CLC: Industrial Technology > Automation technology, computer technology > Computing technology, computer technology > Computer applications > Information processing > Text processing
© 2012 www.DissertationTopic.Net