
Research and Implementation of Text Categorization Based on Latent Semantic Indexing

Author: SuXianYu
Advisor: ZhangTianWen
School: Harbin Institute of Technology
Major: Computer Science and Technology
Keywords: Text Classification; Latent Semantic Indexing; Partial Least Squares Regression
CLC: TP391.1
Type: Master's thesis
Year: 2008

Abstract


Latent Semantic Indexing (LSI) is a dimensionality reduction algorithm whose effectiveness for text classification has been validated experimentally. When LSI reduces the dimension of the original feature space, it preserves as much of the global information of that space as possible. As a result, it inevitably filters out features that are very important for recognizing certain categories but not very important from a global point of view. To address this problem, this thesis improves the traditional LSI model. First, to address the defects of traditional weighting methods based on word frequency, it introduces the concept of document weight into the weight calculation, so that the new weighting method is more conducive to forming the latent semantic space and better suited to the LSI model; it also adds word position information, making the word weights more accurate. Second, building on an analysis of the traditional χ2 statistic, which pays little attention to information from rare categories and produces excessively high errors in particular cases, it introduces three indicators (frequency, concentration, and dispersion) so that the new χ2 statistic is more accurate. Finally, it adds category information to the traditional LSI-based classification method and uses partial least squares regression to propose a new text classification method, called Latent Semantic Classification based on Category Information (LSCCI).
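As a point of reference for the χ2 step, the following is a minimal sketch of the traditional χ2 statistic for one term and one category, computed from a 2×2 document-count table. It does not implement the thesis's improved statistic: the abstract names the frequency, concentration, and dispersion indicators but does not give their formulas. The function names and the toy corpus are assumptions made for illustration.

```python
def term_category_counts(docs, term, category):
    """Count documents by (contains term?, belongs to category?).

    docs is a list of (set_of_terms, label) pairs (a hypothetical format).
    Returns (a, b, c, d):
      a: in category, contains term      b: not in category, contains term
      c: in category, lacks term         d: not in category, lacks term
    """
    a = b = c = d = 0
    for terms, label in docs:
        has_term = term in terms
        in_cat = label == category
        if has_term and in_cat:
            a += 1
        elif has_term:
            b += 1
        elif in_cat:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square(a, b, c, d):
    """Traditional chi-square score of a term for a category."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A zero marginal means the table carries no evidence of association.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


# Toy corpus (illustrative only, not data from the thesis).
docs = [
    ({"goal", "match"}, "sports"),
    ({"goal", "team"}, "sports"),
    ({"stock", "market"}, "finance"),
    ({"market", "goal"}, "finance"),
]
score = chi_square(*term_category_counts(docs, "goal", "sports"))
```

Because the score depends only on document counts, a term that occurs once in a document and a term that occurs many times are treated alike, and categories with few documents yield small, unstable counts. This is consistent with the weaknesses the abstract attributes to the traditional statistic.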
This thesis describes the principle and implementation of the latent semantic indexing model in detail, elaborates the derivation of LSCCI, and compares the classification performance of LSCCI with that of other classical models. The experimental data show that LSCCI achieves better classification accuracy. In English text classification experiments, the model also performed better than conventional classification models on rare categories.
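For readers unfamiliar with the LSI principle mentioned above, the standard formulation can be written as a rank-k truncated singular value decomposition. The notation here is an assumption, since the abstract does not fix any symbols: A is the m×n term-document matrix, U_k and V_k hold the first k left and right singular vectors, Σ_k is the diagonal matrix of the k largest singular values, and d is the term vector of a new document.

```latex
A \approx A_k = U_k \Sigma_k V_k^{\mathsf{T}}, \qquad
\hat{d} = \Sigma_k^{-1} U_k^{\mathsf{T}} d
```

The first expression is the best rank-k approximation of A in the least-squares sense, which is why the reduction favors globally important features. The second projects a document into the k-dimensional latent semantic space.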

Related Dissertations

  1. Research on Text Classification Based on Biomimetic Pattern Recognition,TP391.1
  2. Tourism Comments on the Internet’s Semantic Analysis and Usefulness Research,TP391.1
  3. Based on Data Distribution Characteristics of Text Classification,TP391.1
  4. Research on Improved K Neighbor Support Vector Machine Algorithm Faced Text Classification,TP391.1
  5. Research for Event Extraction Method in Specific Domain Based on Tree Conditional Random Field,TP391.1
  6. Online Education News Text Categorization System Design and Implementation,TP391.1
  7. One kind of empirical data on the workload of a software bug fixes Prediction Model,TP311.53
  8. Research on cross-language text categorization,TP391.1
  9. Classification model based monitoring of e-commerce Prohibited Research and Implementation,TP393.09
  10. Based on semantic analysis of text mining research,TP391.1
  11. Based on the associated technology Chinese Text Classification,TP391.1
  12. Application of Partial Least Square and Discriminant Analysis in Studying the Style and Influence Factors of the Scientific Personnel,G644
  13. Theoretical and Experimental Studies on the Measurement of COD in Water Using Ultraviolet Spectrum Method,X832
  14. Partial Least Squares Prediction of Silicon Content in Blast Furnace,TF325.6
  15. Analysis for Intermittent Fixed-Bed Coal Gasification,TQ546
  16. ASU purification system Energy Control System,TQ116.11
  17. Research of Text Clustering Based on Genetic Algorithm,TP391.1
  18. The Study of Text Classification Based on Support Vector Machine,TP391.1
  19. Objectionable Information Filtering System Based on ATN Algorithm and Latent Semantic Indexing,TP391.1
  20. Weighted fusion methods and sparse partial least squares method comparison,O212.1
  21. Semantic content text classification feature extraction algorithm,TP391.1
