The Study on Feature Selection Algorithm in Chinese Text Clustering

Author: GongJing
Advisor: ZhouJingYe
School: Xiangtan University
Course: Applied Computer Technology
Keywords: Chinese text; Text clustering; Feature selection; Vector Space Model (VSM)
CLC: TP391.1
Type: Master's thesis
Year: 2006

Abstract


In recent years, an enormous number of text documents has become readily available from the Internet, digital libraries, news agencies, and company intranets. As a result, there is growing interest in technologies that help users navigate, summarize, and organize this text information effectively. Fast, high-quality text clustering plays an important role in achieving this goal. By organizing large amounts of information into a small number of meaningful clusters, it can provide a navigation and browsing mechanism, or greatly improve retrieval performance through cluster-driven dimensionality reduction or weight adjustment. Text clustering has therefore become an important topic in international information processing research, whereas research on Chinese text clustering in China is still in its infancy and many problems remain to be solved.

This thesis studies these problems; the specific work is as follows. First, we improve the existing word weight calculation method so that it takes into account not only the probability information of a word in the text but also text and semantic information, and we propose a word weight calculation method based on multiple weighted factors. Experiments show that this method improves the accuracy of text clustering. Next, we summarize the deficiencies of existing feature selection methods and propose a feature selection method based on word contribution (TD). Experiments show that this feature selection method improves the accuracy of text clustering and thereby the overall performance of the clustering, while achieving effective dimensionality reduction.
Secondly, we study text clustering algorithms. The k-means algorithm is a simple and efficient text clustering algorithm, but a poor choice of initial cluster centers can cause it to fall into a local minimum, so that the solution obtained is a local optimum rather than the global optimum. To address this, we propose an improved k-means algorithm that improves both the stability of the clustering and the clustering results. Finally, a chapter of the thesis reports a series of comparative experiments.
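
The sensitivity of k-means to its initial cluster centers can be illustrated with a small sketch. The code below is not the improved algorithm proposed in the thesis, which the abstract does not describe. It is plain k-means run from several random initializations, keeping the run with the lowest within-cluster sum of squared errors (SSE), a common generic remedy. The toy 2-D points stand in for document vectors, and all names (`kmeans`, `kmeans_restarts`, `points`) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))


def kmeans(points, k, rng, max_iter=100):
    """Plain k-means started from k randomly sampled points.

    Returns (centers, labels, sse). The result depends on the random
    starting centers, which is the weakness noted in the abstract.
    """
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the run has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = mean(members)
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse


def kmeans_restarts(points, k, n_restarts=10, seed=0):
    """Run k-means n_restarts times and keep the run with the lowest SSE."""
    rng = random.Random(seed)
    runs = [kmeans(points, k, rng) for _ in range(n_restarts)]
    best = min(runs, key=lambda run: run[2])
    return best, [run[2] for run in runs]


# Toy stand-ins for document vectors: three loose groups in 2-D.
points = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
    (5.0, 5.0), (5.1, 5.2), (5.2, 5.1),
    (0.0, 5.0), (0.1, 5.2), (0.2, 5.1),
]

(best_centers, best_labels, best_sse), all_sse = kmeans_restarts(points, k=3)
print("SSE of each run:", [round(s, 3) for s in all_sse])
print("SSE of best run:", round(best_sse, 3))
```

If the per-run SSE values differ, the runs have ended in different local optima. Keeping the lowest-SSE run reduces, but does not eliminate, the dependence on initialization.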

Related Dissertations

  1. Research on Text Classification Based on Biomimetic Pattern Recognition,TP391.1
  2. Feature Extraction, Selection and Combination in Lipreading,TP391.41
  3. Research on Feature Selection and Construction in Emotion Speech Recognition,TP18
  4. Evolutionary Clustering Algorithm and Its Application,TP311.13
  5. Research on Intrusion Detection Based on Feature Selection,TP393.08
  6. Text Classification Based on Data Distribution Characteristics,TP391.1
  7. The Research on Feature Selection for Data Stream,TP311.13
  8. Research on Face Recognition Based on AdaBoost Algorithm,TP391.41
  9. Research on Feature Extraction, Selection and Classification Algorithms for Pulmonary CAD,TP391.41
  10. Face Recognition Based on Near-Infrared Images,TP391.41
  11. Study on Anaphora in the Texts of Bilingual Students in Urumqi Middle School,H102
  12. Research on Feature-based Semantic Relation Extraction between Entities,TP391.1
  13. A Study on Generalization Ability in Chinese Teaching,G633.3
  14. FSVM-Based Data Mining Method and Its Application to Intrusion Detection,TP393.08
  15. Research on an Ontology-Based Document Management System in a BIM Environment,TP391.1
  16. A Conceptual Query Based Multi-Document Summarization in Biomedical Domain,TP391.1
  17. Predicting Protein-Protein Interactions and Their Active Sites Based on Data Mining Algorithms,TP311.13
  18. Tumor Classification Based on Gene Expression Profiles,R730.2
  19. The Research and Application of Feature Selection Algorithms in Mass Spectrometry Based Metabolomics Data,TP311.13
  20. Chinese Text Classification Algorithm,TP391.1
  21. The Research of P2P Traffic Classification Based on Machine Learning Algorithms,TP393.02

CLC: Industrial Technology > Automation technology, computer technology > Computing technology, computer technology > Computer applications > Information processing > Text processing