
The Analysis on the Basic Techniques for Preprocess of Text Mining and the Study on the Application of Text Mining

Author: SunDaoJun
Tutor: LvTingJie
School: Beijing University of Posts and Telecommunications
Course: Management Science and Engineering
Keywords: Chinese word segmentation; Vector Space Model (VSM); K nearest neighbor (KNN); Text Mining
CLC: TP311.13
Type: PhD thesis
Year: 2008
Downloads: 720
Citations: 2

Abstract


This thesis systematically explains and implements the general workflow of text mining. The key techniques involved are text collection, text preprocessing, automatic Chinese word segmentation of the preprocessed documents, selection of training patterns and reduction of support vectors, text training, and text mining. Based on an analysis of the system's requirements, we divide the system into four parts: text collection and preprocessing, Chinese word segmentation, selection of training pattern vectors, and training and classification of the text pattern vectors.

Unlike general text mining, our approach must also collect the texts, preprocess them, and store their weights. We implement a preemptive multithreaded web text collector that gathers the text of a specified catalog using a depth-first search algorithm. We also implement a text preprocessor that strips the tags from web text and assigns weights to it using a recursive matching method.

For the remaining parts, we first introduce a classifier that uses the association between words and categories to select training patterns appropriately and to reduce the number of support vectors. We then introduce the basic theory of the K nearest neighbor (KNN) method, its application to text classification, and the KNN software. The extracted patterns and their weights are used to build the input file, from which text training and text classification are carried out.

The author implemented the text collector, the preprocessor, and the Chinese word segmenter for text mining and, based on this study, proposes a new solution for selecting text patterns and for text mining.
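The collector described above crawls a catalog depth-first with several worker threads. The sketch below is a minimal illustration, not the author's collector: it assumes an in-memory link graph (the hypothetical `PAGES` dict) in place of real HTTP fetching, and it uses a shared LIFO queue so that workers always expand the most recently discovered link, which approximates depth-first order. The function name `crawl` and its parameters are hypothetical.

```python
import queue
import threading

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# Stands in for real HTTP fetching, which this sketch does not perform.
PAGES = {
    "/catalog": ("Catalog index", ["/catalog/a", "/catalog/b", "/about"]),
    "/catalog/a": ("Article A text", ["/catalog/a/1"]),
    "/catalog/a/1": ("Article A1 text", []),
    "/catalog/b": ("Article B text", ["/catalog"]),
    "/about": ("About page", []),
}


def crawl(seed, prefix, num_workers=4):
    """Collect the text of pages under `prefix`, exploring links depth-first.

    A LIFO queue hands each worker the most recently found link, which
    approximates depth-first order. With several threads the exact visiting
    order is nondeterministic, but the set of collected pages is not.
    """
    stack = queue.LifoQueue()
    visited = {seed}
    lock = threading.Lock()
    collected = {}

    def worker():
        while True:
            url = stack.get()
            if url is None:  # sentinel: shut this worker down
                break
            text, links = PAGES.get(url, ("", []))
            with lock:
                collected[url] = text
                for link in links:
                    # stay inside the target catalog and skip pages already seen
                    if link.startswith(prefix) and link not in visited:
                        visited.add(link)
                        stack.put(link)  # queued before task_done, so join() cannot return early
            stack.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    stack.put(seed)
    stack.join()  # blocks until every queued URL has been processed
    for _ in threads:
        stack.put(None)
    for t in threads:
        t.join()
    return collected
```

For example, `crawl("/catalog", "/catalog")` collects the four pages under `/catalog` and ignores `/about`, since it lies outside the catalog prefix. Python threads are OS threads scheduled preemptively, but the global interpreter lock limits CPU parallelism; for a network-bound crawler that is usually acceptable.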
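The preprocessor strips tags from web text and assigns weights to it. The abstract does not give the weighting scheme or the details of its recursive matching method. The sketch below therefore swaps in Python's standard `html.parser.HTMLParser` and keeps a stack of open tags. Each text fragment takes the weight of its innermost weighted enclosing tag. The `TAG_WEIGHTS` values (title 3, headings 2, everything else 1) are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical weights: text inside these tags counts more than body text.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "h2": 2.0}
SKIP_TAGS = {"script", "style"}  # their content is not page text
VOID_TAGS = {"br", "img", "meta", "link", "hr", "input"}  # never closed


class WeightedTextExtractor(HTMLParser):
    """Strip HTML tags and record each text fragment with a weight taken
    from the innermost weighted tag that encloses it."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open tags
        self.fragments = []  # (text, weight) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates unclosed inner tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text or any(t in SKIP_TAGS for t in self.stack):
            return
        weight = 1.0
        for t in reversed(self.stack):
            if t in TAG_WEIGHTS:
                weight = TAG_WEIGHTS[t]
                break
        self.fragments.append((text, weight))


def extract(html):
    """Return the (text, weight) fragments of an HTML document."""
    parser = WeightedTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.fragments
```

The weighted fragments can later multiply the term frequencies of the words they contain, so that terms in titles and headings carry more weight in the document vector.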
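The abstract does not say which algorithm the author's Chinese word segmenter uses. As a point of reference only, the sketch below shows forward maximum matching, a common dictionary-based method. At each position it takes the longest dictionary word that matches, and it falls back to a single character when nothing matches. The function name and the toy dictionary are hypothetical.

```python
def forward_max_match(sentence, dictionary, max_len=None):
    """Segment `sentence` greedily from left to right: at each position take
    the longest dictionary word that matches, else a single character."""
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, shrinking one character at a time.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical toy dictionary for illustration.
DICT = {"文本", "挖掘", "文本挖掘", "技术", "研究"}
```

With this dictionary, `forward_max_match("文本挖掘技术研究", DICT)` yields `["文本挖掘", "技术", "研究"]`. Greedy matching is fast, but it can mis-split ambiguous strings, which is why practical segmenters often add backward matching or statistical disambiguation.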
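The thesis selects training patterns using the association between words and categories, but the abstract does not name the association measure. The chi-square statistic is one widely used word-category association score, and the sketch below swaps it in as an assumption rather than as the author's method. For term $t$ and category $c$ over $N$ documents, $A$ counts documents in $c$ containing $t$, $B$ documents outside $c$ containing $t$, $C$ documents in $c$ without $t$, and $D$ the rest:

$$\chi^2(t,c) = \frac{N\,(AD - BC)^2}{(A+C)(B+D)(A+B)(C+D)}$$

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Chi-square association for every (term, category) pair.

    docs: list of token lists; labels: one category per document.
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]  # presence, not frequency
    cat_count = Counter(labels)
    term_count = Counter(t for s in doc_sets for t in s)
    joint = Counter((t, cat) for s, cat in zip(doc_sets, labels) for t in s)
    scores = {}
    for t in term_count:
        for cat in cat_count:
            A = joint[(t, cat)]        # in cat, contains t
            B = term_count[t] - A      # other categories, contains t
            C = cat_count[cat] - A     # in cat, lacks t
            D = n - A - B - C          # other categories, lacks t
            denom = (A + C) * (B + D) * (A + B) * (C + D)
            scores[(t, cat)] = n * (A * D - B * C) ** 2 / denom if denom else 0.0
    return scores


def select_features(docs, labels, k):
    """Keep the k terms whose best score over all categories is highest."""
    best = {}
    for (t, _), s in chi_square_scores(docs, labels).items():
        best[t] = max(best.get(t, 0.0), s)
    return sorted(best, key=lambda t: (-best[t], t))[:k]
```

Terms that occur evenly across categories score zero and are dropped first, which shrinks the pattern vectors passed to the classifier.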
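The last stage represents documents in the vector space model and classifies them with KNN. The sketch below builds sparse TF-IDF vectors and assigns a query document to the category with the largest similarity-weighted vote among its `k` most cosine-similar training documents. The IDF smoothing (`log(N/df) + 1`) and the similarity-weighted vote are common choices assumed here. The thesis's own KNN software and input-file format are not reproduced, and all function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each token list to a sparse {term: tf * idf} vector."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}  # +1 keeps every weight > 0
    vecs = [{t: tf * idf[t] for t, tf in Counter(d).items()} for d in docs]
    return vecs, idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_classify(query, train_vecs, train_labels, idf, k=3):
    """Vote among the k most similar training documents, each vote
    weighted by its cosine similarity to the query."""
    # Terms unseen in training have no IDF and are ignored.
    q = {t: tf * idf[t] for t, tf in Counter(query).items() if t in idf}
    neighbors = sorted(
        ((cosine(q, v), lab) for v, lab in zip(train_vecs, train_labels)),
        reverse=True,
    )[:k]
    votes = Counter()
    for sim, lab in neighbors:
        votes[lab] += sim
    return votes.most_common(1)[0][0]
```

Usage: with segmented training documents `train` and their `labels`, call `vecs, idf = tfidf_vectors(train)` once, then `knn_classify(tokens, vecs, labels, idf)` for each new segmented document. Weighting votes by similarity keeps a distant neighbor from counting as much as a close one when `k` exceeds the number of truly similar documents.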

Related Dissertations

  1. The Study of Topic-Oriented IT News with Search Engine and Web Page Analysing,TP393.092
  2. Research and Implement of Chinese Word Segment Techniques Based on the Conditional Random Field,TP391.1
  3. Research on Approaches of the Subjective Automated Assessment,TP391.1
  4. Corporate e-mail monitoring system design and implementation,TP393.098
  5. Optimization of SOM Algorithm and Application in Chinese Text Clustering,TP391.1
  6. A Study on the Application of Personalized Information Retrieval System Based on Ontology,F49
  7. Research and Implementation of an Information Pre-process Platform of Public Opinion,TP393.09
  8. Chinese Segmentation Algorithm Research Based on Special Identifier,TP391.1
  9. KNNModel algorithm and its application,TP311.13
  10. Research on Technology in Identification of Aerial Targets Based on Support Vector Machine,TN953
  11. Study of Chinese Text Similarity Based on Number Difference Gene,TP391.1
  12. Research on Ontology Technology’s Applications in Cooperative Learning Interactive Information Processing,C931
  13. Research on Co-clustering and Application,TP311.13
  14. Research on Sentiment Orientation Analysis of Blog Article Based on Blog Search,TP391.1
  15. A Text Categorization Method Based on Features Clustering,TP391.1
  16. Search engine for the field of art education applications,J20-4
  17. Text-based mining companies Distress,F426.82
  18. Research and Application of Internet Chinese Text Classification,TP391.1
  19. The Studies on Chinese Text Categorization Based on PSO and SVM,TP391.1
  20. Different syntax level of knowledge on Chinese word segmentation,B842
  21. The Research of Word Index Method Based on Inter-Relevant Successive Trees Model,TP391.3

CLC: > Industrial Technology > Automation technology, computer technology > Computing technology, computer technology > Computer software > Program design, software engineering > Programming > Database theory and systems
© 2012 www.DissertationTopic.Net