The Research of Text Preprocessing Based on Web Mining and Its Application

Author: ZhongPeiRong
Tutor: ZhuHaiBin;WangRuLong
School: Hunan University
Course: Software Engineering
Keywords: Web Mining; Text Mining; Vector Space Model; Feature Selection; Text Classification
CLC: TP391.1
Type: Master's thesis
Year: 2006
Downloads: 383
Quote: 3


With the development of Internet technology, and especially the global popularity of the Web, information has become extremely abundant while knowledge remains relatively scarce. Finding valuable information in this vast body of text is an important goal in the field of information processing. Web text mining is an important application of data mining technology to network information processing. However, the openness and heterogeneity of the Internet make it hard to obtain the required information from the WWW quickly and accurately, so extracting that information quickly and effectively from a large volume of data has become an important research topic. Text preprocessing is the bottleneck of text classification. This thesis addresses the preprocessing stage of text mining and, on that basis, studies text classification technology; the work has both theoretical significance and practical value.

The thesis first discusses the relevant theory of Web mining, then discusses the preprocessing techniques applied before Web text mining, and proposes improvements to the preprocessing process. (1) Feature weight computation: common feature weighting algorithms are analyzed and compared, and the traditional TF × IDF algorithm is improved by replacing IDF with a new Gini index evaluation function. The experiments show that the improved method is superior to the other methods in mining accuracy and efficiency. (2) Classifier method: to highlight the effect of weight adjustment on classification, the new weighting function is incorporated into the Bayesian classification method for weight adjustment. A comparison of the experimental results shows that, because it takes classification characteristics into account, the method has certain advantages. (3) The e-mail mining module of a mining system is analyzed and designed.
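The abstract does not state the form of the Gini index evaluation function, so the following is only a minimal sketch of the general idea of replacing IDF with a Gini-based score. It assumes one common purity-style formulation, Gini(t) = Σ_c P(c|t)², with P(c|t) estimated from document frequencies. The function names `gini_purity` and `tf_gini_weights` and the toy e-mail data are hypothetical and are not taken from the thesis.

```python
from collections import Counter, defaultdict


def gini_purity(docs, labels):
    """Return {term: sum over classes of P(class | term)^2}.

    Assumption: P(class | term) is estimated as the fraction of the
    documents containing the term that belong to that class.
    """
    # term -> Counter(class -> number of documents containing the term)
    df_by_class = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df_by_class[term][label] += 1

    scores = {}
    for term, counts in df_by_class.items():
        total = sum(counts.values())
        scores[term] = sum((n / total) ** 2 for n in counts.values())
    return scores


def tf_gini_weights(tokens, gini):
    """Weight each term of one document by relative TF x Gini purity."""
    tf = Counter(tokens)
    length = len(tokens)
    return {
        term: (count / length) * gini.get(term, 0.0)
        for term, count in tf.items()
    }


# Toy, made-up corpus: four tokenized messages with class labels.
docs = [
    ["cheap", "offer", "buy"],
    ["offer", "meeting"],
    ["meeting", "agenda", "buy"],
    ["cheap", "cheap", "offer"],
]
labels = ["spam", "ham", "ham", "spam"]

gini = gini_purity(docs, labels)
weights = tf_gini_weights(docs[3], gini)
```

Under this assumed formulation, a term that occurs in only one class scores 1.0 and a term spread evenly over two classes scores 0.5, so class-specific terms receive larger weights than they would from a class-blind IDF factor.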

Related Dissertations

  1. Research and Implementation of Mining Implicit User Interest,TP311.13
  2. Feature Extraction, Selection and Combination in Lipreading,TP391.41
  3. Based on Data Distribution Characteristics of Text Classification,TP391.1
  4. Research on Face Recognition Based on AdaBoost Algorithm,TP391.41
  5. Research of Web Text Classification Based on Decision Tree Classification Algorithm,TP391.1
  6. The Research and Application of Support Vector Machine in Intrusion Detection System,TP393.08
  7. The Research on Public Opinion Mining and Group Behavior Analysis on the Internet,F49
  8. SAR Images Segmentation Based on Quantum Evolution Feature Selection Algorithm,TN957.52
  9. The Research of Intrusion Detection Based on Feature Selection,TP393.08
  10. Misuse Intrusion Detection Based on Weighted Feature Selection,TP393.08
  11. Research of Key Technology of Anomaly Network Intrusion Detection Based on SVM,TP393.08
  12. The Research and Implementation of Data Mining Technology in the Electronic Bulletin Board System Environment,TP311.13
  13. The Research of FSVM Intrusion Detection Algorithm Based on Inverted Binary Tree,TP393.08
  14. The Research and Development of Personalized Learning System,TP391.6
  15. Research on Intrusion Detection Based on Feature Selection and Clustering,TP393.08
  16. Research on Text Classification Based on Biomimetic Pattern Recognition,TP391.1
  17. Research on Modeling User Webpage Browsing Interest,TP393.092
  18. Audit System of Enterprise Site Service Based on Content Filtering Technology,TP393.08
  19. Web Content Mining and Clustering Based on Intention,TP393.092
  20. Research on the Algorithm for Chinese Duplicated Web Pages Detection,TP393.092
  21. Feature selection method in diagnosis of erythema squamous skin diseases in,TP181

CLC: Industrial Technology > Automation technology, computer technology > Computing technology, computer technology > Computer applications > Information processing > Text Processing