
Research on Several Models in Text Classification and Clustering

Author: HeShiZhu
Advisor: WangMingWen
School: Jiangxi Normal University
Course: Computer Science and Technology
Keywords: Text mining; Large-scale text categorization; Deep classification; Text clustering; Markov network
CLC: TP391.1
Type: Master's thesis
Year: 2011

Abstract


With the rapid and continuous growth of text data on the Internet, text mining has been studied intensively and applied widely as an effective tool for organizing and managing large amounts of text data. This thesis proposes improved methods for two problems in text mining: text classification and text clustering.

For the supervised learning problem of text classification, traditional methods are good at assigning documents to a small number of categories. Classification over a large-scale hierarchy, however, is a challenging task because there are many categories with cross-link relationships. "Deep classification" is an effective framework that makes the problem tractable. It consists of two stages: the search stage selects a number of candidate categories for a given test document, and the classification stage determines the final category by applying a more accurate classifier to those candidates. We propose an improved deep classification model. First, we propose a new method for evaluating the effectiveness of the search stage. Second, we select candidate categories based on both category and document information. Finally, we train a centroid-based Rocchio classifier that uses information from related categories, such as the top-level category, parent categories, sibling categories, and subcategories.

For the unsupervised learning problem of text clustering, it is important to compute the correlation among documents accurately and efficiently. A common approach computes the statistical correlation between document vectors directly, but it does not take the neighborhood of the documents into account. We propose a new method based on a Markov network model that considers not only the direct statistical information but also neighborhood information when computing correlation. We build a Markov network and form a weighted combination of the transfer matrices of each step, which strengthens the correlation among within-class data and widens the gap between classes. Finally, we cluster the documents using this correlation description, in which the gap between classes is pronounced.

Our primary contributions are as follows.

1) An improved classification model is proposed after a systematic study of the methods and applications of large-scale text classification. A series of experiments shows that related categories, especially top-level and sibling categories, play a useful role in determining the target class.

2) We represent the text data set with a Markov network model, describe the correlation among documents by a weighted combination of the transfer matrices of each step, and then cluster the documents based on that description. A series of experiments shows that the weighted combination of the transfer matrices of each step improves the clustering results.
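The two-stage idea behind deep classification can be illustrated with a minimal sketch. It is not the thesis's implementation: the category names, the toy vectors, the `related` map, the mixing weight `alpha`, and every function name below are assumptions made for illustration, and the classifier is a prototype-only variant of Rocchio (no negative examples).

```python
import numpy as np

def unit(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def centroid(vectors):
    """Prototype of a category: normalized mean of its training documents."""
    return unit(np.mean(vectors, axis=0))

def search_stage(doc, centroids, k):
    """Search stage: keep the k categories whose centroids are closest to doc."""
    scores = {c: float(unit(doc) @ v) for c, v in centroids.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

def enriched_centroid(cat, centroids, related, alpha=0.7):
    """Mix a category's centroid with those of its related categories.

    alpha is a hypothetical mixing weight, not a value from the thesis.
    """
    own = centroids[cat]
    rel = [centroids[r] for r in related.get(cat, []) if r in centroids]
    if not rel:
        return own
    return unit(alpha * own + (1 - alpha) * np.mean(rel, axis=0))

def classify(doc, centroids, related, k=2):
    """Classification stage: rescore only the candidates from the search stage."""
    candidates = search_stage(doc, centroids, k)
    scores = {
        c: float(unit(doc) @ enriched_centroid(c, centroids, related))
        for c in candidates
    }
    return max(scores, key=scores.get), candidates

# Toy term-frequency vectors (assumed data, four terms).
train = {
    "sports/football": np.array([[3, 1, 0, 0], [2, 1, 0, 0]]),
    "sports/tennis": np.array([[1, 3, 0, 0], [1, 2, 0, 0]]),
    "science/physics": np.array([[0, 0, 3, 1], [0, 0, 2, 1]]),
}
centroids = {c: centroid(v) for c, v in train.items()}
# Sibling links only; parent or top-level categories could be added the same way.
related = {
    "sports/football": ["sports/tennis"],
    "sports/tennis": ["sports/football"],
}

doc = np.array([3.0, 1.0, 0.0, 0.0])
label, candidates = classify(doc, centroids, related, k=2)
print(label, candidates)
```

The search stage here reuses the plain centroids for simplicity; in a large hierarchy it would typically be a cheaper retrieval step, with the more expensive classifier reserved for the short candidate list.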
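The weighted combination of per-step matrices can be sketched as follows. This is a hedged illustration, not the thesis's method: it assumes that "transfer matrix" means a row-stochastic transition matrix obtained by normalizing cosine similarities, and the step weights and toy documents are made-up values.

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty documents
    Xn = X / norms
    return Xn @ Xn.T

def transition_matrix(S):
    """Row-normalize similarities so that each row sums to 1."""
    row_sums = S.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0
    return S / row_sums

def multi_step_correlation(X, weights):
    """Return sum_t weights[t-1] * P^t, where P is the one-step matrix.

    Higher powers of P carry neighborhood information: two documents become
    correlated when they share similar neighbors, even if indirectly.
    """
    P = transition_matrix(cosine_similarity_matrix(X))
    step = np.eye(P.shape[0])
    combined = np.zeros_like(P)
    for w in weights:
        step = step @ P          # P^1, P^2, P^3, ...
        combined += w * step
    return combined

# Toy term-frequency vectors: documents 0-1 and 2-3 form two groups.
docs = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 0.0, 2.0, 1.0],
])
R = multi_step_correlation(docs, weights=[0.5, 0.3, 0.2])
print(np.round(R, 3))
```

The resulting matrix `R` could then be passed to any clustering routine that accepts a pairwise affinity; which routine the thesis uses is not stated in the abstract.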

Related Dissertations

  1. Research and Implementation of Mining Implicit User Interest,TP311.13
  2. Evolutionary Clustering Algorithm and Its Application,TP311.13
  3. Research of Text Clustering on Food Complaint Documents Based on Ontology,TP391.1
  4. The Study of Topic-Oriented IT News with Search Engine and Web Page Analysing,TP393.092
  5. The Design and Implementation of the Hot Education News Topic Detection System,TP391.1
  6. Chinese Movies in the Vision of American Movies Critics,J905
  7. Text-oriented disciplines correlation analysis association rule mining technology research,TP311.13
  8. Markov Retrieval Model Based on Transferring Learning,TP391.3
  9. Extended Information Retrieval Model Based on Markov Cliques,TP391.3
  10. Study on the Diversity of Insect Community and the Seasonal Dynamics in the Neighborhood System of Paddy with Pomegranate,S186
  11. Face cartoon based on machine learning method,TP391.41
  12. Research on Combining Collective Classification with Active Learning,TP181
  13. Text based on SVM multi-class classification,TP391.1
  14. The literature of resources for research and application clustering system,TP391.1
  15. Research on the Scale Free Graph k-medoids Cluster Algorithm,TP301.6
  16. The Research on Graph Structure Representation Method Based Chinese Text Clustering,TP391.1
  17. Research on Thesis Text Clustering Based on Semantic Similarity,TP391.1
  18. Research on Document Clustering Technology Based on Latent Semantic Indexing,TP391.1
  19. Study on Similarity-based Text Clustering Algorithm and Its Application,TP301.6
  20. Research on Enterprise Competitive Intelligence Collection System Based on Web Text Mining,TP311.52

CLC: Industrial Technology > Automation Technology, Computer Technology > Computing Technology, Computer Technology > Computer Applications > Information Processing > Text Processing