Research on Text Stream Classification by Keywords

Author: YangBaoGuo
Tutor: ZhangYang
School: Northwest University of Science and Technology
Course: Computer Software and Theory
Keywords: text stream classification; unlabeled documents; concept drift; classifier ensemble; knowledge acquisition
CLC: TP391.1
Type: Master's thesis
Year: 2011

Abstract


Traditional data stream classification usually requires a large number of fully labeled training examples to build classifiers, and labeling them is expensive and time-consuming. In practice, however, most data streams are unlabeled, which makes traditional data stream classification methods impractical. To address this problem, semi-supervised data stream classification has attracted increasing attention in recent years. Some researchers have proposed using partially labeled examples, or only a small number of positive examples together with a large number of unlabeled examples, for data stream classification. Although these approaches reduce the cost of manual labeling, they still require users to label some samples.

To further reduce the burden of manual labeling, this thesis proposes a novel approach for text data streams that uses keywords to classify text streams without any manual labeling. First, base classifiers are built from keywords and unlabeled documents; then the documents in the text stream are classified by an ensemble-based algorithm. In the classifier construction phase, the keywords are semantically expanded and then used to label the initial positive documents. In the classification phase, the final label of an unseen document is predicted by weighted majority voting.

This thesis also studies concept drift in text streams in depth. It focuses mainly on concept drift caused by changes in the user’s interests: the keywords provided by the user determine the user’s current interests and the target concept, so when the user’s interests change, concept drift occurs as well. The thesis simulates two common concept drift scenarios, gradual concept drift and abrupt concept shift, and compares them with the non-drift scenario.

Experimental results demonstrate that the proposed method can build an effective classifier from keywords without any manually labeled examples, and that its results are comparable to those of the PU learning method, which builds classifiers from labeled positive and unlabeled documents. Moreover, the classifier ensemble method used in this thesis can quickly capture and adapt to concept drift in text streams. The experimental results also show that the ensemble-based algorithm performs better than the single-window-based algorithm. Because the proposed method does not require manually labeled documents, it is more practical for real-life applications.
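
The two steps named in the abstract, labeling initial positive documents from expanded keywords and predicting the final label by weighted majority voting, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the hand-written synonym table stands in for whatever semantic expansion the thesis uses, the whitespace tokenizer and the `min_hits` threshold are hypothetical, and the abstract does not say how classifier weights are computed, so they are passed in as plain numbers.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Set

Document = str
# A base classifier maps a document to +1 (positive) or -1 (negative).
Classifier = Callable[[Document], int]


def tokenize(doc: Document) -> Set[str]:
    """Lowercase whitespace tokenizer (a stand-in for real preprocessing)."""
    return set(doc.lower().split())


def expand_keywords(keywords: Iterable[str],
                    synonyms: Dict[str, List[str]]) -> Set[str]:
    """Add related terms for each keyword.

    Assumption: `synonyms` is a hypothetical lookup table; the thesis's
    actual semantic expansion method is not described in the abstract.
    """
    expanded: Set[str] = set()
    for kw in keywords:
        kw = kw.lower()
        expanded.add(kw)
        expanded.update(term.lower() for term in synonyms.get(kw, []))
    return expanded


def label_initial_positives(docs: Sequence[Document],
                            keywords: Set[str],
                            min_hits: int = 1) -> List[Document]:
    """Treat a document as initially positive if it contains at least
    `min_hits` of the expanded keywords (hypothetical threshold)."""
    return [d for d in docs if len(tokenize(d) & keywords) >= min_hits]


def weighted_majority_vote(classifiers: Sequence[Classifier],
                           weights: Sequence[float],
                           doc: Document) -> int:
    """Sum the weighted +1/-1 votes; a strictly positive total means +1.

    Ties (total of exactly 0) are resolved as -1 in this sketch.
    """
    if len(classifiers) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    total = sum(w * clf(doc) for clf, w in zip(classifiers, weights))
    return 1 if total > 0 else -1
```

In a stream setting, one base classifier would typically be trained per incoming batch of documents and given a weight reflecting how well it matches the current concept, so that classifiers trained before a drift lose influence; how the weights are set is left open here because the abstract does not specify it.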


CLC path: Industrial Technology > Automation technology, computer technology > Computing technology, computer technology > Computer applications > Information processing > Text Processing