
Research on Web Chinese Text Automatic Categorization Based on RS-SVM

Author: RongJianWen
Advisor: MaGang
School: Dongbei University of Finance
Major: E-commerce
Keywords: Text Categorization; Rough Set; Attribute Reduction; SVM; Binary decision tree
CLC: TP391.1
Type: Master's thesis
Year: 2010

Abstract


With the spread of information technology, and in particular the rapid development of the Internet, the volume of information is growing explosively and reaches every aspect of daily life, where people constantly need to obtain, analyze, and use it. How to effectively mine the information that is useful and of interest from this mass of data has become an important problem in computer applications. Text categorization is an important technique in data mining, and this thesis investigates it further. First, following the procedure of Web Chinese text classification, the thesis introduces the related technology and analyzes the key steps in detail: text preprocessing, text representation, preliminary dimension reduction, and classification methods. Second, it systematically presents the basic theory of rough sets and support vector machines (SVM). To improve classification performance and reduce running time, it proposes a new text classification algorithm that combines the two. The algorithm applies the knowledge reduction algorithm of rough sets to the preprocessed data, reducing its dimension by deleting redundant attributes, and then relies on the generalization and classification ability of the SVM to train on and classify the reduced data, so that the two methods complement each other. In presenting rough set theory, the thesis focuses on its core, the knowledge reduction algorithm, and proposes an improved heuristic attribute reduction algorithm to strengthen the dimension-reduction ability of rough sets and greatly reduce the dimension of the text representation.
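The idea of deleting redundant attributes with a heuristic reduction can be illustrated with a minimal sketch. It is not the improved algorithm proposed in the thesis, which the abstract does not specify; it is the common greedy, dependency-based heuristic (often called QuickReduct), shown here on a made-up toy table. The function names `partition`, `dependency`, and `greedy_reduct`, and the sample rows and labels, are all assumptions made for illustration.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, labels, attrs):
    """Fraction of rows in the positive region: blocks whose rows share one label."""
    if not rows:
        return 0.0
    consistent = sum(len(block) for block in partition(rows, attrs)
                     if len({labels[i] for i in block}) == 1)
    return consistent / len(rows)

def greedy_reduct(rows, labels):
    """Add one attribute at a time, always the one with the largest dependency gain,
    until the chosen subset explains the labels as well as the full attribute set."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    chosen = []
    while dependency(rows, labels, chosen) < target:
        best = max((a for a in all_attrs if a not in chosen),
                   key=lambda a: dependency(rows, labels, chosen + [a]))
        chosen.append(best)
    return sorted(chosen)

# Hypothetical toy data: each row is a document, each column a discretized term
# feature (1 = term present). Column 0 alone determines the category; column 1
# is irrelevant and column 2 is constant, so both are redundant.
rows = [(1, 0, 1), (1, 1, 1), (0, 0, 1), (0, 1, 1)]
labels = ["sports", "sports", "finance", "finance"]

reduct = greedy_reduct(rows, labels)
print(reduct)
```

In this toy table the heuristic keeps a single attribute out of three; in a pipeline like the one the abstract describes, only the retained columns would then be passed on to the SVM.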
In introducing the basic concepts of the support vector machine, the thesis focuses on binary and multi-class classification algorithms. For binary classification, building on earlier researchers' results, it proposes a modified SVM binary classification algorithm that uses a combination of kernel functions. For multi-class classification, after comparing the "one-vs-rest", "one-vs-one", decision directed acyclic graph, and binary decision tree algorithms, it proposes an improved binary decision tree SVM multi-class algorithm whose layers are built from the distances between cluster centers. Finally, a Web Chinese text classification system based on the improved RS-SVM algorithm is designed and implemented, and used to classify Web Chinese texts collected from the Internet. The results confirm the advantage of the improved algorithm for automatic categorization of Web Chinese text.
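A binary decision tree over classes, organized by the distance between class centers, can be sketched as follows. The abstract does not give the thesis's exact construction, so the splitting rule here is an assumption: at each node the two classes with the most distant centers seed the two branches, and every other class joins the nearer seed. The names `centroid`, `build_tree`, and `count_nodes` and the sample vectors are hypothetical, and no SVM is trained; the sketch only builds the hierarchy in which one binary SVM would sit at each internal node.

```python
import math
from itertools import combinations

def centroid(vectors):
    """Mean vector of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))

def build_tree(centers):
    """Recursively split classes into two groups by distance between class centers.

    centers maps class name -> center vector. Returns a class name (leaf) or a
    (left_subtree, right_subtree) pair; each pair marks one binary classifier.
    """
    names = sorted(centers)
    if len(names) == 1:
        return names[0]
    # Seed the two branches with the pair of classes whose centers are farthest apart.
    a, b = max(combinations(names, 2),
               key=lambda p: math.dist(centers[p[0]], centers[p[1]]))
    left, right = {a: centers[a]}, {b: centers[b]}
    for name in names:
        if name in (a, b):
            continue
        nearer_a = (math.dist(centers[name], centers[a])
                    <= math.dist(centers[name], centers[b]))
        (left if nearer_a else right)[name] = centers[name]
    return (build_tree(left), build_tree(right))

def count_nodes(tree):
    """Number of internal nodes, i.e. binary classifiers the tree would need."""
    if isinstance(tree, str):
        return 0
    return 1 + count_nodes(tree[0]) + count_nodes(tree[1])

# Hypothetical 2-D feature vectors for four text categories.
samples = {
    "sports":  [(0.0, 0.0), (0.0, 2.0)],
    "finance": [(10.0, 0.0), (10.0, 2.0)],
    "health":  [(1.0, 5.0), (3.0, 5.0)],
    "tech":    [(9.0, 6.0), (11.0, 6.0)],
}
centers = {name: centroid(vecs) for name, vecs in samples.items()}
tree = build_tree(centers)
print(tree)
print(count_nodes(tree))
```

With k classes such a tree needs k - 1 binary classifiers, and classifying a document evaluates at most one classifier per level, which is the usual motivation for preferring a decision tree over "one-vs-one" when running time matters.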

Related Dissertations

  1. Soft Sensor of Naphtha Dry Point on Support Vector Machines Regression,TE622.1
  2. The Research of the Fault Diagnoses Algorithm for the Liquid Rocket Engine Testing Bed Based on PCA-SVM,V433.9
  3. ISAR Imaging Simulation of Space Targets and Target Recognition Based on ISAR Images,TN957.52
  4. Research on Automatic Music Structure Analysis,TN912.3
  5. Research on Feature Extraction and Classification of Pulse Waveform for Cholecystitis and Nephrotic Syndrome Diagnosis,TP391.41
  6. Research on Classification Method of Tongue Substance Color and Tongue Coating Color Based on SVM,TP391.41
  7. The Research on Paper Currency Classification Method Based on Haar-Like Feature and Minimal Ball Including Samples,TP391.41
  8. Fault Diagnosis Method Based on Support Vector Machine,TP18
  9. Research on Focused Crawler Based on SVM Classification Algorithm,TP391.3
  10. Research on Predicting Intrinsic Disorder Protein Structure Based on Supervision Manifold Learning Algorithm,Q51
  11. Research on Clustering Algorithm Based on Genetic Algorithm and Rough Set Theory,TP18
  12. Study on the Road Condition Monitoring Based on Vehicular 3D Acceleration Sensor,TP274
  13. Research of Orange Quality Classification Technology Based on Computer Vision,TP391.41
  14. Analysis on Synoptic Climatology Characteristics and Forecast Methods of Fog in Hainan,P457
  15. Research of License Plate Recognition Based on Rough Sets and Fuzzy SVM,TP391.41
  16. Study on Visual Target Detection Based on SVM,TP391.41
  17. The Research on Electrode 3D Model Classification and Retrieval Based on SVM and Shape Features,TP391.41
  18. Multi-step-ahead Stock Price Index Forecasting Based on Hybrid Models,F224
  19. Research on ECG Feature Extraction and Classification Method,TN911.7
  20. Multi-feature fusion of visual tracking algorithm,TP391.41
  21. Doppler weather radar based windshear prediction,P415.2

CLC: Industrial Technology > Automation technology, computer technology > Computing technology, computer technology > Computer applications > Information processing > Text Processing
© 2012 www.DissertationTopic.Net