
Research on Chinese Text Classification Based on Naive Bayes and BP Neural Network

Author: WangYa
Tutor: XiaYouMing
School: Yunnan Normal University
Course: Computer Software and Theory
Keywords: Chinese text classification; Feature selection; BP neural network; TF-IDF; Chinese word segmentation; Text classification
CLC: TP391.1
Type: Master's thesis
Year: 2008
Downloads: 118
Citations: 0


With the wide application of database technology and the rapid development of networks and database management systems, people have accumulated more and more data. Much of this data is text, and extracting the information one needs from it quickly and effectively is very difficult. These vast amounts of data hide a great deal of important information, and people want to analyze it at a higher level in order to make better use of it. To this end, researchers have proposed and studied automatic Chinese text classification. The topic has important theoretical significance and practical value: automatic classification is much faster and more efficient than manual classification and saves considerable human, material, and financial resources; it can also improve recall and precision in Chinese search and organize information resources automatically, helping users find what they need. Today, text categorization is gradually being combined with e-government, search engines, information push, information filtering, and other information-processing technologies to improve the quality of information services and make people's work and life more convenient. This thesis centers on text classification techniques. It first introduces the background and significance of the topic and surveys research on text classification in China and abroad, then gives a detailed exposition of text classification technology and the general text classification process. It introduces the techniques used in the field, namely the text itself, word segmentation, feature selection, feature reduction, classification algorithms, and evaluation criteria, and offers some reflections and insights.
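Precision and recall, mentioned above as goals of automatic classification, are also the usual evaluation criteria for a text classifier. The following is a minimal Python sketch of per-class precision, recall, and F1 with macro and micro averaging, assuming single-label classification; the function names are hypothetical and are not taken from the thesis or from CTCS.

```python
from collections import Counter


def per_class_prf(gold, pred):
    """Return {label: (precision, recall, f1)} for every label in gold or pred."""
    labels = set(gold) | set(pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # document wrongly assigned to class p
            fn[g] += 1  # document of class g that was missed
    scores = {}
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[c] = (prec, rec, f1)
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_prf(gold, pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


def micro_f1(gold, pred):
    """Pooled F1; for single-label data this reduces to accuracy."""
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)


gold = ["sports", "sports", "finance", "finance"]
pred = ["sports", "finance", "finance", "finance"]
print(round(macro_f1(gold, pred), 4), micro_f1(gold, pred))  # prints: 0.7333 0.75
```

For single-label data, micro-averaged precision, recall, and F1 all equal accuracy, so the macro average is the figure that exposes weak performance on small classes.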
(1) It describes existing corpora and how the corpus for this system was built and maintained, analyzes the basic structural features of text and how the components of text information contribute to the classification process, reviews common Chinese word segmentation methods, and describes the framework and components of the ICTCLAS segmentation system developed by the Institute of Computing Technology, Chinese Academy of Sciences (CAS), which is used in this work. An improved CHI (chi-square) method is adopted as the feature selection method and explained in detail. (2) Building on the existing feature-weight representation of text vectors, it proposes an improved TF-IDF method that better reflects the importance of feature words in documents of different lengths and their ability to discriminate between categories, and the correctness of the method is demonstrated. (3) It describes Naive Bayes classification and the Naive Bayes classification algorithm. It then describes the basic features of neural networks and proposes an improved BP neural network approach to text classification that uses the VC dimension to determine the number of hidden-layer neurons, addressing the problem that the hidden-layer size of a BP network is usually set only by experience. Whereas the traditional BP network has a single hidden layer, this thesis uses a BP neural network with two hidden layers, which reduces network error. The method is given an algorithmic description, and the algorithms are analyzed. (4) Based on the above research, part of the implementation was carried out with Visual C# 2005 and MS Access 2000 to build a Chinese text classification system, CTCS (Chinese Text Classification System).
As a data mining technology, text classification has attracted increasing attention and applied research from researchers as database technology has developed.
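The abstract does not spell out the improved CHI or improved TF-IDF formulas, so the sketch below shows only the standard baselines they start from: the chi-square statistic CHI(t, c) = N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)), computed from the 2x2 document counts of a term t and a class c, and tf * log(N/df) weighting with cosine normalization, a common way to offset document length. Documents are assumed to be already segmented into word lists (for example by ICTCLAS); `chi_square`, `select_features`, and `tfidf_vector` are hypothetical names, not the thesis's code.

```python
import math
from collections import Counter


def chi_square(doc_sets, labels, term, cls):
    """Standard CHI(t, c) = N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)).

    doc_sets: one set of (already segmented) words per document; labels: class of each document.
    """
    a = b = c = d = 0
    for words, label in zip(doc_sets, labels):
        has_term, in_class = term in words, label == cls
        if has_term and in_class:
            a += 1  # A: in class, contains term
        elif has_term:
            b += 1  # B: other class, contains term
        elif in_class:
            c += 1  # C: in class, lacks term
        else:
            d += 1  # D: other class, lacks term
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return len(doc_sets) * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_features(docs, labels, k):
    """Keep the k words whose best (max over classes) CHI score is highest."""
    doc_sets = [set(words) for words in docs]
    vocab = set().union(*doc_sets)
    classes = set(labels)
    score = {t: max(chi_square(doc_sets, labels, t, c) for c in classes) for t in vocab}
    return sorted(vocab, key=lambda t: (-score[t], t))[:k]


def tfidf_vector(words, docs, features):
    """tf * log(N / df) weights over the selected features, cosine-normalized."""
    n = len(docs)
    tf = Counter(words)
    vec = []
    for t in features:
        df = sum(1 for doc in docs if t in doc)
        vec.append(tf[t] * math.log(n / df) if df else 0.0)
    norm = math.sqrt(sum(w * w for w in vec))
    # Dividing by the vector length keeps long documents from dominating.
    return [w / norm for w in vec] if norm else vec


docs = [["股票", "上涨"], ["股票", "基金"], ["足球", "比赛"], ["篮球", "比赛"]]
labels = ["finance", "finance", "sports", "sports"]
features = select_features(docs, labels, 2)
print(features)  # prints: ['比赛', '股票']
print(tfidf_vector(["股票", "股票", "上涨"], docs, features))
```

Taking the maximum CHI score over classes is one common way to turn per-class scores into a single ranking; averaging over classes weighted by class prior is another.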
Text classification techniques have been combined with e-government, search engines, information push, information filtering, and other information-processing technologies to improve the quality of information services and make people's work and life more convenient. First, data mining and text mining are outlined, including the state of research in data mining, text mining, and text classification. For the text-processing stage, the thesis summarizes the main characteristics and methods of Chinese word segmentation, including common segmentation methods and unknown-word recognition, and presents current results in Chinese segmentation together with the limitations of existing methods. Second, text representation and feature selection are studied: common methods of Chinese text representation and feature selection are introduced and compared, including document frequency, mutual information, information gain, the chi-square statistic, weight of evidence for text, expected cross entropy, and odds ratio. Feature extraction and dimensionality reduction methods, namely principal component analysis, latent semantic indexing, non-negative matrix factorization, and word clustering, are then presented, and the advantages and disadvantages of each are pointed out. Third, the classification methods commonly used in Chinese text classification are studied: the characteristics and shortcomings of Bayesian classification, KNN, decision trees, rough sets, SVM, genetic algorithms, and neural networks are summarized, along with methods for evaluating classification performance. The thesis concludes with an outlook on future directions for text classification.
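As a sketch of the Bayesian classification method discussed above, here is a multinomial Naive Bayes classifier with Laplace (add-one) smoothing over segmented Chinese text. It is a generic textbook formulation assumed for illustration, not necessarily the variant implemented in the thesis, and the class name `NaiveBayesText` is hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial Naive Bayes over pre-segmented word lists, with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_docs = Counter(labels)  # documents per class, for the prior P(c)
        self.n_docs = len(labels)
        self.term_counts = defaultdict(Counter)  # class -> word -> occurrences
        for words, label in zip(docs, labels):
            self.term_counts[label].update(words)
        self.vocab = {w for words in docs for w in words}
        self.total_terms = {c: sum(cnt.values()) for c, cnt in self.term_counts.items()}
        return self

    def log_score(self, words, cls):
        """log P(c) + sum of log P(w | c); proportional to the log posterior."""
        v = len(self.vocab)
        score = math.log(self.class_docs[cls] / self.n_docs)
        for w in words:
            if w not in self.vocab:
                continue  # words never seen in training carry no evidence here
            count = self.term_counts[cls][w]
            score += math.log((count + 1) / (self.total_terms[cls] + v))
        return score

    def predict(self, words):
        return max(self.class_docs, key=lambda c: self.log_score(words, c))


docs = [["股票", "上涨"], ["股票", "基金"], ["足球", "比赛"], ["篮球", "比赛"]]
labels = ["finance", "finance", "sports", "sports"]
model = NaiveBayesText().fit(docs, labels)
print(model.predict(["股票", "基金", "上涨"]))  # prints: finance
```

Working in log space avoids floating-point underflow when many small word probabilities are multiplied, and add-one smoothing keeps a single unseen class-word pair from forcing a probability of zero.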

Related Dissertations

  1. Research on Feature Extraction and Classification of Tongue Shape and Tooth-Marked Tongue in TCM Tongue Diagnosis,TP391.41
  2. Research on Text Classification Based on Biomimetic Pattern Recognition,TP391.1
  3. Tourism Comments on the Internet’s Semantic Analysis and Usefulness Research,TP391.1
  4. Feature Extraction, Selection and Combination in Lipreading,TP391.41
  5. Research on Visual Servo System of Mechanical ARM,TP242.6
  6. Municipal tourism land use planning environmental impact assessment,X820.3
  7. Study on Taste Characteristic of Taste Peptide Enzymatic Production from Oyster Based on A Neural Network Method,TS254.4
  8. Research on Feature Selection and Construction in Emotion Speech Recognition,TP18
  9. Optimization Study on Gating System and Molding Process Parameters of Injection Mold Based on Simulation,TQ320.662
  10. Research of Adaptive Active Noise Control Based on Neural Network,TP183
  11. Research on Automatic Reading System for Digital Meters,TP391.41
  12. The Research of Evaluation Method in Connect6 Based on BP-TD Learning,TP18
  13. Research on State Diagnosis on Fan Based on Factor Analysis and BP Neural Network,F426.61
  14. Analysis on Water Ecological Carrying Capacity of Jiangxi Province,TV213.4
  15. Research and Implementation of Chinese Word Segmentation Techniques Based on the Conditional Random Field,TP391.1
  16. In-furnace Temperature Information Included Combustion Optimization of a Utility Boiler,TK227.1
  17. Research on Approaches of the Subjective Automated Assessment,TP391.1
  18. Study on Safety Comprehensive Assessment of Hydro-Power Construction Sites,TV513
  19. Design and Implementation of a Chinese Financial News Search Engine Based on WebHarvest,TP311.52
  20. Chinese XML Compression Technology,TP311.11
  21. A Prediction Model for Software Bug-Fix Workload Based on Empirical Data,TP311.53

© 2012 www.DissertationTopic.Net