
Research on Filtration and Classification Methods of Large-Scale Short Text

Author: WuZuo
Tutor: GuoJun
School: Beijing University of Posts and Telecommunications
Course: Signal and Information Processing
Keywords: short text, text filtration, regular expression, statistical language model, text classification
CLC: TP391.1
Type: Master's thesis
Year: 2007

Abstract


Instant communication technology has advanced rapidly in today's information society. The Short Message Service (SMS) used on mobile phones is regarded as another major information carrier besides the Internet. It reaches every aspect of society and daily life. Beyond serving as a communication tool, the short message also plays a critical role in guiding and spreading public opinion. It is therefore both important and urgent to analyze the short message, a special kind of short text, to build an effective and accurate classification system, and to mine the information that interests users. Against this background, the thesis investigates filtration and classification methods for short text.

Traditional text processing methods have matured and can filter and classify regular text. Research on the short message, which uses short text as its carrier, is still at an early stage. Working within the background of the project, the thesis studies the features of short text and the related processing methods, and proposes a rule-based filtering method and a statistical language model-based classification method, which have both research and practical value. The main contributions of the thesis are as follows.

First, based on an investigation of the language features and corpus structure, and in light of the project background, the thesis proposes a rule-based method for filtering a large-scale given set of short texts. It uses regular expressions as the tool to create rules and perform matching. The aim is to match meaningless short texts that have a fixed format and mode of expression quickly and accurately, and then filter them out.

Second, the thesis studies and establishes a classification system for short text. After studying the principles and smoothing algorithms of statistical language models, the thesis proposes a language model-based modeling method for short text. The classifier based on the statistical language model can process non-handwritten short text. To address the problem that a short text contains little information, topic features are combined with the language model, which yields a more accurate language model for short text.

The thesis systematically introduces the language features and classification characteristics of short text, and proposes effective filtering and classification methods for processing large-scale short text. However, the technology for short text is still relatively immature compared with that for traditional long regular text, and substantial performance improvement remains possible in further research on short text processing.
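
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not give the actual rules, the smoothing algorithm, or how topic features are combined with the language model. The rules in FILTER_RULES, the word-level tokenizer, the unigram model with add-one smoothing, and all function and class names below are assumptions chosen for illustration, and topic features are omitted.

```python
import math
import re
from collections import Counter

# Hypothetical rules (not from the thesis): fixed-format texts treated as meaningless.
FILTER_RULES = [
    re.compile(r"^\s*$"),                                  # empty text
    re.compile(r"^(ok|yes|no|hi)[.!]*$", re.IGNORECASE),   # stock one-word replies
    re.compile(r"^\d+$"),                                  # digits only
]


def is_meaningless(text):
    """Return True if any rule matches the whitespace-stripped text."""
    stripped = text.strip()
    return any(rule.search(stripped) for rule in FILTER_RULES)


def tokenize(text):
    """Assumed tokenizer: lowercase word tokens; real short texts may need segmentation."""
    return re.findall(r"\w+", text.lower())


class UnigramClassifier:
    """One unigram language model per class, with add-one smoothing (an assumption)."""

    def __init__(self):
        self.counts = {}        # label -> Counter of token frequencies
        self.docs = Counter()   # label -> number of training texts
        self.vocab = set()

    def train(self, labeled_texts):
        for text, label in labeled_texts:
            tokens = tokenize(text)
            self.counts.setdefault(label, Counter()).update(tokens)
            self.docs[label] += 1
            self.vocab.update(tokens)

    def log_score(self, text, label):
        counts = self.counts[label]
        total = sum(counts.values())
        v = len(self.vocab) + 1  # extra slot reserves probability for unseen words
        score = math.log(self.docs[label] / sum(self.docs.values()))  # class prior
        for tok in tokenize(text):
            score += math.log((counts[tok] + 1) / (total + v))
        return score

    def classify(self, text):
        return max(self.counts, key=lambda label: self.log_score(text, label))


def filter_and_classify(texts, classifier):
    """Drop texts matched by the rules, then label the remaining ones."""
    return [(t, classifier.classify(t)) for t in texts if not is_meaningless(t)]


# Toy training data, invented for this example.
train_data = [
    ("meeting at noon tomorrow", "work"),
    ("send the report today", "work"),
    ("dinner with family tonight", "personal"),
    ("happy birthday dear friend", "personal"),
]
clf = UnigramClassifier()
clf.train(train_data)
results = filter_and_classify(["OK!!", "12345", "report for the meeting"], clf)
```

Filtering first keeps fixed-format noise out of the classifier, and smoothing keeps a short text with unseen words from receiving zero probability under every class, which is the sparsity problem the abstract attributes to short texts.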

Related Dissertations

  1. Research on Text Classification Based on Biomimetic Pattern Recognition,TP391.1
  2. Web Data Extraction Technology and Application,TP311.13
  3. The Research of Text Classification Based on Hadoop,TP391.1
  4. Term Weight-Based Chinese Text Classification Algorithm,TP391.1
  5. One kind of empirical data on the workload of a software bug fixes Prediction Model,TP311.53
  6. Long text auxiliary short text clustering method of knowledge transfer,TP391.1
  7. Based on semantic analysis of text mining research,TP391.1
  8. Research on Application of Fuzzy Theory in Text Classification,O159
  9. Improving Performance of Deep Packet Inspection Based on Pattern Matching,TP393.08
  10. Studies of Query Expansion Based on Semantic Dictionary and Local Analysis,TP391.3
  11. Research on Automatic Classification Methods of Enterprise Business Scope,TP391.1
  12. Research on Ontology-based Short-text Classification,TP391.1
  13. Text-oriented classification method of feature word selection,TP181
  14. BBS Short Text Clustering Technology,TP393.094
  15. Research of Improvement to the Density-based Method for Reducing the Amount of Training Data and Application to kNN,TP391.1
  16. Research on Automatic Construction of Knowledge Tree in Knowledge Management System,G301
  17. Design and Implementation of FPGA-Based Regular Expressions,TN791
  18. Research and Realization Based on Subjective Objective Text Classification Preprocessing Methods,TP391.1
  19. Research and Application of Patent Literature Classification and Association Recommendation Technology,TP391.1
  20. The Research and Implementation of a Semantic-based Automatic Chinese Text Categorization System,TP391.1
  21. Blog retrieval key technology research,TP391.3

CLC: Industrial Technology > Automation technology, computer technology > Computing technology, computer technology > Computer applications > Information processing > Text Processing