
Research on a Term-Network-Based Keyword Extraction Strategy

Author: ZuoZuoYi
Tutor: TangYan
School: Southwestern University
Course: Computer Software and Theory
Keywords: deleting actor, co-occurrence, keyword extraction, term network, betweenness centrality
CLC: TP391.1
Type: Master's thesis
Year: 2008

Abstract


Since the advent of the Internet around 1990, the volume of text documents available online, such as e-mails, web pages, and digital books, has grown tremendously. To make more effective use of these documents, there is an increasing need for tools that process text. To meet this need, a number of products for analyzing text documents have been developed. The techniques involved in document analysis have formed an exciting new research area, often called text mining.

Keyword extraction plays a very important role in text mining, because keywords are useful for a variety of purposes, including summarizing, indexing, labeling, categorizing, clustering, highlighting, browsing, and searching. The task of automatic keyword extraction is to select keywords from the text of a given document. Automatic extraction makes it feasible to generate keywords for the huge number of documents that have no manually assigned keywords.

Previous approaches to keyword extraction fall into two groups:

  1. Supervised classification. Turney was the first to treat automatic keyword extraction as a supervised learning task: a document is treated as a set of phrases, which the learning algorithm must learn to classify as positive or negative examples of keywords. The performance has been satisfactory for a wide variety of applications.
  2. Unsupervised classification. These algorithms extract keywords from a single document without using a corpus. Examples include methods based on term frequency, SWN, the term graph, and the term network.

Based on an analysis of existing keyword extraction methods that use the term network, an effective algorithm is proposed that extracts not only high-frequency terms but also important terms with low frequency. It is based on the term network and the deleting actor index. The experimental results support this conclusion.
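The abstract does not define the deleting actor index or say how the term network is built, so the proposed algorithm cannot be reproduced from this page. The following is only a minimal sketch of the general idea suggested by the listed keywords (co-occurrence, term network, betweenness centrality). It assumes that two terms are linked when they appear in the same sentence and that terms are ranked by plain betweenness centrality, computed here with Brandes' algorithm. All function names and the tokenization rule are hypothetical and are not taken from the thesis.

```python
import re
from collections import defaultdict, deque
from itertools import combinations


def build_term_network(text, stopwords=frozenset()):
    """Link two terms whenever they co-occur in the same sentence (assumption)."""
    graph = defaultdict(set)
    for sentence in re.split(r"[.!?]+", text.lower()):
        terms = {t for t in re.findall(r"[a-z]+", sentence) if t not in stopwords}
        for a, b in combinations(sorted(terms), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    score = dict.fromkeys(graph, 0.0)
    for source in graph:
        # Breadth-first search that counts shortest paths from `source`.
        stack = []
        preds = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, visiting the farthest nodes first.
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}


def extract_keywords(text, k=3, stopwords=frozenset()):
    """Return the k terms with the highest betweenness (ties broken alphabetically)."""
    scores = betweenness_centrality(build_term_network(text, stopwords))
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

For example, `extract_keywords("alpha hub. beta hub. gamma hub.", k=1)` selects `hub`, the only term that lies on the shortest paths between the other terms. Ranking by position in the network rather than by raw count is what lets a network-based method surface a term that bridges otherwise separate parts of a document.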

Related Dissertations

  1. Information Extraction of Marine Oil Spill with Collaborative Images of ASAR and MODIS,X87
  2. Text-oriented disciplines correlation analysis association rule mining technology research,TP311.13
  3. Research on Keywords Acquisition Based on Semantic Distance from Web Pages,TP391.1
  4. Analysis of the Hotspots of Chinese Library and Information Science(1998-2007),G250
  5. Automatic extraction of domain concepts,TP391.1
  6. Research of Keywords Extraction Algorithm for Chinese Text Based on Gene Expression Programming,TP391.1
  7. Research on Summarization Abstract Algorithm Based on Improved CVSM,TP391.1
  8. Study of Image Feature Extraction and Texture Classification Algorithm,TP391.41
  9. Research on Methods for Texture Feature-based Retrieval of Remote Sensing Image,TP751
  10. Study on Data Sources Discovery and Selection on Deep Web,TP393.09
  11. Research on JPEG Image Blind Steganalysis,TP391.41
  12. Research on Image Steganalysis for LSB Matching,TP391.41
  13. Research on Coke Micrograph Segmentation Analysis,TP391.41
  14. The Research on Image Forgery Detection Based on the Second Order Texture and Noise Consistencies,TP391.41
  15. Research on Blind Detection Based on Image Content,TP391.41
  16. The Research on Image Recognition of Coal and Non-coal Based on Texture Analysis,TP391.41
  17. Research on Techniques of Texture Feature-based Medical Image Retrieval,TP391.41
  18. A Co-occurrence Network Approach to Analyzing Chinese Modern Poems and English Poems,O157.5
  19. Co-Occurrence Relation between Verb Reduplication and Adverbs of Quantification,H146
  20. The Research on Neural Network with Fourier Weight Function and Its Application in Image Recognition,TP391.41
  21. Cross-lingual Web Pages Automatic Classification Based on Frequently Co-occurring Entropy,TP391.1

CLC: Industrial Technology > Automation technology, computer technology > Computing technology, computer technology > Computer applications > Information processing > Text Processing
© 2012 www.DissertationTopic.Net