
The Study of Topic-Oriented IT News with Search Engine and Web Page Analysing

Author: ZhaoYuYong
Tutor: ZhangWei
School: Ocean University of China
Course: Applied Computer Technology
Keywords: News by category; Search engine; Industry role model; Text Mining
CLC: TP393.092
Type: Master's thesis
Year: 2010

Abstract


News is closely tied to people's daily work, life, and entertainment. For influential news events, coverage with greater depth and span is more informative and more interesting. This is so-called topic news: it is timely because the news is new, and it stands out because it covers a long time span.

In recent years the Internet has become the best platform and the largest source for releasing news and information, and news of all kinds spreads rapidly across it in various forms. On the other hand, with the explosive growth of information on the Internet, it is increasingly difficult to gather fuller news content by hand. Search engine technology, as one means of accessing information, has made considerable progress, and search engines represented by Google reach into every corner of the Internet. Mining news in depth and comprehensively matters for much news-related work, and doing so through search engines is the focus of this study: news content related to a particular topic is mined further and assembled into topic news.

Crawling IT news is by nature a Web data mining process. Hot news samples from 2009 are first collected and classified. Based on the classification, the characteristics of each sample are identified and the industry role model (Trade-role Model, TRM) is proposed. The model is developed through comparative analysis against search based on a user interest model, and it ultimately yields an industry role score formula used to evaluate the samples.

Based on this model, topic news crawling is carried out in two steps. In the first step, the search keywords are transformed and the URLs in the search engine results are extracted. This step is the foundation of the study: the quality of the extracted URLs directly determines whether the follow-up work succeeds.
Of several candidate approaches, the study uses a program to query Google and collect its search results. The industry role model is then used to score and screen the URL links. This removes most junk or useless links, retains the links related to the news topic, and selects the highest-scoring ones for later use.

In the second step, the news body corresponding to each URL is extracted. This step yields the final research result of the thesis. The pages behind the links screened in the first step are analysed and their text is extracted. Text mining based on the industry role model is applied, with the TRM model scoring each paragraph. After a final pass that stabilises the paragraph scores, the scores are weighed against the characteristics of news pages to extract the body of the corresponding news item.

On the crawled news samples, the average precision is 90.2% and the average recall is 72.8%. The extracted news bodies are finally assembled into topic news. Refining news on the Internet by hand takes a great deal of manpower. Refining related news content with search engine results and programs saves much of that effort and presents news events to online audiences quickly and comprehensively, which is the value of this study.
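
The two-step procedure described in the abstract can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the thesis's implementation: the abstract does not give the industry role score formula, so `ROLE_WEIGHTS`, `trm_score`, the thresholds, and the example URLs are all hypothetical stand-ins. The sketch scores search-result links, keeps the best ones, filters page paragraphs with the same score, and computes precision and recall against a hand-labelled set.

```python
import re

# Hypothetical role-term weights. The actual Trade-role Model (TRM)
# score formula is not given in the abstract; this is only a stand-in.
ROLE_WEIGHTS = {"google": 3.0, "microsoft": 3.0, "search": 2.0, "acquisition": 1.5}


def trm_score(text, weights=ROLE_WEIGHTS):
    """Sum the weights of role terms in text, normalised by token count."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if not tokens:
        return 0.0
    return sum(weights.get(tok, 0.0) for tok in tokens) / len(tokens)


def screen_links(results, top_k=2, min_score=0.1):
    """Step 1: score (url, snippet) pairs, drop low scorers, keep the top_k."""
    scored = [(trm_score(snippet), url) for url, snippet in results]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in kept[:top_k]]


def extract_body(paragraphs, min_score=0.1):
    """Step 2: keep paragraphs whose score clears the threshold, in page order."""
    return [p for p in paragraphs if trm_score(p) >= min_score]


def precision_recall(extracted, relevant):
    """Precision and recall of extracted items against a hand-labelled set."""
    extracted, relevant = set(extracted), set(relevant)
    hits = len(extracted & relevant)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up search results: (url, snippet) pairs.
results = [
    ("http://example.com/a", "Google search acquisition news"),
    ("http://example.com/b", "cheap watches online"),
    ("http://example.com/c", "Microsoft and Google"),
]
links = screen_links(results)
body = extract_body(["Google buys a search startup.", "Subscribe to our newsletter!"])
```

A single normalised keyword score is used for both steps only to keep the sketch short. The thesis reports scoring links and paragraphs with its own model and then weighing the paragraph scores against page characteristics, which this sketch does not attempt to reproduce.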

Related Dissertations

  1. Research on Sentiment Orientation Analysis of Blog Article Based on Blog Search,TP391.1
  2. Agent-based meta-search engine, personalized study,TP391.3
  3. Research on Technologies of Search Engine Based on Peer-to-Peer Networks,TP391.3
  4. The Application of Data Mining in Meridian Research in the Acupuncture Literature,TP311.13
  5. Research and Implementation on Key Technologies of Web Text Mining Oriented to Enterprise Competitive Intelligence,TP391.1
  6. Research on Automatic Chinese Text Summarization of Web-oriented Text Mining,TP391.1
  7. Research of Meta-based Web Military Intelligence Search Technology,TP391.3
  8. A Study of Personalized Search Based on Blog Content,TP391.3
  9. Java-based Zhejiang Textile & Fashion College campus network search engine,TP393.18
  10. SVM-Based Multi-Class Text Classification,TP391.1
  11. Research and Implementation on Users’ Interest-Oriented Web Search Strategies,TP391.3
  12. The Research of the Search Engine in the View of the Mass Communication,G206
  13. Users’ Status under the Background of Search Engines’ Commodification,G206
  14. Research on the Topical Search Engine Based on Semantic,TP391.3
  15. Research on Several Models in Text Classification and Clustering,TP391.1
  16. An Application Research of Information Extraction on Topic Search Engine,TP391.3
  17. Design and Implementation of a Music Academic Exchange Platform,TP311.52
  18. Design and Realization of Subject-Oriented Search Engine,TP391.3
  19. Research and Implementation of the Vertical Search Engine System Based on JAVA with LUCENE and HERITRIX,TP391.3
  20. Research and Implementation of Index Technology in Domain-specific Search Engine,TP391.3
