
Research on Multi-Task, Multi-Channel Parallel Crawler Technology

Author: LiXueKai
Tutor: LiBin
School: Harbin Institute of Technology
Course: Computer Science and Technology
Keywords: Search Engine; Task Assignment; Website; Information Extraction; Distributed File System
CLC: TP391.3
Type: Master's thesis
Year: 2009
Downloads: 71
Citations: 0

Abstract


Because information on the Internet is produced autonomously, is easy to disseminate and access, covers a broad geographic area, and is cheap to operate and maintain, the Internet has grown rapidly and at large scale since the mid-1990s. It has become an important platform for government, business, education, entertainment, and a range of major social events. As a result, Internet security has become increasingly important and is a typical example of a non-traditional security concern. Traditional search engines cannot provide customized services and update their data slowly. To address these shortcomings, this thesis designs and implements a platform that organizes resources in a highly customized way according to user needs, supports multi-channel information access, and acquires information on demand and in a timely manner.

Unlike a traditional search engine, which runs a single global crawling task, the system must handle a variety of tasks issued by different users. Each user is interested in only a few sites and has relatively strict real-time requirements, so the target sites must be crawled frequently within specific time periods. The system studied is a parallel crawler with multi-task management and allocation: a task's targets often include multiple sites, the start time and next run time of each task can be configured, and several monitoring tasks may need to watch the same website. To improve task parallelism, the system splits tasks into finer-grained task fragments for management and assignment, and uses a consistent hashing algorithm to allocate these fragments to crawlers. The algorithm keeps the load across crawlers as even as possible and, when crawler servers are added or removed, minimizes the number of task fragments that must be redistributed. For different data sources, the thesis adopts a multi-channel technique that customizes a separate parsing scheme for each source according to its characteristics.
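The consistent-hash allocation of task fragments to crawlers mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the hash function, the number of virtual nodes, or the fragment naming scheme, so MD5, `replicas=100`, and IDs such as `site0.example#part0` are assumptions made here for illustration only. The class and function names are hypothetical and do not come from the thesis.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a point on the ring (MD5 is an assumed choice of hash)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Minimal consistent-hash ring; each crawler owns several virtual nodes."""

    def __init__(self, crawlers=(), replicas=100):
        self.replicas = replicas   # virtual nodes per crawler (hypothetical default)
        self._points = []          # sorted ring positions
        self._owner = {}           # ring position -> crawler name
        for crawler in crawlers:
            self.add_crawler(crawler)

    def add_crawler(self, crawler: str) -> None:
        for i in range(self.replicas):
            point = ring_hash(f"{crawler}#vnode{i}")
            bisect.insort(self._points, point)
            self._owner[point] = crawler

    def remove_crawler(self, crawler: str) -> None:
        for i in range(self.replicas):
            point = ring_hash(f"{crawler}#vnode{i}")
            del self._points[bisect.bisect_left(self._points, point)]
            del self._owner[point]

    def assign(self, fragment_id: str) -> str:
        """Owner of a task fragment = first virtual node clockwise from its hash."""
        if not self._points:
            raise ValueError("no crawlers on the ring")
        idx = bisect.bisect_right(self._points, ring_hash(fragment_id))
        return self._owner[self._points[idx % len(self._points)]]


# Hypothetical task fragments: 50 sites, each split into 4 parts.
fragments = [f"site{n}.example#part{k}" for n in range(50) for k in range(4)]
ring = ConsistentHashRing(["crawler-A", "crawler-B", "crawler-C"])
before = {f: ring.assign(f) for f in fragments}

# Add a crawler server and see which fragments change owner.
ring.add_crawler("crawler-D")
after = {f: ring.assign(f) for f in fragments}
moved = [f for f in fragments if before[f] != after[f]]

# Only fragments whose arc was claimed by crawler-D change hands.
print(f"{len(moved)} of {len(fragments)} fragments reassigned")
```

With naive `hash(id) % n` placement, going from three crawlers to four would move most fragments. On the ring, a fragment moves only if one of the new crawler's virtual nodes lands between the fragment's hash and its previous owner, and it always moves to the new crawler. Removing that crawler again restores the original placement.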
The thesis also analyzes how traditional search engines assign tasks and the problems with that approach. Based on the characteristics of this system, it proposes a new task-distribution method whose granularity is finer than the traditional per-site division. A large site is cut into several smaller subsets, and these subsets are assigned to multiple parallel crawler nodes. This raises the overall acquisition rate of the crawler system and effectively improves on the traditional methods.
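The benefit of splitting large sites into subsets can be shown with a toy workload. The abstract does not state how a site is cut or how subsets are placed. This sketch assumes fragments are fixed-size slices of a site's URL list (`max_fragment_size`). To make the effect of granularity visible, it places units with a greedy least-loaded rule instead of the consistent hashing the thesis uses. Site names, page counts, and helper names are hypothetical. The largest per-crawler load stands in for completion time under the added assumption that every URL costs the same to fetch.

```python
import heapq


def split_site(site, urls, max_fragment_size):
    """Cut one site's URL list into fragments of at most max_fragment_size URLs."""
    return [
        (f"{site}#part{i // max_fragment_size}", urls[i:i + max_fragment_size])
        for i in range(0, len(urls), max_fragment_size)
    ]


def assign_least_loaded(units, crawlers):
    """Greedy placement: largest unit first, onto the crawler with the fewest URLs."""
    load = {c: 0 for c in crawlers}
    heap = [(0, c) for c in crawlers]
    heapq.heapify(heap)
    for _unit_id, urls in sorted(units, key=lambda u: len(u[1]), reverse=True):
        total, crawler = heapq.heappop(heap)
        load[crawler] = total + len(urls)
        heapq.heappush(heap, (load[crawler], crawler))
    return load


# Hypothetical workload: one large site dominates the total.
sites = {"big.example": 900, "mid.example": 200, "small.example": 100}
site_urls = {s: [f"http://{s}/page{i}" for i in range(n)] for s, n in sites.items()}
crawlers = ["crawler-A", "crawler-B", "crawler-C"]

# Traditional granularity: one unit per site.
per_site = assign_least_loaded(list(site_urls.items()), crawlers)

# Finer granularity: each site cut into fragments of at most 100 URLs.
fragments = [frag for s, urls in site_urls.items() for frag in split_site(s, urls, 100)]
per_fragment = assign_least_loaded(fragments, crawlers)

print(max(per_site.values()), max(per_fragment.values()))  # 900 400
```

With whole sites as units, one crawler has to fetch all 900 pages of the large site while the others sit idle. After splitting, the twelve 100-URL fragments spread evenly at 400 URLs per crawler.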

Related Dissertations

  1. Research on Domain Entity Attribute and Event Extraction Technology,TP391.1
  2. Study on Growth Monitoring Technique Based on Pixel Un-Mixing Method and HJ Remote Sensing Images in Paddy Rice,S511
  3. Based on high-resolution remote sensing data mining houses information extraction,TP751
  4. The Study of Topic-Oriented IT News with Search Engine and Web Page Analysing,TP393.092
  5. Topic search engine key technology research,TP391.3
  6. Dynamic learning framework based on structured automatic web data extraction method,TP393.092
  7. A class of multi-robot systems research task allocation method,TP242
  8. Study on the Influence of Company’s Official Virtual Community User’s Role Changing,F224
  9. Theory and application of sports of human science website construction,G804.2
  10. Research on Technologies of Search Engine Based on Peer-to-Peer Networks,TP391.3
  11. Analysis of Business Process Efficiency Based on Task Assignments,F272
  12. Provincial government website security management mechanism,TP393.092
  13. The Refresh Strategy for Webpage of Large-scale Website in Search Engine,TP393.092
  14. Development of human technology - based web content extraction system,TP393.092
  15. Java-based Zhejiang Textile & Fashion College campus network search engine,TP393.18
  16. Research and Implementation of an Information Pre-process Platform of Public Opinion,TP393.09
  17. Optimization Design and Implementation of Vertical Search Engine for Software Security Domain,TP391.3
  18. Concept tree based Web Information Extraction Technology Research,TP391.1
  19. Research on the Theory and Method of E-Catalog Ontology Self-learning Oriented on Customer,TP391.1
  20. The Research of Expanding the Semantic Information Function to Search Engine,TP391.3
  21. Research on Automatic Search Engine Performance Evaluation Based on Clustering Analysis,TP391.3

CLC: > Industrial Technology > Automation technology,computer technology > Computing technology,computer technology > Computer applications > Information processing (information processing) > Retrieval machine
© 2012 www.DissertationTopic.Net