Using Form Classifier to Identify Domain-Specific Deep Web Entries

Author: XuYingYing
Advisor: ZuoWanLi
School: Jilin University
Major: Computer Software and Theory
Keywords: Deep Web entry; Focused Crawling; Form Classification
CLC: TP391.3
Type: Master's thesis
Year: 2007

Abstract


To index the massive resources of the Deep Web and overcome the limitations of traditional search engines, this thesis builds a form classifier that identifies domain-specific Deep Web entries and uses it to guide crawling. The form classifier consists of two parts. The first recognizes Deep Web entries: the thesis uses the structure of the form as the feature vector and, after comparing candidate algorithms, selects the C4.5 decision tree, which reaches an accuracy of 97.5062% on the training set under 10-fold cross-validation. The second classifies the recognized entries by domain, that is, it decides whether a Deep Web entry is relevant to the selected topic: the thesis extracts the text in the form to build a document vector and, after comparing candidate algorithms, selects the support vector machine (SVM), which exceeds 90% accuracy in each of the four selected categories on the training set under 10-fold cross-validation. The complete form classifier reaches an accuracy of 94.5455% on the test set. Finally, the thesis presents a crawling framework for domain-specific Deep Web entries. Taking airfare as an example, the crawler guided by the form classifier achieves a harvest ratio of 80.198%. The form classifier can therefore be applied to identifying domain-specific Deep Web entries.
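
The abstract does not list the structural features used for entry recognition, so the following is only a minimal sketch of what "using the structure of the form as a feature vector" could look like. The feature set (counts of common input types, select and textarea elements, and a POST flag) and the names FormFeatureParser and form_feature_vectors are assumptions made for illustration, not the thesis's actual design. The resulting vectors could then be passed to a decision tree learner such as C4.5.

```python
from html.parser import HTMLParser

# Assumed feature set: the thesis does not enumerate its structural features.
INPUT_TYPES = ("text", "password", "checkbox", "radio", "submit", "hidden")
ELEMENT_TAGS = ("select", "textarea")
FEATURES = INPUT_TYPES + ELEMENT_TAGS + ("method_post",)


class FormFeatureParser(HTMLParser):
    """Collects one dictionary of structural counts per <form> element."""

    def __init__(self):
        super().__init__()
        self.forms = []       # finished feature dictionaries, in page order
        self._current = None  # dictionary for the form being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = dict.fromkeys(FEATURES, 0)
            method = (attrs.get("method") or "get").lower()
            self._current["method_post"] = int(method == "post")
        elif self._current is not None:
            if tag == "input":
                # An <input> without a type attribute defaults to a text field.
                kind = (attrs.get("type") or "text").lower()
                if kind in INPUT_TYPES:
                    self._current[kind] += 1
            elif tag in ELEMENT_TAGS:
                self._current[tag] += 1

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def form_feature_vectors(html):
    """Return one numeric feature vector (ordered as FEATURES) per form."""
    parser = FormFeatureParser()
    parser.feed(html)
    parser.close()
    return [[form[name] for name in FEATURES] for form in parser.forms]
```

Under these assumed features, a login form and a flight search form yield different vectors (for instance, only the former has a password field), which is the kind of structural signal a decision tree can split on.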

CLC: Industrial Technology > Automation technology, computer technology > Computing technology, computer technology > Computer applications > Information processing > Retrieval machine