
Research on Web Information Extraction for Domain in Information Integration System

Author: Liu Hui
Tutor: Xu Xuezhou
School: Xi'an University of Electronic Science and Technology
Course: Computer Software and Theory
Keywords: Information integration; Web information extraction; Extraction rules; Extraction framework
CLC: TP393.092
Type: Master's thesis
Year: 2008

Abstract


With the explosion of information on the Internet, retrieving the required information accurately and quickly, and using it more efficiently, has become a pressing problem. An information integration system (IIS) integrates heterogeneous Web data sources and provides a unified interface to upper-layer applications; to supply the broadest, largest, and most up-to-date data, such a system must solve the problem of Web information extraction. This study focuses on two parts, Web information extraction rules and the extraction system framework, and proposes and implements a domain-oriented information extraction framework that adaptively applies DOM-based and NLP-based methods to extract information from Web pages.
Extraction rules are the core of a wrapper: they describe the mapping from the source schema to the target schema. The DOM-based mapping method proposed in this thesis manipulates Web pages with standard XML technology, obtains extraction rules through inductive learning, and executes the rules with a rule interpretation engine to produce the extraction results. For Web pages that are not data-oriented, the thesis introduces NLP techniques: using the domain features of the labeled Web pages, the data source is segmented and classified, trigger patterns are matched, and the semantic distance within a trigger event is computed to determine which information items to extract. The NLP-based method compensates for the shortcomings of the DOM mapping method.
Within the system, the data source is preprocessed, and information entropy is used to detect and extract coarse information blocks. At the bottom layer, domain ontology files describe the domain's information and map the extracted basic information to support decision-making in the upper layer, so that the system can switch between domains. Extraction results are stored in a database, and an ontology library of the extracted information is provided to other modules of the information integration system.
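The DOM-based mapping method can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes pages have already been cleaned into well-formed XHTML, represents a rule as a hypothetical mapping from a target-schema field to an ElementTree positional path, "learns" each path from a single user-labeled sample page (a deliberately simple stand-in for inductive learning), and uses `apply_rule` as a toy rule interpretation engine on a new page built from the same template. The names `induce_path`, `learn_rule`, and `apply_rule` and the sample pages are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

def induce_path(root: ET.Element, example: str) -> Optional[str]:
    """Hypothetical learner: positional path to the first element whose text equals `example`."""
    parent = {child: p for p in root.iter() for child in p}  # child -> parent map
    for node in root.iter():
        if node.text and node.text.strip() == example:
            steps = []
            while node is not root:
                siblings = [c for c in parent[node] if c.tag == node.tag]
                # ElementTree's [n] predicate counts same-tag siblings starting at 1.
                steps.append(f"{node.tag}[{siblings.index(node) + 1}]")
                node = parent[node]
            return "./" + "/".join(reversed(steps))
    return None

def learn_rule(sample_xml: str, labels: Dict[str, str]) -> Dict[str, str]:
    """Build a rule (field -> path) from one sample page labeled by a user."""
    root = ET.fromstring(sample_xml)
    rule = {}
    for field, value in labels.items():
        path = induce_path(root, value)
        if path is not None:
            rule[field] = path
    return rule

def apply_rule(page_xml: str, rule: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Toy rule interpreter: evaluate each path on a page with the same template."""
    root = ET.fromstring(page_xml)
    record: Dict[str, Optional[str]] = {}
    for field, path in rule.items():
        node = root.find(path)
        # None signals that the rule no longer matches this page.
        record[field] = node.text.strip() if node is not None and node.text else None
    return record

# Hypothetical pages generated from one template (assumed already well-formed XHTML).
SAMPLE = "<html><body><div><h1>Camera X100</h1><span>Price:</span><span>2999</span></div></body></html>"
NEW = "<html><body><div><h1>Phone Y</h1><span>Price:</span><span>1999</span></div></body></html>"

rule = learn_rule(SAMPLE, {"name": "Camera X100", "price": "2999"})
print(rule)  # {'name': './body[1]/div[1]/h1[1]', 'price': './body[1]/div[1]/span[2]'}
print(apply_rule(NEW, rule))  # {'name': 'Phone Y', 'price': '1999'}
```

Purely positional paths like these break when the template changes; a fuller inductive learner would generalize over several labeled pages, for example by preferring attribute-based predicates to positions.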
Extraction tests on Web pages from the target domain verify the effectiveness and usability of the extraction algorithms and the system framework, and indicate that the framework is extensible and of value for both research and commercial applications.
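The abstract does not give the exact information-entropy formula used for coarse block detection, so the following Python sketch assumes one common formulation: each term's entropy is measured over its frequency distribution across pages of the same site and normalized to the range 0 to 1, and a block whose mean term entropy is low is treated as informative, while text repeated on every page (navigation bars, footers) scores high and is discarded. Tokenization, block segmentation, the page data, the function names `term_entropy` and `informative_blocks`, and the 0.5 threshold are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

def term_entropy(pages: List[List[str]]) -> Dict[str, float]:
    """Normalized entropy (0..1) of each term's frequency distribution across pages."""
    counts = [Counter(tokens) for tokens in pages]
    vocab = set().union(*counts)
    n = len(pages)
    entropy: Dict[str, float] = {}
    for term in vocab:
        freqs = [c[term] for c in counts]
        total = sum(freqs)
        h = sum(-(f / total) * math.log(f / total) for f in freqs if f > 0)
        # Divide by log(n) so a term spread evenly over all pages scores 1.0.
        entropy[term] = h / math.log(n) if n > 1 else 0.0
    return entropy

def informative_blocks(blocks: List[List[str]], entropy: Dict[str, float],
                       threshold: float = 0.5) -> List[int]:
    """Indices of blocks whose mean term entropy falls below the (hypothetical) threshold."""
    keep = []
    for i, tokens in enumerate(blocks):
        if not tokens:
            continue
        score = sum(entropy.get(t, 0.0) for t in tokens) / len(tokens)
        if score < threshold:  # low entropy: text specific to this page
            keep.append(i)
    return keep

# Hypothetical tokenized pages from one site; every page repeats the same navigation bar.
PAGES = [
    ["home", "news", "contact", "canon", "camera", "lens"],
    ["home", "news", "contact", "nikon", "tripod"],
    ["home", "news", "contact", "laptop", "battery"],
]
# Candidate blocks of the first page (segmentation assumed already done).
BLOCKS = [["home", "news", "contact"], ["canon", "camera", "lens"]]

ent = term_entropy(PAGES)
print(informative_blocks(BLOCKS, ent))  # [1]
```

Here the navigation block scores about 1.0 and is dropped, while the product block scores 0.0 and is kept as a coarse content block for later DOM- or NLP-based extraction.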

Related Dissertations

  1. The Design and Implementation of Mediator and Wrapper Mechanism in Massive Multi-Database Integration,TP311.13
  2. The Design and Implementation of DICOM Middle Software and Access Control Model in Formation Integration Platform,TP311.13
  3. Web-based Mining Technology and Its Application in Digital Library,G250.76
  4. Design and Implementation of the campus fee system,TP311.52
  5. Enterprise Service Bus Based Information Integration System for Die & Mold Enterprises,TP311.52
  6. Research on Technologies for Military Plotting Based on Graphical Element of ArcGIS,TP311.52
  7. Design and Implementation of Agricultural Information Website Cluster Based on Content Management System,TP393.092
  8. Design and Implementation of Web Information Extraction Based on DOM,TP393.09
  9. Concept tree based Web Information Extraction Technology Research,TP391.1
  10. Research of Ontology-Based Logistics Information System Integration Method,TP311.52
  11. The Analysis and Design of Customer Relationship Management on Ningxia Telecom Company Limited,F626
  12. Research on Ontology-Based Web Information Extraction Technology,TP391.1
  13. Web-based information is automatically extracted English exam generation algorithm,TP393.09
  14. Ontology-based Web information extraction field of tourism,TP391.11
  15. The Study of Framework of Integration for Logistics of Corporation Based on SOAP and XML,TP311.52
  16. Research on Information Integration System Based on Multi-Agent,TP311.52
  17. Research for the Integration and Application of Enterprise Information Systems,TP311.52
  18. Analysis for Dissemination of Public Opinion Based on Web Information Extraction,TP393.09
  19. Research and Application of Web Data Extraction Mode Based on Tree Structure,TP393.092
  20. Research on Construction and Management of Information Integration in the Infrastructure and Campaign Periods of Thermal Power Project,F426.61
  21. Research on Some Key Issues in Integrated Automation of Beer Production Process,TP273.5

CLC: > Industrial Technology > Automation technology,computer technology > Computing technology,computer technology > Computer applications > Computer network > General issues > The application of computer network > Web browser