
Design and Implementation of Domain Specified Deep Web Information Integration System

Author: TaoLei
Tutor: LiuJie;MoZuo
School: Beijing Technology and Business University
Course: Applied Computer Technology
Keywords: Deep Web; Information Extraction; Information Integration; Query Conversion; Interface Matching
CLC: TP311.52
Type: Master's thesis
Year: 2009

Abstract

"Deep Web" information refers to information stored in searchable databases on the Internet. With the rapid development of the Internet, the number of searchable databases keeps growing. However, it is difficult for traditional search engines to retrieve the information hidden in these databases, because it can only be accessed as the response to dynamic queries submitted through the query interfaces on front-end web pages. To meet the demand for accessing the Deep Web, we study and analyze current Deep Web integration technology and make several improvements and innovations.

The major tasks of this paper are as follows:

1) Propose the idea of integrating Deep Web information by field (domain). Compared with traditional Deep Web information integration, this approach makes better use of the field information of the Deep Web and improves the accuracy of information search.

2) Study and analyze the technologies related to Deep Web information integration, design a domain-specific Deep Web information integration framework, and implement a prototype system. The system is composed of six modules: a Deep Web site classification module, a Deep Web query interface identification module, a domain attribute extraction module, a domain attribute matching module, a unified interface construction module, and a result data record extraction module. The site classification module relies on third-party site categories. The query interface identification module uses a form crawler and rule filters. Domain knowledge bases are set up to support domain attribute extraction and matching. The result data record extraction module uses a CSS-based method for extracting data records from Deep Web result pages, combined with MDR.

3) Use Ajax, a cache database, and query optimization techniques to optimize the system, which improves the user experience and reduces the system load.

The innovations of this paper are as follows:

1) Propose domain knowledge bases, which help to decompose and match domain attributes in Deep Web query interfaces. Attribute names are semantically decomposed according to the similarity relations defined in the domain knowledge bases.

2) Propose a CSS-based method for extracting data records from Deep Web result pages, which extracts the result data accurately.

The domain-specific Deep Web information integration system implemented in this paper uses a B/S (browser/server) architecture, with Java as the development language. Spring and Hibernate are used to build the system framework, and several open-source tool packages are also used. Users submit a query request through the front-end query interface on a web page. Based on that request, the back end queries the sites mapped to the selected domain and then returns the query results from each Deep Web site to the user. Experiments in the book, car, and video domains show that the system performs well in Deep Web information integration and already has practical value.
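
The abstract does not give the structure of the domain knowledge bases, so the following is only a minimal sketch of how attribute matching against such a knowledge base might look. The class name `DomainKnowledgeBase`, its methods, and the sample book-domain synonyms are all hypothetical. The similarity relations are reduced here to a plain synonym table, and "decomposition" to splitting a label into words, which is likely much simpler than the method used in the thesis.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: maps labels found on query interfaces to unified attribute names. */
public class DomainKnowledgeBase {
    // Normalized label or synonym -> unified attribute name (assumed representation).
    private final Map<String, String> synonymToUnified = new HashMap<>();

    /** Registers a unified attribute together with labels treated as similar to it. */
    public void addAttribute(String unified, String... synonyms) {
        synonymToUnified.put(normalize(unified), unified);
        for (String synonym : synonyms) {
            synonymToUnified.put(normalize(synonym), unified);
        }
    }

    /** Lowercases the label and collapses every run of non-alphanumeric ASCII characters to one space. */
    public static String normalize(String label) {
        return label.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", " ").trim();
    }

    /**
     * Tries the whole label first; if that fails, splits the label into words
     * and returns the unified attribute of the first word that is known.
     */
    public Optional<String> match(String label) {
        String normalized = normalize(label);
        String direct = synonymToUnified.get(normalized);
        if (direct != null) {
            return Optional.of(direct);
        }
        for (String token : normalized.split(" ")) {
            String hit = synonymToUnified.get(token);
            if (hit != null) {
                return Optional.of(hit);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Sample entries for a book domain; invented for illustration only.
        DomainKnowledgeBase books = new DomainKnowledgeBase();
        books.addAttribute("title", "book title", "name");
        books.addAttribute("author", "writer", "written by");
        books.addAttribute("isbn", "isbn number");

        System.out.println(books.match("Book Title:"));
        System.out.println(books.match("Author name"));
        System.out.println(books.match("Publisher"));
    }
}
```

In this sketch a label such as "Author name" has no whole-label entry, so it is split into words and matched on "author". A label with no known word, such as "Publisher", stays unmatched and would need a new knowledge-base entry.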

Related Dissertations

  1. The Design and Implement of Mediator and Wrapper Mechanism in Massive Multi-Database Integration,TP311.13
  2. Research on Domain Entity Attribute and Event Extraction Technology,TP391.1
  3. Research on Temporal Information Recognition and Normalization,TP391.1
  4. Design and Implementation of the HL7 Message Parsing and Store in the Medical Information Integration Platform,TP311.52
  5. The Design and Implementation of DICOM Middle Software and Access Control Model in Formation Integration Platform,TP311.13
  6. Study on Growth Monitoring Technique Based on Pixel Un-Mixing Method and HJ Remote Sensing Images in Paddy Rice,S511
  7. The Study of A Company Information Management Optimization,TP315
  8. Active faults based radar image information extraction method applied research and demonstration,P542.3
  9. Based on high-resolution remote sensing data mining houses information extraction,TP751
  10. Enterprise Service Bus Based Information Integration System for Die & Mold Enterprises,TP311.52
  11. The key component vertical search engine technology research,TP391.3
  12. Reptiles theme for Education News Design and Implementation,TP391.3
  13. Research on Technologies for Military Plotting Based on Graphical Element of ArcGIS,TP311.52
  14. Research on Cooperative Management Mode and Relevant Information Integration of Automobile Accessory Enterprise,F270.7
  15. Oriented data integration and analysis of defects in the semiconductor manufacturing process,TN305
  16. Research of Data Source Selection with Similar Theme in Deep Web Integrated System,TP311.13
  17. To SGMW Qingdao Branch Manufacturing Execution Systems Research and application,F426.471
  18. Deep Web Data Cleaning Method Research and Application,TP393.09
  19. Research on Data Acquisition and Topic Analysis of Online Public Opinion,TP393.09
  20. Research on Crawling Deep Web Information,TP393.09
  21. The Study on Deep Web Interface Integration and Search Strategy,TP393.09

CLC: Industrial Technology > Automation technology, computer technology > Computing technology, computer technology > Computer software > Program design, software engineering > Software Engineering > Software Development