The Design and Implementation of Web Information Extraction System

Author: DingQiaoYi
Tutor: ZhangYu; SunYiMing
School: Harbin Institute of Technology
Course: Software Engineering
Keywords: Web data mining; Web data extraction; template extraction; list extraction
CLC: TP311.52
Type: Master's thesis
Year: 2012

Abstract


Nowadays the Web, built on the Internet, plays an increasingly important role in people's daily lives. The Web carries a large amount of information, which makes it a significant information source. Finding a convenient way to dig the desired information out of the vast amount of data on the Web is therefore very important, and Web information extraction is one useful solution. This project comes from the search platform department at Alibaba.

The thesis analyses the Web extraction problem according to its application fields. It defines the extraction problems from the viewpoint of the features of the extraction targets and of the Web pages, and puts forward specific Web extraction solutions for them. How to design and implement a Web information extraction system using those solutions is an important topic as well. With this system, users can easily get the desired data and information from the Web.

In the course of the project, the author analysed the problems that Web information extraction solutions focus on and defined a data model to represent Web structure information. Based on the system's application fields, the author described business application scenarios, which were finally summarized as the original system requirements. Then, following the software development life cycle, the system's requirements analysis, design and implementation, and testing are introduced. In this part, the author uses a use case model to express the requirements, and a functional model and a system architecture diagram to express the design and implementation. As the core parts of this topic, the design and implementation of the workflow engine and the HTTP service framework are described using class diagrams, sequence diagrams, activity diagrams and flowcharts. Last but not least, the thesis introduces several Web extraction algorithms, such as template extraction and automatic extraction based on the list-detail model, together with evaluations of these algorithms.

Finally, system testing and algorithm evaluation show that the system satisfies the predefined requirements.
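
The abstract names template extraction without describing it. As a rough illustration of the general idea only, and not the thesis's actual algorithm, the Python sketch below treats a template as a mapping from field names to a (tag, class) pair and collects the text of the first matching element for each field. The template format, the `TemplateExtractor` class and the `extract` function are all hypothetical names introduced for this example.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; ignored when tracking nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TemplateExtractor(HTMLParser):
    """Collects the text of the first element matching each template rule."""

    def __init__(self, template):
        super().__init__()
        # template: field name -> (tag, class value). Hypothetical format,
        # chosen for this sketch; not taken from the thesis.
        self.template = template
        self.result = {}
        self._field = None  # field currently being captured, if any
        self._depth = 0     # nesting depth inside the captured element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._field is not None:
            self._depth += 1  # nested element inside the captured one
            return
        css_class = dict(attrs).get("class")
        for field, (want_tag, want_class) in self.template.items():
            if (field not in self.result
                    and tag == want_tag and css_class == want_class):
                self._field = field
                self._depth = 1
                self.result[field] = ""
                break

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._field is None:
            return
        self._depth -= 1
        if self._depth == 0:  # the captured element has closed
            self.result[self._field] = self.result[self._field].strip()
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self.result[self._field] += data


def extract(html, template):
    """Apply a template to one HTML page and return the extracted fields."""
    parser = TemplateExtractor(template)
    parser.feed(html)
    parser.close()
    return parser.result


if __name__ == "__main__":
    page = ('<div><h1 class="title">Red <b>Shoes</b></h1>'
            '<span class="price"> 19.90 </span></div>')
    print(extract(page, {"title": ("h1", "title"), "price": ("span", "price")}))
```

A real system would presumably need more robust rules, such as paths through the page structure rather than a single class name, and would have to cope with malformed markup. The sketch only shows how a fixed template can turn a page into named fields.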
