Research of Building Cloud Computing Platform for Processing and Analyzing Massive Data

Author: XiaoTong
Tutor: XiongCongCong
School: Tianjin University of Science and Technology
Course: Detection Technology and Automation
Keywords: Massive Data; Hadoop; Search Engine
CLC: TP311.13
Type: Master's thesis
Year: 2011

Abstract


With the rapid development of the Internet and the growing number of Internet users, companies that provide network services face a flood of information to process. They must analyze user needs, the effects of their various products, and so on, and some of these analyses are subject to time constraints. Traditional database systems have difficulty meeting the actual requirements for storage space and processing time. The main purpose of this paper is to build a low-cost distributed system for storing and processing massive data.

Starting from this problem, and after analyzing the key technologies of existing distributed computing and storage, this paper combines research on Hadoop cloud computing technology with the actual hardware and software capabilities of the campus network to meet its own needs. It presents a cloud-computing-based model for data processing and studies the model from several aspects: data structure design, system modules, program flow, and programming platform. Finally, the model is applied to a distributed search engine for massive data. The study indicates that the reliability, efficiency, and scalability of the Hadoop cloud computing platform meet the technical requirements of the distributed search engine.

This paper uses the Hadoop system as the platform for distributed computing applications. It analyzes each step of the traditional search engine process (crawling, indexing, and searching), improves its function modules, and decomposes these non-sequential steps into two sub-tasks: a data computing task and a data combining task. Following the Map/Reduce programming model, it encapsulates all data computing tasks in the Map function and all data combining tasks in the Reduce function.
The main task of this paper is to deploy the improved search engine system in a Hadoop cloud computing environment built from inexpensive computers, so that the system offers fast response, high reliability, and scalability. Its main characteristic is the integration of the proposed research model with practical business applications: it uses a leading-edge distributed framework to better meet the needs of the project, deploys the model in a real distributed environment, and tests the system, with experimental results showing practical value such as high efficiency, low cost, scalability, and ease of maintenance.
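The abstract's split of indexing into a data computing task (Map) and a data combining task (Reduce) can be illustrated with the classic inverted-index job. The following is a minimal, single-process Python sketch under stated assumptions: it only simulates the map, shuffle, and reduce stages in memory, does not use the Hadoop API, and is not the thesis's own implementation. The function names (`map_phase`, `shuffle`, `reduce_phase`, `build_index`, `search`), the simple lowercase tokenizer, and the sample pages are all hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input record: (document id, crawled page text).
Doc = Tuple[str, str]


def tokenize(text: str) -> List[str]:
    # Assumed tokenizer: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def map_phase(doc_id: str, text: str) -> Iterator[Tuple[str, str]]:
    """Data computing task: emit one (term, doc_id) pair per distinct term."""
    for term in set(tokenize(text)):
        yield term, doc_id


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group intermediate values by key (on Hadoop the framework does this)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(term: str, doc_ids: List[str]) -> Tuple[str, List[str]]:
    """Data combining task: merge one term's postings into a sorted list."""
    return term, sorted(set(doc_ids))


def build_index(docs: Iterable[Doc]) -> Dict[str, List[str]]:
    # Map every document, group by term, then reduce each group.
    intermediate = (pair for doc_id, text in docs for pair in map_phase(doc_id, text))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(term, ids) for term, ids in grouped.items())


def search(index: Dict[str, List[str]], query: str) -> List[str]:
    """Boolean AND query: documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


if __name__ == "__main__":
    pages = [
        ("d1", "Hadoop stores massive data"),
        ("d2", "MapReduce processes massive data on Hadoop"),
        ("d3", "A search engine answers queries"),
    ]
    index = build_index(pages)
    print(index["massive"])              # ['d1', 'd2']
    print(search(index, "hadoop data"))  # ['d1', 'd2']
```

Because each `map_phase` call depends only on its own document and each `reduce_phase` call only on one term's group, both stages can in principle run in parallel on separate nodes; this independence is what makes the Map/Reduce decomposition suitable for a cluster of inexpensive computers.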

Related Dissertations

  1. Implementation of Data Compression, Operation and Query Processing System Based on BAP,TP311.13
  2. Web search engine related technology research,G354
  3. The Design and Implementation of Lucene-Based Network Literature Vertical Search Engine,TP391.3
  4. Research of Intelligent Search Engine Based on Semantic Web,TP391.3
  5. Research and Application of Map/Reduce Based Distributed Log Analyzer,TP311.52
  6. Meta Search Engine Based on BP Network,TP391.3
  7. Search Engine Provider Copyright Infringement Liability Standard Discussion,D923.41
  8. Design and Implementation of Online Shopping Prototype System Based on Hadoop,TP311.52
  9. Design of the Mobile Learning System Based on Hadoop,G434
  10. HADOOP architecture based on the social security project web log analysis system,TP311.52
  11. 3D Mannequins Generating Engine Based on eMTM with MapReduce,TP391.41
  12. The Research of Software Service Platform Based on Cloud Computing,TP311.52
  13. Research and Implementation of Vertical Search Engine Based on Distribution,TP391.3
  14. Research on Fast Query Algorithm of Massive Data,TP311.13
  15. An Intrusion Detection System for High-Speed Networks,TP393.08
  16. Incremental Learning Method Based on Cloud Computing,TP311.13
  17. Cloudqueue: An Internet-Scale Messaging Infrastructure Based on Hadoop,TP311.52
  18. The Research of Text Classification Based on Hadoop,TP391.1
  19. Research and Implementation on a Distributed Service Registry Based on HADOOP Platform,TP393.09
  20. Research of Task-level Data Processing Based on Multicore CPU and Test of Its Performance on Cluster Platform,TP274
  21. Hadoop-based video transcoding system design and implementation,TN919.81

© 2012 www.DissertationTopic.Net