
Research and Implementation of Statistic over Data Streams for Massive Database System

Author: WeiZuo
Tutor: HanWeiHong
School: National University of Defense Science and Technology
Course: Computer Science and Technology
Keywords: data stream, VLDB, statistics, semantic caching maintenance, abnormal data, duplication elimination
CLC: TP311.13
Type: Master's thesis
Year: 2008

Abstract


With advances in computing, applications built on very large databases have become increasingly common. Data stream technology has been studied extensively, and many effective algorithms and products have been proposed, making it a mature data model. The data loaded into a database is continuous, fast, and time-varying, so it can be treated as a data stream.

Starting from the processing of data before storage, and drawing on research into data stream statistics, this paper proposes a data stream statistics service architecture and implements statistical processing of the loading data stream. Against the background of a very large statistical application database, this paper also implements the handling of abnormal data in the loading stream. It computes statistics over the abnormal data and also uses them to update the existing statistical results, so that subsequent processing results stay consistent with the records in the database. Meanwhile, to meet the demands on the loading service once the statistics service is added, and to lighten the load on subsequent queries, we propose an effective duplicate elimination method for short text, aimed at repeated data in the stream. At the end of the paper, we test the statistics service and verify its correctness.

In this paper, we focus mainly on using data stream statistics results to maintain a semantic cache, as a concrete application example of the data stream statistics service.
Using data stream statistics in semantic cache maintenance can reduce the response time of aggregate queries and shift processing load from the query server to the loading server, which improves the overall performance and stability of the system.

We make several contributions in this paper.

(1) We propose a data stream statistics service architecture for loading into a very large database. The statistics service completes its statistics effectively with little effect on the loading process.

(2) We implement a statistical method for abnormal data streams. Adopting multiple data stream processing methods, we maintain a sliding window for the abnormal data stream alongside the sliding window for the regular data stream. Dynamically allocated base windows complete the statistics for abnormal data, and the statistics base and query results are updated with statistical results that arrive several hours late.

(3) We study semantic cache maintenance and, by combining the statistical results with semantic caching, propose a way to solve the semantic cache maintenance problem. Shifting load from the query database server to the loading process improves the overall performance and stability of the system.

(4) We study data cleaning technology. Aimed at duplicated data in short text, we implement an effective duplicate elimination method for massive short-text databases, which reduces the data volume and thereby improves the performance of subsequent database processing.

Based on the techniques described in the paper, we have implemented a data stream statistics service for large-volume data loading on the large-scale transaction processing middleware StarTPMonitor. By combining statistical summary information with semantic caching, the service improves the performance of the semantic cache and greatly enhances the query capacity of the system.
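
The interplay between regular statistics, abnormal data, and cached aggregates can be illustrated with a small Python sketch. It is not the thesis implementation: it assumes that "abnormal" records are records whose timestamps fall in a window older than the newest one seen, that the statistics are a simple count and sum per fixed-width base window, and that a cached query is identified by a window-aligned time range. The names `StreamStatistics`, `WindowStats`, `load`, `merge_late`, and `query` are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Running count and sum for one base window."""
    count: int = 0
    total: float = 0.0


class StreamStatistics:
    """Per-window statistics with a separate buffer for late records.

    Assumption: a record is "abnormal" if its window is older than the
    newest window seen so far. The thesis may define it differently.
    """

    def __init__(self, window_seconds=3600):
        self.window_seconds = window_seconds
        self.windows = defaultdict(WindowStats)       # regular statistics
        self.late_windows = defaultdict(WindowStats)  # abnormal (late) statistics
        self.newest_window = 0                        # start of newest window seen
        self.cache = {}                               # (lo, hi) -> (count, total)

    def _window_start(self, event_time):
        return event_time - event_time % self.window_seconds

    def _apply(self, start, count, total):
        """Add a delta to one window and to every cached range covering it."""
        stats = self.windows[start]
        stats.count += count
        stats.total += total
        for (lo, hi), (c, t) in list(self.cache.items()):
            if lo <= start < hi:
                self.cache[(lo, hi)] = (c + count, t + total)

    def load(self, event_time, value):
        """Account for one record as it is loaded."""
        start = self._window_start(event_time)
        if start < self.newest_window:
            # Late record: buffer it in the abnormal-data window.
            late = self.late_windows[start]
            late.count += 1
            late.total += value
        else:
            self.newest_window = start
            self._apply(start, 1, value)

    def merge_late(self):
        """Fold buffered late statistics into the regular ones and the cache."""
        for start, late in self.late_windows.items():
            self._apply(start, late.count, late.total)
        self.late_windows.clear()

    def query(self, lo, hi):
        """Return (count, sum) for windows starting in [lo, hi), using the cache."""
        key = (lo, hi)
        if key not in self.cache:
            hits = [w for s, w in self.windows.items() if lo <= s < hi]
            self.cache[key] = (sum(w.count for w in hits),
                               sum(w.total for w in hits))
        return self.cache[key]


stats = StreamStatistics(window_seconds=3600)
stats.load(7200, 10.0)
stats.load(7300, 5.0)
before = stats.query(0, 10800)   # computed once, then cached
stats.load(100, 4.0)             # late record, buffered only
stats.merge_late()               # cache entry is updated in place
after = stats.query(0, 10800)
```

In this sketch the cached aggregate is kept current by the loading path rather than recomputed by the query path, which mirrors the idea of moving work from the query server to the loading server. A real system would also need eviction of old windows and a policy for when `merge_late` runs.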
