Implementation of Data Compression, Operation and Query Processing System Based on BAP

Author: JiaJunGang
Tutor: GaoHong
School: Harbin Institute of Technology
Course: Computer Science and Technology
Keywords: Massive Data; High Frequency Data; Compressed Database; Data Operation; Column-Compressed Storage System
CLC: TP311.13
Type: Master's thesis
Year: 2008

Abstract


With the development of information technology and its wide application in finance, transportation, national defense, and environmental and ecosystem monitoring, massive data is flooding the world. This is a great challenge for DBMSs. As the capacity-to-price ratio of disks keeps rising, the real problem is no longer how to store massive data but how to store it and execute queries on it efficiently.

Massive high-frequency data contains a great deal of redundancy: the same values appear repeatedly in different places. Such redundancy not only wastes storage but also degrades query performance. Making full use of compressed database technology can reduce both storage and I/O bandwidth. Research on compressed database technology covers the design of compression algorithms and of query algorithms over compressed data.

There has been renewed interest in column-oriented database architectures in recent years. For read-mostly query workloads such as those found in data warehouse and decision support applications, "column stores" have been shown to perform particularly well relative to "row stores". Storing data in columns presents a number of opportunities for improved performance from compression algorithms when compared to row-oriented architectures.

Building on existing relational database techniques, this thesis studies data compression methods and storage architectures suited to high-frequency data, together with the corresponding query processing technology, including data operations and some query optimizations. The main results are as follows.

It proposes a compression and storage strategy called TIDC. TIDC is a column-oriented compression method based on attribute partitioning. It uses position information (called TupleID in the thesis) to connect all the attributes in the database. By storing only the position and value of the non-constant data within each attribute, TIDC reduces storage while preserving a complete mapping from the original data to the compressed data. Queries can be answered by operating directly on the compressed data, without decompressing it. The thesis presents data operation algorithms for selection, projection and join over TIDC-compressed data, along with some optimization strategies.

It proposes a compression algorithm and data operation algorithms for selection, projection and join for the BAP method, and also gives some query processing optimization strategies for BAP.

A prototype compressed DBMS using the above techniques is implemented. Theoretical analysis and preliminary experimental results show that compressing and storing data with a column-oriented strategy based on attribute partitioning can greatly reduce storage space, lower query I/O cost and improve query efficiency. Moreover, the volume of data has less effect on query efficiency under TIDC than under BAP.
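
The abstract does not give TIDC's actual storage format, so the following Python sketch is only an illustration of the general idea under stated assumptions: each column is stored separately, a (TupleID, value) pair is kept only where the value changes from the previous tuple, and selection returns TupleID ranges without expanding the column. The function names (compress_column, value_at, select_ranges, project) and the half-open range convention are hypothetical and are not taken from the thesis.

```python
from bisect import bisect_right


def compress_column(values):
    """Keep a (tuple_id, value) pair only where the value changes.

    Assumption: "non-constant data" means a value that differs from the
    value of the previous tuple in the same column.
    """
    runs = []
    for tuple_id, value in enumerate(values):
        if not runs or runs[-1][1] != value:
            runs.append((tuple_id, value))
    return runs


def value_at(runs, tuple_id):
    """Return the value of one tuple by binary search over the run starts."""
    starts = [start for start, _ in runs]
    idx = bisect_right(starts, tuple_id) - 1
    return runs[idx][1]


def select_ranges(runs, total, predicate):
    """Selection on the compressed column.

    Returns half-open TupleID ranges [start, end) whose value satisfies
    the predicate; the column is never expanded back to `total` values.
    """
    ranges = []
    for i, (start, value) in enumerate(runs):
        end = runs[i + 1][0] if i + 1 < len(runs) else total
        if predicate(value):
            ranges.append((start, end))
    return ranges


def project(runs, ranges):
    """Projection: fetch another column's values for the selected TupleIDs."""
    return [value_at(runs, tid) for start, end in ranges for tid in range(start, end)]
```

For instance, a price column [10, 10, 10, 11, 11, 10, 10] compresses to three pairs, [(0, 10), (3, 11), (5, 10)]. Selecting price == 10 yields the TupleID ranges [(0, 3), (5, 7)], which can then be used to project any other column stored the same way, since TupleID is the link between columns.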

CLC: Industrial Technology > Automation technology, computer technology > Computing technology, computer technology > Computer software > Program design, software engineering > Programming > Database theory and systems