
Research on Compression, Operation and Query Processing Methods of Massive Datasets

Author: ZhangChunHe
Tutor: LiJianZhong
School: Harbin Institute of Technology
Course: Computer Science and Technology
Keywords: Massive Data; Scientific and Statistic Database; Compression Database; Column-Compressed Storage
CLC: TP311.13
Type: Master's thesis
Year: 2008
Downloads: 356
Quote: 1


Information technology is developing rapidly, and we have entered a new era of massive data. Research on managing massive data is an urgent task for social informatization, and it poses a major challenge to DBMSs: how to store and manage massive data efficiently while supporting SQL queries effectively.

Massive databases, such as scientific and statistical databases, are widely used in earthquake monitoring, weather forecasting, physics and chemistry experiments, and similar fields. Such databases contain a great deal of redundancy: the same data appear repeatedly in different places. Storing the data directly not only wastes storage space but also degrades query performance. In addition, the relation schema is relatively stable and each attribute has only a limited set of candidate values. Newly arriving data are only appended to the end of the current data area, and existing data are never updated. Each query involves only a small subset of the many attributes.

Compressed database technology combines data compression with database technology to handle the storage and querying of massive databases. It comprises data compression methods, data operation algorithms and query processing techniques. In this thesis, we propose a new compression method and storage architecture that suit massive databases and support efficient data operations and query processing. The proposed compression method adopts the idea of column-compressed storage and uses Binary Encoding, Unary Encoding, K-of-N Encoding and Superimposed Encoding to compress the massive data. The encoded data are then stored by encoding bit, using an extended run-length encoding. We also propose algorithms, including selection and projection, that operate on the compressed data without decompressing it.
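The abstract does not spell out the exact code layouts or how the run-length encoding is extended, so the following Python sketch only illustrates the general idea under common textbook definitions, which are assumptions here: Binary Encoding maps the i-th domain value to i in ceil(log2 n) bits; Unary Encoding is taken to be 1-of-N (one bit per distinct value); K-of-N Encoding gives each value a distinct N-bit word with exactly K ones; Superimposed Encoding ORs the codes of a set of values. Plain (bit, run length) pairs stand in for the thesis's extended run-length encoding. All function names and the sample attribute are hypothetical.

```python
from itertools import combinations
from math import ceil, log2


def binary_codes(domain):
    """Binary Encoding: the i-th domain value is stored as i in ceil(log2 n) bits."""
    width = max(1, ceil(log2(len(domain))))
    return {v: tuple((i >> b) & 1 for b in reversed(range(width)))
            for i, v in enumerate(domain)}


def k_of_n_codes(domain, n, k):
    """K-of-N Encoding: each value gets a distinct n-bit word with exactly k ones."""
    words = list(combinations(range(n), k))
    if len(words) < len(domain):
        raise ValueError("C(n, k) must be at least the domain size")
    return {v: tuple(1 if b in ones else 0 for b in range(n))
            for v, ones in zip(domain, words)}


def unary_codes(domain):
    """Unary Encoding, assumed here to mean 1-of-N: one bit per distinct value."""
    return k_of_n_codes(domain, len(domain), 1)


def superimpose(codes, values):
    """Superimposed Encoding: bitwise OR of the codes of a set of values."""
    words = [codes[v] for v in values]
    return tuple(int(any(bits)) for bits in zip(*words))


def bit_slices(column, codes):
    """Store an encoded column by encoding bit: slice j holds bit j of every row."""
    encoded = [codes[v] for v in column]
    return [list(bits) for bits in zip(*encoded)]


def run_length(bits):
    """Plain run-length encoding of one bit slice as (bit, run length) pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return [tuple(r) for r in runs]


if __name__ == "__main__":
    # Hypothetical low-cardinality attribute, typical of statistical data.
    domain = ["calm", "light", "moderate", "strong"]
    column = ["calm", "calm", "calm", "light", "light", "strong", "strong", "strong"]
    slices = bit_slices(column, binary_codes(domain))
    print([run_length(s) for s in slices])  # [[(0, 5), (1, 3)], [(0, 3), (1, 5)]]
    print(superimpose(k_of_n_codes(domain, 4, 2), ["calm", "strong"]))  # (1, 1, 1, 0)
```

Slicing by bit position tends to produce long runs when values repeat, which is why run-length encoding pays off on redundant, append-only data.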
Operations on the original data are converted into operations on the compressed bit files, which are simple to implement. Using these techniques, we designed and implemented a prototype system for compressing and querying data in massive databases. Theoretical analysis and preliminary experimental results show that compression with column-oriented storage can reduce storage space, lower query cost and improve query efficiency.
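To illustrate what operating on compressed bit files without decompressing can look like, here is a minimal Python sketch. It assumes binary-encoded attributes whose bit slices are stored as plain (bit, run length) lists; the thesis's actual file format and algorithms may differ. An equality selection becomes a run-by-run AND of the slices (complementing a slice wherever the value's code bit is 0), and projection reads only the bits of the selected rows by skipping whole runs. The function names, the `wind` attribute and its codes are hypothetical.

```python
def rle_not(runs):
    """Complement a run-length-encoded bit slice without expanding it."""
    return [(1 - bit, length) for bit, length in runs]


def rle_and(a, b):
    """AND two run-length-encoded bit slices of equal total length, run by run."""
    out, i, j = [], 0, 0
    used_a, used_b = 0, 0  # bits already consumed from the current runs a[i], b[j]
    while i < len(a) and j < len(b):
        (bit_a, len_a), (bit_b, len_b) = a[i], b[j]
        step = min(len_a - used_a, len_b - used_b)
        bit = bit_a & bit_b
        if out and out[-1][0] == bit:
            out[-1] = (bit, out[-1][1] + step)  # merge with the previous run
        else:
            out.append((bit, step))
        used_a += step
        used_b += step
        if used_a == len_a:
            i, used_a = i + 1, 0
        if used_b == len_b:
            j, used_b = j + 1, 0
    return out


def select_equal(slices, code):
    """Bitmap of rows whose value has binary `code`: AND every slice,
    complemented where the corresponding code bit is 0."""
    result = None
    for runs, bit in zip(slices, code):
        term = runs if bit else rle_not(runs)
        result = term if result is None else rle_and(result, term)
    return result


def matching_rows(runs):
    """Row ids covered by the 1-runs of a result bitmap."""
    rows, pos = [], 0
    for bit, length in runs:
        if bit:
            rows.extend(range(pos, pos + length))
        pos += length
    return rows


def bit_at(runs, row):
    """Read one bit of a run-length-encoded slice by skipping whole runs."""
    pos = 0
    for bit, length in runs:
        if row < pos + length:
            return bit
        pos += length
    raise IndexError(row)


def project(slices, codes, rows):
    """Project the attribute for the given rows, decoding only those rows."""
    decode = {code: value for value, code in codes.items()}
    return [decode[tuple(bit_at(s, r) for s in slices)] for r in rows]


if __name__ == "__main__":
    # Hypothetical 2-bit binary codes (high bit first) for a wind-level attribute.
    codes = {"calm": (0, 0), "light": (0, 1), "moderate": (1, 0), "strong": (1, 1)}
    # Its two bit slices for 8 rows (calm x3, light x2, strong x3), already RLE-encoded.
    wind = [[(0, 5), (1, 3)], [(0, 3), (1, 5)]]
    hits = select_equal(wind, codes["light"])
    print(hits)                                       # [(0, 3), (1, 2), (0, 3)]
    print(matching_rows(hits))                        # [3, 4]
    print(project(wind, codes, matching_rows(hits)))  # ['light', 'light']
```

The cost of the selection grows with the number of runs rather than the number of rows, so the more redundant the column, the cheaper the query.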

Related Dissertations

  1. Implementation of Data Compression, Operation and Query Processing System Based on BAP, TP311.13
  2. Fault Tolerance for MapReduce in the Cloud Environment, TP302.8
  3. Unbalanced data set classification method and its application in the telecommunications industry, TP311.13
  4. The Processing of Massive Laser Scan Measurement Data, TN249
  5. Research on Compression Database Technology Based on Partition of Properties, TP311.13
  6. Mass data storage Digital Library Research and Implementation Organization, TP333
  7. Urban motor vehicle traffic intelligent monitoring and management of central software system, TP311.52
  8. Research and Implementation of Mass Data Migration and Report Automatic Generation, TP311.52
  9. The Design and Implementation of the Integrated Database System of Severe Weather, TP311.52
  10. Magnetic confinement fusion experiments analysis of massive data retrieval, TP391.3
  11. The Research and Implementation of Massive Short Message Mining Technology, TP311.13
  12. Application of VR-GIS Technology in Geotechnical Engineering, P208
  13. Large Scale Data Management in Video Social Network, TP391.41
  14. Research on Effective Management and New Service Model for Chinese Calligraphy, TP391.41
  15. Remanufacturing Oriented Research on the Robotic 3D Surface Inspecting System with Vision Measurement, TP242
  16. Acquisition and Analysis of Network-link Traffic Based on sFlow, TN915.02
  17. Research of Replica Location and Replica Placement for Massive Data, TP393.02
  18. For the public security area mass data storage retrieval system development, TP391.3
  19. Research and Implementation of Distributed Massive Text Data Index and Retrieval System, TP391.3
  20. Research on On-line Defect Detection Technology for Float Glass, TP274.4

CLC: > Industrial Technology > Automation technology, computer technology > Computing technology, computer technology > Computer software > Program design, software engineering > Programming > Database theory and systems
© 2012 www.DissertationTopic.Net