Dissertation > Excellent graduate degree dissertation topics show

Research of Preference Queries on Probabilistic Database

Author: WangXiaoWei
Tutor: JiaYan
School: National University of Defense Science and Technology
Course: Computer Science and Technology
Keywords: Uncertain data probabilistic database preference query top-k skyline
CLC: TP311.13
Type: PhD thesis
Year: 2011
Downloads: 34
Quote: 0
Read: Download Dissertation

Abstract


Uncertainty is inherent in the real world and the precision of the query and analysisis the major concern in many modern applications like business data management,finance analysis, sensor, RFID, and GIS, etc. However, Traditional databases techniqueswhich are based on deterministic theory are not prepared to handle imprecise data. Theprobabilistic database techniques which are based on probability theory can manageuncertain data directly and emerges as a hot field in the database community. In theapplication layer of databases preference queries are used to fulfill the diverserequerment of individual users, among which skyline and top-k query are the mostimportant ones.This dissertation focuses on efficient and general solutions for preference querieson probabilistic database, and addresses several challenging problems of probabilisticskyline and probabilstic top-k queries. The main contributions can be concluded asfollows:(1) Non-indexing probabilistic skyline algorithm. Existing probabilistic skylinealgorithms are generally based on R-tree indices, which are not general enough.Non-indexing algorithms, which require no preprocessing or index structures, have thewidest applicability and are necessary complements to the indexing algorithms. Twonon-indexing algorithms that based on in memory R-trees and ordered list respectivelyare proposed. The basic idea is to estimate the skyline probability of future objects fromthe objects accessed before. Experimental results show that both algorithms outperformthe naive algorithm, and are complementary to each other.(2) Distributed probabilistic skyline algorithm. In practical applications, data isoften distributed across multiple nodes, and applications use distributed queries toobtain the holistic results. The query delay mainly depends on the communicationoverhead, and it is improtant to design communication efficient distributed algorithms.A framework based on summary sharing is first proposed, and then two specificalgorithms namely VOS and GRID are proposed respectively. The VOS algorithm usesvirtual objects as the summary of data distribution. In order to reduce additionalcommunication cost caused by summary sharing, a distance based compression schemeis further proposed. GRID algorithm is proposed to overcome the shortage of datadependency on data distribution of the VOS algorithm. In the GRID algorithm, eachobject is mapped to a cell of a regular grid, and a compression scheme of grid isproposed to reduce the additional communication cost of grid sharing. Experimentalresults show that, VOS and GRID outperform the existing algorithm in communicationcost, and the former is more suitable for independent like or high dimentional datasets,while the latter is suitable for anti-correlated datasets. (3) The probabilistic top-k indexing scheme. There are a collect of semantics of theprobabilistic top-k query because it needs to rank tuples taking into account both theprobability and scoring function. Different query algorithms are designed for each querysemantic, and there is lack of indexing schemes applicable to a class of important querysemantics. Two layer indexes that can be used by a collect of important probabilistictop-k queries are proposed. The first one is based on the concept of skyline, the secondone is based on dominate frequency, and is proposed to achieve better robustness.Morever, an efficient indexing building algorithm is proposed. Experimental resultsshow that both indexing schemes reduce the objects accessed by probabilistic top-kqueries, and the latter is more robust.(4) Probabilistic reverse top-k query. Above research topics all consider the queryevaluation problem from the perspective of users. The reverse top-k query is importantin business analysis because it helps manufactures learn preferences from the data oftop-k preference. Existing reverse top-k queries assume the underlying data is certain,however, in the real-world applications data is often imprecise because of dataintegration. A reasonable definition of probabilistic reverse top-k query is first proposed,and then an efficient query algorithm based on materialized views is presented. Thealgorithm partitions the data space to a grid, and stores the pre-computing query resultsof grid vertices. Subsequent queries use the materialized set preferences to avoid fullscan. Experimental results show that the query algorithm reduces the number ofpreferences need to calculated and scales well in cardinality of preference set.In summary, this dissertation focuses on the most important preference queries ofprobabilistic database, and provides efficient and general solutions. These works haveacademic and practical value for advancing the theory and practicability of probabilisticdatabase.

Related Dissertations

  1. Empirical Study on the Top Management Turnover and Performance of Listed Companies in China,F224
  2. Deep Continental Drilling Rig Top Drive Hydraulic System Design and Research Summary,TE922
  3. Studies on Shoot Treating and Top Grafting Technique for Replacing Cultivar of Pear Tree,S661.2
  4. Study on the Chice of Wing Spiker in Chinese Women’s Volleyball Team,G842
  5. Research and Implementation on HD and Interactive Display System Based on Digital TV,TN949.197
  6. A Study on the Effect of Company Performance and Equity Properties on Top Manager Turnover,F272.92;F224
  7. Research and Application of High Efficiency Mining Technique of Fully Mechanized Top-Coal Caving with Large Mining Height in Deep Mine,TD823
  8. The Research on Top Coal Caving for Sub-level Caving Mining in Chao Ke Wu La Coal Mine,TD823.49
  9. Study on Skyline Query Processing Techniques on Uncertain Data,TP311.13
  10. Racing Top Analysis and Countermeasure of Shenyang Science and Technology Innovation Ability,G322.7
  11. Comparative Study of Normal Strategy and Top Strategy in Full-automatic Visual Field Examination,R775
  12. The Study of Pulse Generator Power Supply System and Its Output Current Ripple Suppression,TM46
  13. History Will Not Exile,K092
  14. Research on the Influence of TMT Characteristics to Firm Performance Based on Diversification Strategy,F272
  15. The Infection Research of Transactive Memory Systeams to Team Performance in Top Management Team,F272
  16. The Core Values of Top-notch Talents of Scientific and Technological Innovation,G316
  17. Skyline query uncertain data probability algorithm,TP311.13
  18. K-anonymity data query method,TP309
  19. Research on Fault-Tolerant Parallel Skyline Query Technology in Cloud Computing Environment,TP311.13
  20. Design and Researh on TRT Control System of 1350m~3 Balst Furnace,TF325.65
  21. The Empirical Research about Pay Gap of Small and Medium-sized Enterprise and Corporate Performance,F276.3;F224

CLC: > Industrial Technology > Automation technology,computer technology > Computing technology,computer technology > Computer software > Program design,software engineering > Programming > Database theory and systems
© 2012 www.DissertationTopic.Net  Mobile