Dissertation > Excellent graduate degree dissertation topics show
Research of High-Dimensional Space Join and Query Algorithms Based on Main-Memory
Author: LiuYan
Tutor: HaoZhongXiao
School: Harbin University of Science and Technology
Course: Applied Computer Technology
Keywords: main-memory high-dimensional join similarity join k nearestneighbor query k nearest neighbor join reverse k nearest neighbor join
CLC: TP311.13
Type: PhD thesis
Year: 2011
Downloads: 3
Quote: 0
Read: Download Dissertation
Abstract
Many emerging database applications such as CAD, multimedia, medical image, time series, molecular biology and scientific databases represent their data as multi/high-dimensional feature vectors. Each feature vector consists of d values which can be interpreted as coordinates in a d-dimensional space plus some associated content data. The distance between two feature vectors is commonly used to measure the degree of (dis-)similarity between the original high-dimensional objects (with regard to the feature represented). There is increasing need to combine similar tuples (points) of two datasets for many applications.The operation of generating all pairs is in essence high-dimensional join (high-dimensional join is a general designation of all kinds of similarity join).So far most of researches focus on the execution of high-dimensional joins over large amounts of disk based data. The increasing sizes and lower price of main memory available on current computers, and the need for efficient processing of high-dimensional joins suggest that high-dimensional joins for a large class of problems can be processed in main memory. Δ-tree is a novel multi-level index structure, it can speed up the high-dimensional query in main memory environment and has been proven to be an efficient index method. Therefore, in order to meet different application needs, several kinds of high-dimensional joins and queries were designed based on index structure Δ-tree. The main work of this topic can be summarized as follows:Firstly, the efficient main-memory index Δ-tree and the characteristics of similarity join were studied, the pruning strategies of the nodes in Δ-tree were analyzed. Made full use of the properties of Δ-tree, A novel main-memory similarity self-join algorithm Δ-tree-Join was presented for a single dataset. Used the properties of cluster overlap and taked the center of nodes in Δ-tree as the center of clusters, main-memory index structure Δ-tree-S was designed.A main- memory similarity join algorithm called Δ-tree-Join*adopted the top-down join scheme for combining two different database sets was presented based on A-tree-S. The two algorithms computed the distances between clusters and between point and cluster with fewer number of dimensions, so as to filter unnecessary nodes or points, reduce computations and improve main-memory join efficiency fundamentally. Experimental results indicate that these two join algorithms have high join efficiency.Secondly, the feature that the searching radius of kNN query is unknown apriori was analyzed, the algorithm of high-dimensional kNN(k Nearest Neighbor) query in main-memory was studied. Three kinds of main-memory kNN query algorithms based on A-tree for high dimensional data were developed. Preformace analysis and experiments prove that these three algorithms improve query efficiency compare to existing algorithm, among them the third query algorithm named BU_DF_knn_Search has the best efficiency. This algorithm determines the membership leaf node of query point in A-tree through the definition of membership node firstly, and searches kNN in this node, narrows the pruning distance quickly using near optimal kNN candidates, then traverses the subtrees used the ancestors of membership leaf nodes as root nodes by depth first manner from bottom to top.Thirdly, the properties of kNN join was analyzed, the algorithm of kNN join in main-memory was studied. According to requirements index structure A-tree-R for main-memory kNN join was proposed, every generated node in this index was coded. A main-memory kNN join algorithm was presented based on Δ-tree-R and Δ-tree-S, this algorithm decodes in Δ-tree-S by using the code of A-tree-R, fast locates positional correspondence nodes, uses the near optimal kNN candidates in them and adopts the traversal order of combining bottom up and depth first search, narrows kNN pruning distances rapidly. Experiments results illustrate that Δ-tree-KNN-Join is an efficient kNN-join method in main memory.Fourthly, the solutions of RkNN (Reverse k Nearest Neighbor) query and the characteristics of high-dimensional RkNN query were analyzed, the conclusion that choose self-pruning approaches to process high-dimensional RkNN query was drawed. An index structure called Δ-Rdknn-tree was proposed based on Δ-tree-R, precomputed kNN distances of points in the dataset by main-memory kNN self-join algorithm Δ-tree-KNN-Join based on this index and propagated these distances to higher level index nodes. Main-memory RkNN query algorithm based on this index was proposed.Fifthly, the problem of set-oriented RkNN queries in main-memory enviroment namely main-memory RkNN join was studied. Constructed indexes Δ-Rdknn-tree and Δ-tree for the two datasets that were going to join, used the RkNN searching distance information of nodes in Δ-Rdknn-tree,main-memory RkNN join algorithm was designed based on the similarity join algorithm Δ-tree-Join*, performance analysis was conducted for the effectiveness of this algorithm.All the main-memory query and join algorithms in this paper were certificated by theories and experiments, have practical value and lay the theoretical and practical foundation of the research for all kinds of high-dimensional queries and joins algorithms in main memory.
|
Related Dissertations
- Research on Similarity Query over Sequence Data,TP311.13
- Data Provenance Management and Similarity Query Over Uncertain Data,TP311.13
- LSH-based Similarity Search of Web Data,TP311.13
- Research on String Similarity Join Algorithm,TP311.13
- Efficient Methods for Reverse K-Nearest Neighbor Query on the Multidimensional Objects,TP311.13
- The Research and Implementation of the Anomaly Detection in Cloud,TP393.08
- Research on Three Dimensional Reconstruction Algorithm of Denture Model,TP391.72
- Research on String Edit Similarity Join,TP391.1
- Research on Similarity Join Processing Based on Entity,TP311.13
- Spatio-temporal Database Based on the Grid Index Reverse Nearest Neighbor Query Technology,TP311.13
- Research on F&B Index Structure Supporting XML Query,TP311.13
- Query Processing and Optimization in Massive Multi-Database Integration,TP311.13
- Research on Compression, Operation and Query Processing Methods of Massive Datasets,TP311.13
- Research on Parallel Frequent Graph Pattern Mining,TP311.13
- The Design and Implement of Mediator and Wrapper Mechanism in Massive Multi-Database Intergration,TP311.13
- Research and Implementation of Mining Implicit User Interest,TP311.13
- Implementation of Data Compression, Operation and Query Processing System Based on BAP,TP311.13
- The Design and Implementation of DICOM Middle Software and Access Control Model in Formation Integration Platform,TP311.13
- Research and Improvement on K-Means Clustering Algorithm,TP311.13
- Study of Data Reduction Technique Based on Manifold Learning,TP311.13
- Research on K-means Optimization Clustering Algorithm,TP311.13
CLC: > Industrial Technology > Automation technology,computer technology > Computing technology,computer technology > Computer software > Program design,software engineering > Programming > Database theory and systems
© 2012 www.DissertationTopic.Net Mobile
|