Algorithms for Data Streams Based on Shielding/Summarizing
Author: ChongZhiHong
Tutor: ZhouAoYing
School: Fudan University
Course: Computer Software and Theory
Keywords: data streams, data streams mining, querying data streams
CLC: TP301.6
Type: PhD thesis
Year: 2006
Abstract
Emerging data-intensive applications generate so-called data streams whose properties, such as arrival order and arrival interval, cannot be controlled, and processing them naively can require almost unbounded memory while the space actually available is limited. An algorithm for data streams must therefore satisfy two requirements: 1) it must run in sublinear space, although its output may be approximate; 2) it must process its input online. Shielding parts of a data stream and summarizing the stream are two alternative strategies for processing it in no more than sublinear space. In this thesis, we study several problems on data streams, including mining frequent item(set)s, estimating aggregate functions in distributed environments, and searching for the k-median, with the following contributions:
1. Based on shielding parts of data streams online, we propose false-negative algorithms for mining frequent item(set)s and maximal frequent itemsets. Using O(s^{-1} ln(2δ^{-1})) memory, our algorithm outputs the frequent items with probability at least 1 - δ, and it captures the maximal frequent itemsets at a cost of O((K/s) ln(s^{-1}δ^{-1})).
2. Based on sampling data streams, we propose algorithms for filtering out redundant and inconsistent data hidden in distributed data streams. Our algorithms yield uniform samples and approximate solutions for estimating aggregate functions such as the average and the k-median.
3. Based on summarizing data streams, we propose a time-efficient computation of the k-median under memory constraints.
Beyond data-intensive applications, this research can be applied to computational geometry, massive graphs, machine learning, and pattern recognition.
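To make the second contribution more concrete, the sketch below shows one well-known way to draw a uniform sample over the distinct items of several distributed streams while ignoring duplicates: bottom-k hash sampling. This is not the thesis's algorithm, which the abstract does not spell out; the class name `BottomKSample`, the helper `item_hash`, and the sample readings are all hypothetical and chosen only for illustration.

```python
import hashlib
import heapq


def item_hash(item: str) -> int:
    """Deterministic 64-bit hash, so every site ranks items identically."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class BottomKSample:
    """Keeps the k distinct items with the smallest hash values.

    Illustrative sketch only (an assumption, not the thesis's method).
    Memory use is O(k), independent of the stream length.
    """

    def __init__(self, k: int) -> None:
        self.k = k
        self._heap = []        # max-heap on hash, stored as (-hash, item)
        self._members = set()  # items currently in the sample

    def add(self, item: str) -> None:
        if item in self._members:
            return  # duplicate of an item already sampled
        h = item_hash(item)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (-h, item))
            self._members.add(item)
        elif h < -self._heap[0][0]:
            # New item ranks below the current largest hash: swap it in.
            _, evicted = heapq.heapreplace(self._heap, (-h, item))
            self._members.discard(evicted)
            self._members.add(item)

    def merge(self, other: "BottomKSample") -> "BottomKSample":
        """Combine two site-local samples into a sample of the union."""
        merged = BottomKSample(self.k)
        for _, item in self._heap + other._heap:
            merged.add(item)
        return merged

    def items(self) -> set:
        return set(self._members)


# Two sites observe overlapping streams that contain repeated readings.
site_a = BottomKSample(k=4)
site_b = BottomKSample(k=4)
for reading in ["r1", "r2", "r3", "r2", "r5", "r1"]:
    site_a.add(reading)
for reading in ["r3", "r4", "r5", "r6", "r6"]:
    site_b.add(reading)
merged = site_a.merge(site_b)
```

Because an item's rank depends only on its hash, repeated or replicated readings cannot change the result, and merging the site-local samples gives the same sample as one pass over the combined stream. If the hash behaves like a random function, the retained items form a uniform sample of the distinct items, so averaging the values attached to them is one way to approximate an average aggregate.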
Related Dissertations
- An Algorithm on Clustering and Anomaly Detection for Multiple Data Streams, TP311.13
- The Research on the Related Problems of Association Rule Mining Over Data Streams, TP311.13
- Mining Probability Frequent Patterns to Recover Uncertain RFID Data Stream, TP391.44
- Research on Clustering Algorithm Based on Subspace in High-Dimensional Data Streams, TP311.13
- Research of Optimized Clustering Algorithms over Data Streams, TP311.13
- Research on Density-Based Clustering Algorithm of Data Streams in Intrusion Detection, TP393.08
- Online network intrusion detection research based on the data stream classification, TP393.08
- Research on Ensemble Classifier of Datastream Based on UFFT, TP311.13
- Data Preprocessing and Feature Analysis for Real-time Data Stream, TP311.13
- Research on top k Frequent Closed Pattern Mining in Data Streams, TP311.13
- Generation and Formal Verification of Radio Block Center Data Streams Based on STATEMATE, U284.48
- The Research on Clustering Algorithm of Data Stream, TP311.13
- Based on PCA / ICA associate multiple data streams and pattern discovery, TP311.13
- The Research of Aggregate Algorithm over Data Streams, TP311.13
- Mining Clusters in Data Streams, TP311.13
- The Application and Research of Incremental Clustering on Temporal Data Streams, TP311.13
- Network Traffic Burst Detection Based on Data Streams, TP393.06
- Research on Uncertain Data Stream Database System, TP311.13
- Application of Image Enhancement and Image Restoration Technique in X Radiographic Image Processing, R318
- Beijing Netcom trouble ticket management system software design, TP311.52
- Research on Classification Technologies in Mining Unsteady Data Streams, TP311.13
CLC: Industrial Technology > Automation technology, computer technology > Computing technology, computer technology > General issues > Theories, methods > Algorithm Theory