Research and Implementation of Frequent Pattern Mining Algorithms over Data Streams

Author: ChangLong
Tutor: WeiDa
School: Jilin University
Course: Software Engineering
Keywords: data streams mining; sliding window; synopsis data structure; bitwise operation
CLC: TP311.13
Type: Master's thesis
Year: 2013


With the large-scale application of database management systems, many fields have accumulated massive amounts of data. The need to use these data effectively has driven the emergence and rapid development of data mining technology. However, with the rise of fields such as computer network monitoring, weather monitoring, financial quotations, and sensor networks, a new data processing model, the data stream, has been proposed. Data in this model arrive continuously at high speed, and a mining algorithm can process each item only once, so designing a frequent itemset mining algorithm for data streams is a challenging task.

Because data streams are unbounded and arrive at high speed, a mining algorithm over them must be approximate. To address the problem of designing an effective synopsis data structure, this thesis analyzes the classical frequent pattern tree (FP-tree) structure, adopts an improved sliding window model that is updated one basic window at a time, designs a two-dimensional array structure to store the frequent itemsets, and proposes a special assignment method for this array.

Based on the proposed two-dimensional array structure, this thesis designs a frequent itemset mining algorithm for data streams, MFIBA (Mining Frequent Itemsets based on Bitwise AND). The algorithm works in batches. First, it sets up a fixed-size sliding window over the arriving data stream; then it divides the sliding window into basic windows of equal width; finally, it uses the two-dimensional array structure to store the frequent itemset information of every basic window. When a user issues a request, the algorithm generates all the frequent itemsets by performing a bitwise AND between every two rows of the array, and calculates the support count of each frequent itemset from the values stored in the array.
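
The abstract does not spell out the array layout or its assignment method, so the following is only a minimal Python sketch of the general idea, under stated assumptions: each row is a bitmask for one item (bit j is set when transaction j in the window contains that item), the sliding window holds a fixed number of basic windows, and the support count of a pair is the number of set bits in the AND of its two rows. The names `SlidingWindow`, `build_rows`, and `frequent_pairs` are hypothetical and are not taken from the thesis.

```python
from collections import deque
from itertools import combinations


class SlidingWindow:
    """Fixed number of basic windows; the oldest is evicted when a new one arrives."""

    def __init__(self, num_basic_windows):
        self.windows = deque(maxlen=num_basic_windows)

    def add_basic_window(self, transactions):
        # Each basic window is one batch of transactions (sets of items).
        self.windows.append([set(t) for t in transactions])

    def transactions(self):
        # All transactions currently covered by the sliding window, oldest first.
        return [t for window in self.windows for t in window]


def popcount(mask):
    """Number of set bits in a non-negative integer."""
    return bin(mask).count("1")


def build_rows(transactions):
    """Assumed layout: one bitmask row per item, bit j set if transaction j has it."""
    rows = {}
    for j, transaction in enumerate(transactions):
        for item in transaction:
            rows[item] = rows.get(item, 0) | (1 << j)
    return rows


def frequent_pairs(rows, min_count):
    """AND every two rows of frequent items; the popcount is the pair's support."""
    frequent = {i: r for i, r in rows.items() if popcount(r) >= min_count}
    result = {}
    for a, b in combinations(sorted(frequent), 2):
        count = popcount(frequent[a] & frequent[b])
        if count >= min_count:
            result[(a, b)] = count
    return result


if __name__ == "__main__":
    window = SlidingWindow(num_basic_windows=2)
    window.add_basic_window([{"a", "b"}, {"a", "c"}])
    window.add_basic_window([{"a", "b", "c"}, {"b"}])
    print(frequent_pairs(build_rows(window.transactions()), min_count=2))
    # prints {('a', 'b'): 2, ('a', 'c'): 2}
```

Rebuilding the rows on every request keeps the sketch short. A design closer to the abstract would presumably keep one array per basic window and discard the oldest array when the window slides.
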
As an approximate mining algorithm, MFIBA introduces a permitted error parameter, so it can effectively delete infrequent itemsets and save memory.

A performance comparison with Apriori, a classical data mining algorithm, shows that MFIBA outperforms Apriori in both running time and memory consumption, so it is applicable to data stream mining.
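
The abstract does not say how the permitted error parameter is applied. As an illustration only, the sketch below borrows the thresholding rule commonly used in lossy-counting-style stream algorithms, which is an assumption and not the thesis's stated method: itemsets whose count falls below `epsilon * n` are dropped from memory, and itemsets whose count reaches `(min_support - epsilon) * n` are reported as frequent. The function name `prune_and_report` and its parameters are hypothetical.

```python
def prune_and_report(counts, n_transactions, min_support, epsilon):
    """Split stored itemset counts into (kept in memory, reported as frequent).

    counts: dict mapping an itemset (tuple of items) to its support count.
    min_support and epsilon are fractions of n_transactions, epsilon < min_support.
    """
    if not 0 <= epsilon < min_support <= 1:
        raise ValueError("expected 0 <= epsilon < min_support <= 1")

    keep_threshold = epsilon * n_transactions
    report_threshold = (min_support - epsilon) * n_transactions

    # Assumed pruning rule: itemsets below the error bound are deleted to save memory.
    kept = {s: c for s, c in counts.items() if c >= keep_threshold}
    # Assumed reporting rule: relax the support threshold by the permitted error.
    reported = {s: c for s, c in kept.items() if c >= report_threshold}
    return kept, reported


if __name__ == "__main__":
    stored = {("a",): 50, ("b",): 9, ("c",): 1, ("d",): 5}
    kept, reported = prune_and_report(stored, 100, min_support=0.1, epsilon=0.02)
    print(sorted(kept))
    print(sorted(reported))
```

A larger `epsilon` frees more memory but lets more borderline itemsets into the reported set, which is the usual trade-off in approximate stream mining.
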

Related Dissertations

  1. Research on Data Stream Clustering Algorithm Based on Density Grid over Sliding Window,TP311.13
  2. Research on Web Clickstreams Data Clustering Technology,TP311.13
  3. Outlier Detection Technic on Probilistic Stream,TP311.13
  4. Research and Design on the Quality Control Device for the Distribution Systems,TM76
  5. Design and Implementation on Signal Processing Algorithm for Power Quality Monitoring System,TN911.7
  6. P4P-based System for Streaming Media On-demand Research and Implementation,TN948.64
  7. Research and Implementation of the DMA Controller and the Memory System 2D Extension in YHFT-Matrix DSP,TP368.1
  8. Research on Fast Implementation of RSA Algorithm and Its Improvement,TN918.1
  9. Research on RSA and Elliptic Curve Cryptographic Algorithms,TN918.1
  10. Research for Predictive Aggregate Queries Processing Over Data Streams Based on the Sliding Window,TP311.13
  11. Symbolic Dynamics Analysis and Its Application in EEG Signals Processing,R318.0
  12. Estimating Sliding Window-Based Aggregation Queries over Probabilistic Data Streams,TP311.13
  13. Secure messaging technology within a network monitoring system,TP311.52
  14. The abnormality detection of the high latitude data stream,TP311.13
  15. Blurred Image Restoration Algorithms Based on Neural Network,TP183
  16. Research on Algorithms for Mining Frequent Patterns in Sliding Window Over Data Streams,TP311.13
  17. Learning-Based Processing of Top-N Queries on Data Streams,TP311.13
  18. Research on Succinct Frequent Pattern Mining Algorithm Based on Positional Information,TP311.13
  19. Research on Algorithm for Mining Frequent Closed Itemsets Over Data Streams,TP311.13
  20. Research of Behavior Trust Model in Cloud Computing Environment,TP309
  21. Implementation of Cleaning Techniques for RFID Data Streams,TP391.44

CLC: Industrial Technology > Automation technology, computer technology > Computing technology, computer technology > Computer software > Program design, software engineering > Programming > Database theory and systems