Research of Frequent Itemsets Mining Algorithms Based on Vertical Data Format

Author: ChenShuai
Tutor: HuangGuoYan
School: Yanshan University
Course: Applied Computer Technology
Keywords: frequent itemsets frequent closed itemsets vertical data format RI-DAG rowset weight constraint
CLC: TP311.13
Type: Master's thesis
Year: 2012

Abstract


Frequent pattern mining is a key technology in the field of data mining. Existing frequent itemset mining algorithms mine frequent patterns efficiently. However, as application fields become increasingly refined, datasets with different characteristics have appeared, and these datasets render traditional static-database mining techniques ineffective. How to achieve highly compressed storage together with efficient mining is therefore a research hotspot. How to adopt a reasonable constraint method and efficiently mine interesting patterns according to different needs is another. To address these problems, this thesis focuses on new algorithms for mining frequent patterns. These algorithms can be used in sequence analysis, forecasting of customers' purchasing behavior patterns, software security analysis, and similar applications.

Firstly, an algorithm based on the RIHS-Tree for mining frequent closed itemsets in high-dimensional datasets is proposed. It stores data in vertical data format and constructs the RIHS-Tree from the transposed dataset. The tree stores rowsets that share a common rowid prefix, according to a predefined order of rowids. The algorithm traverses the RIHS-Tree with a bottom-up search strategy. During mining, a rowset-inclusion strategy is used to implement pattern growth and thereby obtain large rowsets together with the corresponding frequent closed itemsets.

Secondly, an algorithm based on a directed acyclic graph for mining Top-k frequent closed itemsets is proposed. It converts the whole dataset into a transposed table and constructs a directed acyclic graph from the rowsets in the transposed table. It adopts a closure-checking method to generate all frequent closed itemsets.

Finally, an algorithm based on vertical data format for mining weighted frequent itemsets is presented, which can efficiently mine the interesting patterns that meet users' requirements. Vertical data format is adopted to calculate the support of itemsets, and the transposed data is classified according to minimum items. The classes of minimum items are obtained by combining minimum items with other items or with combinations of these items. The notion of the weighted valid extension (wv) property is proposed. Based on wv and a hash table that stores weighted infrequent 2-itemsets, pruning is applied to reduce the number of candidate itemsets. In this way, all frequent itemsets that satisfy the constraint can be mined.

The implementation in this thesis is written in Java. Synthetic datasets and real datasets are used for mining frequent itemsets in the experiments.
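
The abstract does not give code, so the following is only a minimal sketch of the general idea behind vertical data format, not the thesis's RIHS-Tree, RI-DAG, or weighted algorithms. It assumes a tiny in-memory dataset, and every name in it (`VerticalFormatSketch`, `transpose`, `rowset`, `support`, `closure`, `items`) is hypothetical. It shows how a dataset can be transposed into an item-to-rowset table, how the support of an itemset follows from intersecting rowsets, and how a closure check can decide whether an itemset is closed.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration class; none of these names come from the thesis.
final class VerticalFormatSketch {

    // Convenience helper to build a sorted itemset.
    static Set<String> items(String... xs) {
        return new TreeSet<>(Arrays.asList(xs));
    }

    // Transpose a horizontal dataset (row id -> items) into vertical format
    // (item -> rowset). Row ids are the list indices 0..n-1.
    static Map<String, BitSet> transpose(List<Set<String>> rows) {
        Map<String, BitSet> vertical = new TreeMap<>();
        for (int rid = 0; rid < rows.size(); rid++) {
            for (String item : rows.get(rid)) {
                vertical.computeIfAbsent(item, k -> new BitSet()).set(rid);
            }
        }
        return vertical;
    }

    // Rowset of an itemset: the intersection of the rowsets of its items.
    static BitSet rowset(Map<String, BitSet> vertical, Set<String> itemset, int rowCount) {
        BitSet result = new BitSet();
        result.set(0, rowCount); // start from all rows
        for (String item : itemset) {
            BitSet rows = vertical.get(item);
            if (rows == null) { // unknown item: no row contains it
                result.clear();
                return result;
            }
            result.and(rows);
        }
        return result;
    }

    // Support of an itemset: the number of rows in its rowset.
    static int support(Map<String, BitSet> vertical, Set<String> itemset, int rowCount) {
        return rowset(vertical, itemset, rowCount).cardinality();
    }

    // Closure of an itemset: every item whose rowset contains the itemset's rowset.
    // An itemset is closed exactly when it equals its own closure.
    static Set<String> closure(Map<String, BitSet> vertical, Set<String> itemset, int rowCount) {
        BitSet rows = rowset(vertical, itemset, rowCount);
        Set<String> closed = new TreeSet<>();
        for (Map.Entry<String, BitSet> e : vertical.entrySet()) {
            BitSet missing = (BitSet) rows.clone();
            missing.andNot(e.getValue()); // rows of the itemset that lack this item
            if (missing.isEmpty()) {
                closed.add(e.getKey());
            }
        }
        return closed;
    }

    public static void main(String[] args) {
        // Assumed toy dataset, for illustration only.
        List<Set<String>> rows = Arrays.asList(
                items("a", "b", "c"),
                items("a", "b", "c"),
                items("a", "d"),
                items("b", "c", "d"));
        Map<String, BitSet> vertical = transpose(rows);
        Set<String> query = items("b");
        System.out.println("support = " + support(vertical, query, rows.size()));
        System.out.println("closure = " + closure(vertical, query, rows.size()));
    }
}
```

On this toy dataset the itemset {b} has support 3 and closure {b, c}, so {b} is not closed while {b, c} is. Using `BitSet` for rowsets is a design choice of this sketch: intersection and inclusion tests reduce to bitwise operations, which is one common reason vertical layouts are attractive when rows are few and items are many.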
