
Study on Frequent Pattern Mining Algorithms and Pruning Strategies

Author: XuYuSheng
Tutor: LiLian
School: Lanzhou University
Major: Basic Mathematics
Keywords: data mining; association rule analysis; frequent pattern mining; frequent sequence mining; frequent itemset mining; enumeration space; pruning strategy; maximal frequent pattern mining; frequent closed pattern mining
CLC: TP311.13
Type: PhD thesis
Year: 2008

Abstract


Frequent pattern mining is a fundamental problem in data mining and underlies many data mining tasks, such as association rule analysis, correlation analysis, outlier analysis, classification, and clustering. It is an important research topic in both theory and applications, and it is both I/O-intensive and computation-intensive. This thesis studies the frequent pattern mining problem in depth. The main research contents and contributions are as follows.

First, the frequent sequence mining problem is studied. Existing mining algorithms, including GSP, SPADE, SPAM, and PrefixSpan, are analyzed. Based on this analysis, a novel frequent sequence mining algorithm, FINDER (Frequent Itemset-based exteNsion canDidate genERation), is proposed. FINDER searches the enumeration space depth-first. Instead of complex hashing and multiple database scans, it uses bitmap representations of itemsets, the database, and sequences. FINDER generates each candidate by extending the current frequent sequence with a frequent itemset, so most infrequent extensions are never attempted. Experiments on synthetic datasets show that FINDER performs almost as well as SPAM, which to the best of our knowledge is the fastest existing algorithm, and that it outperforms the other algorithms by a factor of 3 to 5.

Next, FINDER is extended with lattice theory to obtain a parallel algorithm, pFINDER. Using lattice theory, pFINDER divides the search space into several disjoint parts, each of which can be enumerated independently. As a result, no remote communication is needed, and pFINDER is scalable.

In addition, FINDER is adapted to mine weighted frequent patterns, yielding iFINDER. By renaming items, iFINDER transforms the weighted frequent pattern mining problem into an ordinary frequent pattern mining problem. It can be used in interactive mining applications.
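The candidate-generation idea described for FINDER can be illustrated with a small sketch. It assumes a SPAM-style vertical bitmap (one Python integer per customer sequence, bit j standing for transaction j) and implements only what the abstract states: a depth-first search that extends each frequent sequence by a frequent itemset, so infrequent itemsets are never tried as extensions. The names `mine`, `s_step`, `frequent_itemsets`, and `itemset_bitmaps` are hypothetical; this is not the thesis's FINDER code, and the brute-force itemset step suits toy data only.

```python
from itertools import combinations

# Hypothetical layout: one customer sequence per entry, each a time-ordered
# list of transactions (itemsets).
Database = list[list[frozenset[str]]]


def frequent_itemsets(db: Database, minsup: int) -> list[frozenset[str]]:
    """Itemsets contained in some transaction of at least `minsup` customers.

    Brute-force subset enumeration: fine for a toy example, not for real data.
    """
    supporters: dict[frozenset[str], set[int]] = {}
    for cid, seq in enumerate(db):
        for txn in seq:
            items = sorted(txn)
            for k in range(1, len(items) + 1):
                for combo in combinations(items, k):
                    supporters.setdefault(frozenset(combo), set()).add(cid)
    frequent = [x for x, cids in supporters.items() if len(cids) >= minsup]
    return sorted(frequent, key=lambda x: (len(x), sorted(x)))


def itemset_bitmaps(db: Database, itemsets: list[frozenset[str]]) -> dict[frozenset[str], list[int]]:
    """Per customer, bit j is set when transaction j contains the itemset."""
    return {
        x: [sum(1 << j for j, txn in enumerate(seq) if x <= txn) for seq in db]
        for x in itemsets
    }


def s_step(bits: int) -> int:
    """Set every bit strictly after the first set bit (SPAM-style S-step)."""
    if bits == 0:
        return 0
    lowest = bits & -bits
    # Result has infinitely many high bits; the caller's AND bounds it.
    return ~((lowest << 1) - 1)


def mine(db: Database, minsup: int) -> dict[tuple[frozenset[str], ...], int]:
    """DFS over sequences, extending each frequent sequence by frequent itemsets."""
    elems = frequent_itemsets(db, minsup)
    ibm = itemset_bitmaps(db, elems)
    results: dict[tuple[frozenset[str], ...], int] = {}

    def dfs(prefix: tuple[frozenset[str], ...], bitmap: list[int]) -> None:
        transformed = [s_step(b) for b in bitmap]
        for x in elems:  # only frequent itemsets are ever tried as extensions
            cand = [t & b for t, b in zip(transformed, ibm[x])]
            support = sum(1 for b in cand if b)
            if support >= minsup:
                seq = prefix + (x,)
                results[seq] = support
                dfs(seq, cand)

    for x in elems:  # each first element roots an independent subtree
        bitmap = ibm[x]
        results[(x,)] = sum(1 for b in bitmap if b)
        dfs((x,), bitmap)
    return results


if __name__ == "__main__":
    toy_db = [
        [frozenset("ab"), frozenset("c"), frozenset("ab")],
        [frozenset("a"), frozenset("bc")],
        [frozenset("ab"), frozenset("c")],
    ]
    for seq, sup in mine(toy_db, minsup=2).items():
        print([sorted(e) for e in seq], sup)
```

Each top-level call in the final loop of `mine` explores only sequences that start with its own first element, so those subtrees are disjoint and share only read-only bitmaps. Splitting work along such subtrees is one plausible reading of the communication-free partition described for pFINDER, offered as an assumption rather than a description of its actual scheme.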
Pruning strategies are an effective way to improve the performance of mining algorithms. This thesis presents two novel pruning strategies: SEP (sequence extension pruning) and IEP (itemset extension pruning). The correctness of both strategies is confirmed by theoretical reasoning and by experiments. Both SEP and IEP can be applied to mining maximal frequent patterns, closed frequent patterns, and all frequent patterns.

Finally, SEP and IEP are applied to improve important frequent pattern mining algorithms such as SPAM, SPADE, MAFIA, and CHARM. On synthetic datasets, the improved frequent sequence mining algorithms (SPAM+ and SPADE+) and frequent itemset mining algorithms (MAFIA+ and CHARM+) outperform the originals by up to a factor of 10. On large datasets, performance improves by 30% to 50%. Because SEP and IEP improve the performance of different kinds of frequent pattern mining algorithms, they are independent of the underlying algorithms and data structures and can be shared across algorithms.
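The abstract does not say how SEP and IEP decide what to skip, so the sketch below does not implement them. It only illustrates the general kind of saving an extension-pruning rule provides, using the classic Apriori-style "tail" pruning of Eclat-style depth-first itemset miners: if adding item j to prefix P is infrequent, then adding j to any superset of P is also infrequent, so a child retries only the items that extended its parent frequently. The function name `eclat`, the `prune` flag, and the support-test counter are illustrative assumptions.

```python
def eclat(transactions: list[set[str]], minsup: int,
          prune: bool = True) -> tuple[dict[frozenset[str], int], int]:
    """Return ({frequent itemset: support}, number of support tests performed)."""
    # Vertical layout: item -> set of ids of transactions containing it.
    tids: dict[str, set[int]] = {}
    for tid, txn in enumerate(transactions):
        for item in txn:
            tids.setdefault(item, set()).add(tid)
    items = sorted(i for i, t in tids.items() if len(t) >= minsup)
    results: dict[frozenset[str], int] = {}
    tests = 0

    def dfs(prefix: frozenset[str], prefix_tids: set[int], candidates: list[str]) -> None:
        nonlocal tests
        # The "tail": extensions of `prefix` that turned out to be frequent.
        tail = []
        for item in candidates:
            tests += 1
            t = prefix_tids & tids[item]
            if len(t) >= minsup:
                tail.append((item, t))
        for k, (item, t) in enumerate(tail):
            itemset = prefix | {item}
            results[itemset] = len(t)
            if prune:
                # Apriori property: only items that extended `prefix`
                # frequently can extend `itemset` frequently.
                nxt = [j for j, _ in tail[k + 1:]]
            else:
                # Baseline: retry every later frequent item.
                nxt = [j for j in items if j > item]
            dfs(itemset, t, nxt)

    dfs(frozenset(), set(range(len(transactions))), items)
    return results, tests


if __name__ == "__main__":
    txns = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}, {"a", "b", "c", "d"}]
    pruned, n_pruned = eclat(txns, 2, prune=True)
    baseline, n_baseline = eclat(txns, 2, prune=False)
    print(pruned == baseline, n_pruned, n_baseline)
```

Running both modes on the same toy data yields identical itemsets while the pruned run performs fewer support tests. The counter is only a rough proxy for the I/O and computation that a real pruning strategy saves.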
