
Research on Grid-based MST Data Stream Clustering Algorithm

Author: WangXianPeng
Advisor: ZhangJianPei
School: Harbin Engineering University
Major: Computer System Architecture
Keywords: Cluster analysis; Data stream; Grid; Minimum spanning tree
CLC: TP311.13
Type: Master's thesis
Year: 2009

Abstract


Cluster analysis is an important research direction in data mining. In recent years, with the rapid development of information technology, a dynamic form of data with increasingly wide application has emerged: the data stream. Unlike traditional static data stored on disk, a data stream is a high-speed, continuous, dynamic, rapidly changing, and massive data set, so it can only be accessed sequentially, and only once or a limited number of times. These characteristics make data stream mining very difficult and place higher demands on data stream clustering algorithms. Data streams have become a hot research topic in data mining, and data stream clustering has become an important research direction within cluster analysis.
This thesis describes the theory and techniques of data stream mining and, by comparison with traditional static data, analyzes the distinguishing characteristics of data streams. It studies and compares traditional clustering algorithms and data stream clustering algorithms, analyzes their advantages and disadvantages, and describes how data stream clustering algorithms differ from traditional ones. It then introduces grid partitioning methods and their role in cluster analysis, and studies and analyzes grid-based clustering algorithms. On this basis, the thesis proposes a new data stream clustering algorithm, GTSClu, a grid-based minimum spanning tree (MST) data stream clustering algorithm. The algorithm divides processing into an online part and an offline part and combines grid techniques with the minimum spanning tree. The online part partitions the data space into a uniform grid to process the incoming data stream. The offline part splits the spatial grid structure into a non-uniform grid and applies the minimum spanning tree technique to the information gathered online to obtain the clusters.
GTSClu can effectively eliminate noise data and find clusters of arbitrary shape, improving both the efficiency and the quality of clustering. Experimental results show that GTSClu finds clusters of arbitrary shape, is insensitive to the input order of the data, uses the grid resolution technique to separate noise data effectively, and achieves high clustering accuracy and processing efficiency, which makes it suitable for handling large data streams.
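
The two-phase idea described in the abstract can be illustrated with a minimal Python sketch. This is not the GTSClu algorithm itself, whose details are not given here. It only assumes an online step that counts points per uniform grid cell and an offline step that builds a minimum spanning tree over the dense cells and cuts long edges. The names `GridSummary`, `mst_clusters`, `cell_size`, `min_count`, and `max_edge` are hypothetical, and the non-uniform grid refinement mentioned in the abstract is omitted.

```python
import math
from collections import defaultdict


class GridSummary:
    """Online part (assumed): keep one counter per uniform grid cell."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.counts = defaultdict(int)

    def insert(self, point):
        # Each point is read once and mapped to the cell that contains it.
        cell = tuple(math.floor(x / self.cell_size) for x in point)
        self.counts[cell] += 1


def mst_clusters(counts, min_count, max_edge):
    """Offline part (assumed): MST over dense cells, then cut long edges."""
    # Cells with fewer than min_count points are treated as noise.
    cells = [c for c, n in counts.items() if n >= min_count]
    n = len(cells)
    if n == 0:
        return []

    # Prim's algorithm on the complete graph of dense cells, O(n^2).
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(cells[u], cells[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u

    # Union-find: cells joined by a kept (short) MST edge share a cluster.
    label = list(range(n))

    def find(i):
        while label[i] != i:
            label[i] = label[label[i]]
            i = label[i]
        return i

    for d, a, b in edges:
        if d <= max_edge:
            label[find(a)] = find(b)

    groups = defaultdict(list)
    for i, cell in enumerate(cells):
        groups[find(i)].append(cell)
    return sorted(sorted(g) for g in groups.values())
```

In this sketch, a call such as `mst_clusters(summary.counts, min_count=2, max_edge=1.5)` returns each cluster as a sorted list of cell coordinates. Because only per-cell counts are kept, the result does not depend on the order in which points arrive, which matches the order insensitivity claimed in the abstract.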
