
Clustering Method Research Based on Divided and Conquered Method

Author: JiaJunFang
Tutor: LiDeYu
School: Shanxi University
Course: Systems Engineering
Keywords: Cluster analysis; Divide-and-conquer method; Categorical data; Dissimilarity measure; Evaluation index
CLC: TP311.13
Type: Master's thesis
Year: 2011

Abstract


Cluster analysis is an important research field in data mining, and its concepts, methods and tools are widely used in practice, for example in financial fraud detection, medical diagnosis, image processing, information retrieval and the biological sciences. In recent years clustering algorithms have become a very active area of study and have produced fruitful results. However, with the continuing development of science and technology and the constant growth of data volumes, categorical data and mixed data have become common, and research is no longer limited to numerical data. These two kinds of data are typically high-dimensional and large in volume, so their distribution is sparse and they contain more noise. When the dimension is very high, a "tending-to-zero" phenomenon may also appear: the gap between the distances from a given point to its farthest and nearest data points shrinks as the dimension increases. Moreover, clustering algorithms designed for numerical data cannot easily be applied to categorical data, which lacks an inherent geometric model. Clustering algorithms for categorical data have therefore become an important research topic and have attracted wide attention.

Within the framework of the fuzzy K-Means and fuzzy K-Modes clustering algorithms, this thesis introduces the divide-and-conquer strategy to study the clustering of large data sets and of categorical data. The main results are as follows:

(1) A clustering method for large-scale data sets based on divide and conquer. The data set is divided into several subsets, the subsets are clustered simultaneously, and the cluster results of the subsets are then merged to obtain the final clustering. This method overcomes the "tending-to-zero" phenomenon that large data volumes and high dimensionality can cause. In addition, the complexity of clustering is reduced because the large-scale data set is decomposed into small-scale ones. Experiments on artificial data sets show that the method is effective.

(2) A clustering method for categorical data sets based on divide and conquer. Divide and conquer is applied to the fuzzy K-Modes clustering algorithm: a large, complex data set is divided into several smaller subsets, the subsets are clustered, and the subset results are then combined to obtain the final clustering. By using the simple 0-1 matching similarity measure, the method overcomes the lack of a geometric model for categorical data, and it avoids the "tending-to-zero" phenomenon caused by large-scale data sets. The method is compared on UCI data sets with the traditional fuzzy K-Means and fuzzy K-Modes clustering algorithms, and the experimental results show that it is effective.

In summary, the thesis proposes clustering algorithms based on divide and conquer and demonstrates their effectiveness on UCI data sets.
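The divide, cluster and merge steps described in result (2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes hard K-Modes in place of fuzzy K-Modes to stay short, and it assumes that merging means re-clustering the modes found in the subsets, ignoring how many records each mode represents; the abstract does not specify the merge rule. All function names are hypothetical.

```python
import random
from collections import Counter


def matching_dissimilarity(a, b):
    """Simple 0-1 matching: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def mode_of(records):
    """Attribute-wise most frequent value of a non-empty list of records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def nearest(record, modes):
    """Index of the mode closest to record under 0-1 matching."""
    return min(range(len(modes)),
               key=lambda i: matching_dissimilarity(record, modes[i]))


def k_modes(records, k, rng, n_iter=20):
    """Hard K-Modes (assumed simplification of fuzzy K-Modes).

    Requires len(records) >= k. Returns a list of k modes.
    """
    modes = rng.sample(records, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for r in records:
            clusters[nearest(r, modes)].append(r)
        # An empty cluster keeps its previous mode.
        new_modes = [mode_of(c) if c else modes[i]
                     for i, c in enumerate(clusters)]
        if new_modes == modes:
            break
        modes = new_modes
    return modes


def divide_and_conquer_k_modes(records, k, n_subsets, seed=0):
    """Divide the data, cluster each subset, then merge the subset results."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    # Divide: split the shuffled records into n_subsets interleaved parts.
    subsets = [shuffled[i::n_subsets] for i in range(n_subsets)]
    # Conquer: cluster every subset independently.
    local_modes = [m for s in subsets for m in k_modes(s, k, rng)]
    # Merge (assumed rule): cluster the collected subset modes again.
    final_modes = k_modes(local_modes, k, rng)
    labels = [nearest(r, final_modes) for r in records]
    return final_modes, labels
```

Records are tuples of categorical values, for example `("red", "small", "round")`. Calling `divide_and_conquer_k_modes(data, k=2, n_subsets=2)` returns the final modes and one cluster label per record. Because each subset is clustered independently, the subset calls could also run in parallel.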
