
Research on Clustering Algorithms for Source Code Mining

Author: MengMeiZhi
Tutor: ZhangYang
School: Northwest University of Science and Technology
Course: Computer Software and Theory
Keywords: Data Mining; Source Code Mining; Kernel Function; KFCM Algorithm; Genetic Algorithm
CLC: TP311.13
Type: Master's thesis
Year: 2010

Abstract


Source code data is characterized by large volume, many nominal attributes, and so on. To mine software engineering data efficiently, a fast and effective approach is needed. K-means clustering is a concise and practical algorithm with broad applications. However, it does not optimize the features of the samples, its clustering results are often unsatisfactory, and its efficiency depends heavily on the distribution of the samples. To address these issues, this thesis proposes a KFCM (kernel fuzzy C-means) algorithm based on TF-IDF to cluster source code data. It also optimizes KFCM with a genetic algorithm, yielding a new algorithm called SGAKFCM, which addresses the local-minimum problem inherent in KFCM. Finally, KFCM and SGAKFCM are used to mine source code data. The experimental results indicate that both algorithms are suitable for large data sets, with high efficiency and good results.

The main research issues of the thesis are as follows:

(1) KFCM algorithm based on TF-IDF. Because KFCM cannot cluster the text data of source code directly, the TF-IDF method is used to transform that text into numerical data, so that KFCM can be applied to it.

(2) The SGAKFCM algorithm is used to cluster the TF-IDF-format data, which addresses the local-minimum problem inherent in KFCM.

The KFCM and SGAKFCM algorithms were implemented on the Eclipse and Matlab platforms and evaluated on the source code of WEKA. FCM, KFCM, and SGAKFCM were then each used to analyze the output.

Comparing the results of the three clustering algorithms, we conclude that KFCM achieves satisfactory clustering quality and high efficiency on software engineering data with nominal attributes. The experimental results show that KFCM based on TF-IDF performs satisfactorily on source code mining. The main contributions of the thesis are the use of TF-IDF to represent source code data and the use of a genetic algorithm to optimize KFCM so as to overcome its local-minimum problem.
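
The two steps named in the abstract, TF-IDF vectorization followed by KFCM clustering, can be illustrated with a minimal Python sketch. This is not the thesis's Eclipse/Matlab implementation. It assumes one common TF-IDF variant (term frequency times log(N/df)) and one common KFCM formulation (Gaussian kernel, prototypes kept in input space, kernel-induced distance 1 - K(x, v)). The function names, the toy token lists, and the parameters m and sigma are hypothetical choices made for illustration. The genetic-algorithm step of SGAKFCM is not shown; the initial centers are passed in explicitly because they are the part such a search would presumably vary.

```python
import math
from collections import Counter


def tfidf(docs):
    """Map non-empty tokenized documents to dense TF-IDF vectors.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            [counts[t] / len(doc) * math.log(n / df[t]) for t in vocab]
        )
    return vocab, vectors


def gaussian_kernel(x, v, sigma):
    """K(x, v) = exp(-||x - v||^2 / sigma^2)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-d2 / sigma ** 2)


def memberships(points, centers, m, sigma):
    """Fuzzy memberships from the kernel-induced distance 1 - K(x, v)."""
    u = []
    for x in points:
        # Clamp so a point sitting exactly on a center does not divide by zero.
        d = [max(1.0 - gaussian_kernel(x, v, sigma), 1e-12) for v in centers]
        w = [di ** (-1.0 / (m - 1.0)) for di in d]
        total = sum(w)
        u.append([wi / total for wi in w])
    return u


def kfcm(points, init_centers, m=2.0, sigma=1.0, iters=100, tol=1e-6):
    """Alternate membership and center updates until the centers settle."""
    centers = [list(v) for v in init_centers]
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m, sigma)
        new_centers = []
        for i, v in enumerate(centers):
            # Each point is weighted by u^m times its kernel value to the center.
            w = [
                u[k][i] ** m * gaussian_kernel(x, v, sigma)
                for k, x in enumerate(points)
            ]
            total = sum(w)
            new_centers.append(
                [sum(wk * x[j] for wk, x in zip(w, points)) / total
                 for j in range(dim)]
            )
        shift = max(
            abs(a - b)
            for old, new in zip(centers, new_centers)
            for a, b in zip(old, new)
        )
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m, sigma)


# Toy "source files" reduced to identifier tokens (made-up data).
docs = [
    ["cluster", "kmeans", "centroid", "distance"],
    ["cluster", "centroid", "distance", "assign"],
    ["parse", "token", "file", "reader"],
    ["parse", "reader", "file", "stream"],
]
vocab, vectors = tfidf(docs)
centers, u = kfcm(vectors, init_centers=[vectors[0], vectors[2]])
labels = [max(range(len(row)), key=lambda i: row[i]) for row in u]
print(labels)
```

Because the update rules only converge to a local optimum, the result depends on `init_centers`. That dependence is the weakness the abstract attributes to KFCM and the reason it gives for adding a genetic algorithm.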

