Dissertation > Excellent graduate degree dissertation topics show

Application of Biclustering Integrated MOPSO and Greedy Randomized Construction to Gene Expression Data

Author: LaiLiNa
Tutor: LiuGuiXia
School: Jilin University
Course: Applied Computer Technology
Keywords: Gene expression data Biclustering Multi-objective particle swarm optimization Greedy Randomized Construction
CLC: TP311.13
Type: Master's thesis
Year: 2012
Downloads: 95
Quote: 0
Read: Download Dissertation


Due to the advent of High-throughput microarray technology, it generated large amountsof gene expression datasets. Microarray technology allows simultaneously to measure theexpression level of thousands of genes under different experimental conditions. Themicroarray datasets are generally presented as a n mdata matrix, where rows correspondto genes and columns correspond to experimental conditions. Mining local patterns whichshow similar expression patterns under subsets of conditions from those microarray datasets isan important research problem in bioinformatics and medical domain.Clustering technology has been widely used in gene expression profiles anaysis, and itcan identify sets of genes which has similar expression patterns under all expressionconditions. But in the cellular processes, it is knowledge that subsets of genes wereco-regulated and co-expression only under subsets of condtions. But clustering can notidentify such subsets of gene. Biclustering algorithms which were originally described byHartigan can perform simultaneous clustering of both rows and columns of the geneexpression matrix. Afterwards, biclustering was introduced to gene expression analysis byCheng and Church. It is a very useful technology to identify local patterns in which sets ofgenes shares similar expression under subsets of conditions but all of conditions. So far, manybiclustering algorithms have been developed for gene expression data analysis, such asbiclustering, SAMBA, Gibbs sampling biclustering, spectral biclustering, and so on.Most of these methods use local search to generate optimal biclusters. In recent years, manyevolution algorithms (EA) have been proposed to indentify optimal bicluster in microarraydata in order to escape from local minima. And, through improving the evolution algorithms,EA is able to solve multi-objective optimization problem. So, more and more researchers areinterested in evolution algorithms based heutistics optimization methods.When searching optimal biclusters from gene expression data, we need to optimizesimultaneously several objectives, which are in conflict with each other. These severalobjectives are the size, mean squared score and row variance of the bicluster. In this situation,we can appy multi-objective evolutionary algorithms (MOEAs) to discover efficiently optimalbiclusters. MOEAs mainly proposed three relaxed forms of Pareto dominance, such as ε-dominance, α-dominanceb b-dominance. Applying these relaxed forms of Paretodomincace can regulate convergence of MOEA, and improve diversity of all particles in theexternal archive. Amonge these forms, is widely applied in the MOEAs, dueto its effectiveness and its theoretical foundation. Particle swarm optimization (PSO) firstlyintruduced by Kebnnedy and Eberhart is inspired by the behavior of a bird flock finding food.PSO has been found to be quite efficient in solving the single objective optimization problem.Because of rapid convergence and efficiency, researchers are interested in applying PSO tohandle multi-objective optimization problem. This methods is named as multi-objectiveparticle swarm optimization (MOPSO).In this paper, we propose a new inporoved algorithm, GR-MOPSO algorithm. Thismethod incorporate, crowding distance and time varying parametersapproach into MOPSO biclustering framework. Firstly, we apply greedy randomizedconstruction to generate bicluster seeds, then use these bicluster seeds to initialize all particlesin the particle swarm. Secondly, PSO has three vital parameters which are inertia w andacceleration coefficientsc1,c2. In order to improve the convergence of PSO, we introducetime varying parameters into MOPSO. These parameters are changing with the iterations,which can enhance global exploration at early stage and raise local exploitation at later stage.Lastly, with the purpose of controlling the size of external archive and maintaining the gooddiversity of particles in the archive, we appy the crowding distance to truncate archivepopulation. We respectively implement this new improved aglorithm GR-MOPSO on theyeast and human B-cells datasets, and compare the results with MOEA, MOPSOB. Theresults show: the quality of biclusters GR-MOPSO found are better than the results of MOEAand MOPSOB. And GR-MOPSO have well population diversity and rapid convergence.The innovation of this paper is mainly the following two aspects:①Through analyzingthe mining bicluster, we found the features of bicluster conflicting with each other.So,we canapply multi-objective optimization theory into the biclustering. Incorporated,crowding distance and time varying parameters approach into MOPSO framework,GR-MOPSO can search more optimal biclusters when application of the improved MOPSO tothe bicluster seeds.And,it can maintain diversity of particles in the particle swarm andenhance rapid convergence.②In the initial stage of multi-objective particle swarmoptimization, we use bicluster seeds to initialize the position and velocity of all particles in theswarm, not use random value. The bicluster seeds are the results of Greedy RandomizedConstruction. So, it can enable all particles to have the character of bicluster and enhance efficiency to search optimal solutions.The future work: Firstly, in the biclustering of largely gene expression data,thresholddetermines the number and quatity of found biclusters. So, we need to study how to selectsuitable threshold in order to improve quatity of biclusters and run efficiently.Secondly, due toPSO without the filting process such as crossover and mutation, we need to improve theparticle swarm for enhancing the diversity of the particles in the swarm and increasing rate ofconvergence. Lastly, we further rearch the biological significance of the biclustering results.

Related Dissertations

  1. Comparison of Gene Expression Data Cluster Methods and Gene Network Construction for Phytophthora Sojae Genes,S435.651
  2. The Design and Implementation of Bicluster Data Analyzing Software,TP311.52
  3. Dimensionality Reduction Methods for Gene Expression Data Base on SVM,TP181
  4. Mining Coregulated Biclusters from Time-series Gene Expression Data,TP311.13
  5. Research on the Methods of Power Plant Load Dispatch,TM714
  6. The Design and Analysis of Clustering Algorithms on Gene Expression Data,TP301.6
  7. The Research on Imputation Algorithm of Missing Values for Gene Expression Data,TP301.6
  8. SVM Based Research on Feature Selection Method for Gene Expression Data,TP181
  9. Gene expression data mining method,Q75
  10. The Research on Chaotic Neural Networks for Price Forcasting and Economic Load Dispatching for Combined Power and Heat System,TP183
  11. Research on Multi-Objective Optimization Particle Swarm Algorithm and Its Application,TP301.6
  12. The Research on Particle Swarm Optimization Algorithm to Solve Multi-Objective Optimization Problem,TP301.6
  13. Cancer Classification Methods Based on Gene Expression Data,R730.2
  14. Study on Some Data Mining Methods for Gene Expression Data,TP311.13
  15. Parallel Processing in Bioinformatics,Q811.4
  16. Microarray expression data standardization Strategy,Q78
  17. The Construction of Gene Regulation Network by Mathematical Models,O29
  18. Biclustering Analysis on Loss of Heterozygosity Data of Lung Carcinomas Samples,TP301.6
  19. Research on Classification Algorithms in Gene Expression Data Analyzing,TP311.13
  20. Analysis, Integration and Visualization of Gene Expression Data,Q78
  21. Research on Directional Clustering and It’s Applications,TP18

CLC: > Industrial Technology > Automation technology,computer technology > Computing technology,computer technology > Computer software > Program design,software engineering > Programming > Database theory and systems
© 2012 www.DissertationTopic.Net  Mobile