
The Design and Implementation of Bicluster Data Analyzing Software

Author: HuangAnJie
Advisor: FengJianLin
School: Sun Yat-sen University
Course: Software Engineering
Keywords: data mining; biclustering algorithm; gene expression data; index technology
CLC: TP311.52
Type: Master's thesis
Year: 2011

Abstract


DNA microarray technology now makes it possible to measure the expression of thousands of genes under multiple conditions. There is growing interest in computational methods that process these data and uncover the inherent correlations in them.
Traditional clustering algorithms usually cluster the data along only one dimension, so they cannot discover many of the coherent relationships in gene expression data. In recent years, increasing attention has turned to biclustering algorithms, which cluster the data simultaneously along two dimensions, the gene dimension and the condition dimension, to find coherent subspaces in the microarray data. Such a subspace is known as a bicluster.
We design and implement bicluster data analysis software that can be used to discover coherent subspaces in data, especially gene expression data. Most importantly, the software can handle the massive numbers of biclusters produced by the RAP and ET-Bicluster algorithms and provides a fast search function for biclusters.
This thesis summarizes several typical biclustering algorithms, in particular RAP and ET-Bicluster. RAP can generate biclusters directly from real-valued data and enables exhaustive discovery of coherent biclusters; ET-Bicluster can also deal with noisy data. We therefore provide implementations of both algorithms for analyzing gene expression data. Because many users only want the biclusters that include specified genes and experimental conditions, we also modify the algorithms so that this kind of constrained biclustering runs faster.
RAP and ET-Bicluster discover biclusters exhaustively, which in turn produces a large quantity of biclusters. To make this large number of biclusters easier to manage, and to speed up searching for the biclusters that include given genes and conditions, we studied several indexing techniques.
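To illustrate searching for biclusters that include given genes and conditions, the following is a minimal sketch of a bitmap-style lookup. It is not the thesis implementation: the class name `BitmapIndex`, the representation of a bicluster as a pair of sets, and the use of Python integers as bit vectors are all assumptions made for this example.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical representation: a bicluster is (set of gene ids, set of condition ids).
Bicluster = Tuple[Set[str], Set[str]]


class BitmapIndex:
    """Illustrative bitmap index: one bit vector per gene and per condition.

    Bit i of a vector is set when bicluster i contains that gene/condition.
    Python ints serve as arbitrary-length bit vectors (an assumption of this
    sketch, with no compression or on-demand loading).
    """

    def __init__(self, biclusters: List[Bicluster]) -> None:
        self.n = len(biclusters)
        self.gene_bits: Dict[str, int] = {}
        self.cond_bits: Dict[str, int] = {}
        for i, (genes, conds) in enumerate(biclusters):
            for g in genes:
                self.gene_bits[g] = self.gene_bits.get(g, 0) | (1 << i)
            for c in conds:
                self.cond_bits[c] = self.cond_bits.get(c, 0) | (1 << i)

    def search(self, genes: Iterable[str] = (),
               conditions: Iterable[str] = ()) -> List[int]:
        """Return ids of biclusters containing ALL given genes and conditions."""
        mask = (1 << self.n) - 1  # start with every bicluster as a candidate
        for g in genes:
            mask &= self.gene_bits.get(g, 0)  # unknown gene -> no matches
        for c in conditions:
            mask &= self.cond_bits.get(c, 0)
        return [i for i in range(self.n) if (mask >> i) & 1]
```

Each query reduces to a handful of bitwise AND operations, one per requested gene or condition, instead of scanning every bicluster.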
We focus on indexing methods and build a bitmap index and a prefix-tree index over the biclusters produced by the biclustering algorithms. For cases where the index is too large to be read into memory completely, we make it possible to read the index data on demand. We also study approaches to compressing the index, reducing its size as much as possible to cut the extra storage space and to speed up access to the index file.
Finally, we study one of the most widely used gene product databases, the Gene Ontology database, and implement functional enrichment analysis of biclusters, which computes the P-value of a bicluster, together with both the Bonferroni and FDR multiple hypothesis testing correction methods.
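The enrichment step can be pictured with a short sketch. The abstract does not say which test or which FDR procedure the software uses; the example below assumes a one-sided hypergeometric test for the P-value and the Benjamini-Hochberg procedure for FDR, and all function names are hypothetical.

```python
from math import comb
from typing import List


def hypergeom_pvalue(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes total, K annotated, n drawn).

    Assumed enrichment test: n is the number of genes in the bicluster and
    k is how many of them carry the Gene Ontology term.
    """
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def bonferroni(pvalues: List[float]) -> List[float]:
    """Bonferroni-adjusted P-values: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values (one common FDR procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Bonferroni controls the family-wise error rate and is the more conservative of the two; an FDR procedure such as Benjamini-Hochberg typically retains more terms when many Gene Ontology terms are tested per bicluster.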

Related Dissertations

  1. A Study on Healthcare Product Marketing Based on Data Mining Technology,F426.72
  2. Gao Zhong-ying academic thought and experience and use of Bufei Decoction treatment of common diseases of the respiratory system drug law,R249.2
  3. Bing- thick academic thought and clinical experience and empirical studies apply to turtle soups treatment of chronic kidney disease,R249.2
  4. Comparison of Gene Expression Data Cluster Methods and Gene Network Construction for Phytophthora Sojae Genes,S435.651
  5. Research on Clustering Algorithm Based on Mutation Particle Swarm Optimization,TP18
  6. Research on Fuzzy C-Mean Clustering Algorithm Based on Particle Swarm Optimization and Shuffled Frog Leaping Algorithm,TP18
  7. Research on Clustering Algorithm Based on Genetic Algorithm and Rough Set Theory,TP18
  8. Based on data mining research tax audit case selection,F812.42
  9. Community-oriented education, personalized learning system and its implementation,TP391.6
  10. Association rule mining based Intrusion Detection System Research and Implementation,TP393.08
  11. Data warehouse technology in the banking customer management systems research and implementation,TP315
  12. Design to E-learning System in Senior Vocational School Base on Moodle,TP311.52
  13. Design and Development of Teaching Quality Assessment System Based on Data Mining,TP311.13
  14. The Application of Association Rules Algorithm in Higher Vocational Colleges’ Endorsement of Impoverished Students,G717
  15. Based on Data Mining Technologies in Urban Water Supply Analysis and Decision,F299.24;F224
  16. Research on Application of Data Mining Technology in Degree of Satisfaction Analysis of Television Customers,TP311.13
  17. Web Usage Mining and the Research of Personalized Recommendation,TP311.13
  18. Data Mining of Application in the School Management and Training Students,TP311.13
  19. Research on Employment Monitoring System of University Graduate,G647.38
  20. Design and Implementation for Decision Support System of Drug Administration Based on Data Warehouse,TP311.13

CLC: Industrial Technology > Automation technology, computer technology > Computing technology, computer technology > Computer software > Program design, software engineering > Software Engineering > Software Development