An Approach for Identifying a Plant Resistance Gene Based on the Random Forest

Author: GuoYingZuo
Tutor: GuoMaoZu
School: Harbin Institute of Technology
Course: Computer Science and Technology
Keywords: Resistance gene; Feature extraction; Under-sampling; Random forest
CLC: Q943
Type: Master's thesis
Year: 2011


Research on plant resistance genes (R-genes) has become one of the most important topics in bioinformatics. Since the first resistance gene was identified, more than 70 R-genes have been experimentally verified and applied in molecular breeding, transgenic work, and related areas. A growing number of bioinformatics researchers are also dedicated to mining resistance genes and analyzing their functions and biochemical mechanisms. However, some problems remain, such as the low efficiency of current mining approaches and their high false-positive rate. In this thesis, we analyze the structure of R-genes and use a machine learning approach to predict resistance genes.

In our approach, we take the protein sequences encoded by R-genes as the research object, which converts the R-gene identification problem into a two-class classification problem in machine learning. First, we analyze the conserved domains of resistance proteins and the effect of physical and chemical properties on the protein sequences, and then define a set of 188 features to represent each sequence. Second, we use an under-sampling approach based on the K-Means algorithm to rebuild the training sets, with the aim of solving the class-imbalance problem in R-gene classification. Finally, we build a random forest classifier on the new training sets to perform the R-gene classification. The specificity and sensitivity of our approach both exceed 80%, and the number of false positives in R-gene identification is notably reduced. The experimental results show that our algorithm for R-gene classification is sound and effective.
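The K-Means under-sampling step can be illustrated with a minimal sketch. The abstract does not say how samples are chosen after clustering, so the version below makes an assumption: it clusters the majority (non-R-gene) class into as many clusters as there are minority samples and keeps the real sample nearest to each centroid. The function names `kmeans` and `undersample_majority` and the 2-dimensional toy vectors are hypothetical; the thesis uses 188 features per sequence.

```python
import random
from math import dist


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids as lists of floats."""
    rng = random.Random(seed)
    # Start from k distinct samples drawn from the data.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def undersample_majority(majority, n_keep, seed=0):
    """Assumed selection rule: keep the real sample closest to each centroid."""
    centroids = kmeans(majority, n_keep, seed=seed)
    remaining = list(majority)
    kept = []
    for c in centroids:
        best = min(remaining, key=lambda p: dist(p, c))
        kept.append(best)
        remaining.remove(best)  # never pick the same sample twice
    return kept


# Toy feature vectors (illustrative only, not data from the thesis).
minority = [(0.9, 0.8), (1.0, 1.1), (0.8, 1.0)]
majority = [
    (0.1, 0.2), (0.2, 0.1), (0.15, 0.25),
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
    (9.0, 0.5), (9.2, 0.4),
]

balanced_majority = undersample_majority(majority, n_keep=len(minority))
training_set = [(x, 1) for x in minority] + [(x, 0) for x in balanced_majority]
print(len(minority), len(balanced_majority))
```

The resulting `training_set` has equal numbers of positive and negative samples and could then be passed to any random forest implementation. Keeping real samples rather than the centroids themselves is a design choice made here so that every training vector corresponds to an actual protein sequence.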


CLC: Biological Sciences > Botany > Plant Cell Genetics