Dissertation > Excellent graduate degree dissertation topics show
RNA Pseudoknots Prediction Algorithm Based on Stack
Author: LiHengWu
Tutor: ZhuDaMing
School: Shandong University
Course: Computer Software and Theory
Keywords: RNA Structure Prediction Pseudoknots Computational Biology Approximation Algorithm Approximation Scheme Dynamic Programming
CLC: TP301.6
Type: PhD thesis
Year: 2008
Downloads: 96
Quote: 2
Read: Download Dissertation
Abstract
RNA structure prediction is one of the basic issues in computational biology. RNA secondary structure prediction is the first step to predict RNA tertiary structure from RNA sequence.Comparative sequence analysis method was adopted in early period,which gets secondary structure by comparing RNA sequences of the same biological function in different organisms.Homologous sequences are hard to get for many RNA molecules,and a lot of cost of manpower is needed,so the predicted efficiency of comparative sequence analysis method is low.Minimum free energy method is adopted widely to predict RNA secondary structure now.Minimum free energy method searches the structure with minimum free energy from all conformations formed by the given sequence based on thermal dynamic model.MFOLD algorithm,presented by Zuker,has become the most widely used optimal methods for RNA secondary structure prediction,which has the predicted accuracy of about 73%.But MFOLD algorithm can not predict pseudoknots and more complex tertiary structure.Pseudoknot is the most extensive RNA tertiary structural element,very complicated and stable RNA structure.Pseudoknots prediction is the key and hot point to the research of RNA structure prediction now.Pseudoknots have regulative, catalytic and structural important function and diversity roles in different RNA molecules.It has been proved to be NPhard class that predicting RNA secondary structure containing arbitrary pseudokonts.More discussions on RNA structure prediction containing pseudoknots are centered on minimum free energy algorithms to predict simple pseudokonts. PKNOTS algorithm,presented by Rivas and Eddy,takes O(n~6)time and O(n~4)space to predict arbitrary planar pseudoknots and partial nonplanar pseudoknots;PknotsRG algorithm,presented by Jens and Robert,takes O(n~4)time and O(n~2)space to predict simple nested pseudoknots;LP algorithm,presented by Lyngsφand Pedersen,takes O (n~5)time and O(n~3)space to predict one planar pseudoknot.Combinatorial optimization algorithms fold RNA pseudokonts through simplifing energy model to consider partial main acition,but it sacrifices predicted accuracy,such as maximum weighted matching algorithm presented by Cary and Storm.There are some computer attempts to predict pseudoknots with heuristic approaches include genetic algorithms,simulated annealing algorithms and neural network algorithms.Adjacent base pairs form stack,stacking and base pairing actions in RNA molecules are the most primary and stable actions.Maximum stacking problem has also attracted close attention in RNA secondary structure prediction containing pseudoknots.Ieong presented the problem of maximum stacked base pairing number in 2001,designed its approximate algorithm with the approximate performance ratio of 3,and Lyngsφgave its exact algorithm.Moreover Lyngsφpresented the problem of maximum stacking number,proved this problem belongs to NPhard class and designed its polynomial time approximate scheme.Lyngsφtreat all stackes as the same,but RNA structural experimental results indicate that the different types of stackes have the different energy,and the energy of stack is determined by the type of its base pairs.So we present maximum weighted stacking problem based on biological stacking and base pairing actions,and discuss its solving algorithm and complexity.We divide stackes into(AA,BB)and(AB,BA)classes,compute the optimal structure of(AB,BA)class by the property of(AB,BA)class constructing clique, calculate the approximate structure of(AA,BB)class by the property of stack partion, get the approximate structure by selecting the one with the maximum weight from the weight of(AA,BB)and(AB,BA)classes,and give polynomial time approximation algorithm to predict arbitrary pseudoknots with O(nlogn)time and O(n)space.The approximate performance ratio of this algorithm is 3.We divide the given sequence into subsequences with the length less than K+ 1(2≤K),calculate and store the paired weight and unpaired number of various subsequences,delete impossible structure and the structure with smaller weight by daynamic programming,seach the optimal structure formed by subsequences with the length less than K+1 as the approximate structure of given sequence,and give a polynomial time approximate scheme to predict arbitrary pseudoknots.Now it is hard to fold large RNA molecules for existing polynomial time pseudoknots prediction algorithms.As continuous stackes construct stem and crossed stems construct pseudoknots,search the optimal structures based on combination with stem zones has become new method to RNA structure prediction,such as stem zone stacking algorithm presented by Benedeti and stem zone random stacking algorithm presented by Li WJ,predict nested secondary structures.Recently Ruan give a heuristic algorithm based on stem zone to predict secondary structure containing pseudoknots with O(n~4)time and O(n~2)space.We design a heuristic algorithm to maximize stem and predict arbitrary pseudoknots.First we search all stems of given sequence,select the largest stem with the minimal energy,and mark the bases in the selected stem to make them don’t take part in back pairing.Then we select the largest stem from the remainder bases,and follow on until no stem.It takes O(n~3)time and O(n)space to predict RNA sequences with 5000 bases.Our algorithm replace base pair with stern,and replace delete with mark,which remove lots of redundancy base pair and nonsense stack,and increase predicted accuracy.Compared with maximum weighted matching algorithm with O(n~3)time and O(n~2)space,our algorithm reduce space complexity from O(n~2)to O(n);and the experimental results show that its sensitivity is improved form 80%to 87.8%,and specificity is increased from 53.7%to 75.9%.Compared with genetic algorithm with the accuracy of 83.3%and simulated annealing algorithm with the accuracy of 79.7%,our algorithm increases the predicted accuracy to 87.5%.The key to RNA structure prediction with minimal free energy method is modeling RNA structure.Based on the characteristic of relative stablisity of RNA stem zone,we introduce semiextensible structures and kstems,compute semiextensible structures with kstems,calculate nested and crossed pseudoknots with the crossing of semiextensible structures,and establish a new model to express pseudoknots.Planar pseudoknot is the most extensive subclass of pseudoknots.In Pseudobase, there is only one nonplanar pseudoknot,and others are all planar pseudoknots.Based on new model,we introduce coaxial stacks actions,design and implement dynamic programming algorithm to predict arbitrary planar pseudoknots and simple nonplanar pseudoknots.All 254 pseudoknots in the PseudoBase are computed with this algorithm,the predicted structures of 173 sequences contain correct pseudoknots,the forecast accuracy is 70.6%;the average sensitivity is 83.6%and the average specificity is 76.6%.For 84 sequences of 3UTR other virus,the forecast accuracy is 84.5%,the average sensitivity is 91.0%and the average specificity is 91.2%. Experimental results show that this algorithm has good accuracy.Compared with the best RE algorithm,the length of the given sequence computed by our algorithm is improved from 140 bp to 800 bp,the time complexity is reduced from O(n~6)to O(n~4)and the space complexity is droped from O(n~4)to O(n~3); for PKNOTS tested sequences,the sensitivity is increased from 82.8%to 84.8%and the specificity is raised from 78.9%to 80.7%.Compared with PknotsRG algorithm with O(n~4)time and O(n~2)space to compute simple nested pseudoknots,our algorithm can compute more complex nested and crossed pseudoknots with the same time complexity as PknotsRG algorithm.The test results for PseudoBase data base indidcate that PknotsRG algorithm has the predicted accuracy of 68%and our algorithm has the one of 70.6%for pseudoknots. Compared with LP algorithm to compute one planar pseudoknot with O(n~5)time and O(n~3)space,our algorithm can compute more complex multipseudoknots,and reduce the time complexity from O(n~5)to O(n~4).There are five parts of contents and results in this paper.The basic concepts, research significance and status are summarized in the first chapter.Existing RNA structure prediction algorithms for nested structure and that for pseudoknots are reviewed and summarized in the secondary chapter.Polynomial approximation algorithm and approximation scheme for maximum weighted stacking problem are designed in the third chapter.Heuristic algorithm to maximize stem and its experimental results are given in the fourth chapter.Dynamic programming algorithm and its experimental results are given to predict planar pseudoknots in the fifth chapter. The end is a brief summary.There are three innovation points in this paper as follows.(1)Based on stacking and base pairing actions,we present maximum weighted stacking problem and design its polynomial time approximation algorithm with O(nlogn)time and O(n)space and its polynomial time approximation scheme. The approximate performance ratio of this approximation algorithm is 3.(2)We design a heuristic algorithm to maximize stem with O(n~3)time and O(n) space to predict RNA sequences with 5000 bases.Compared with maximum weighted matching algorithm with O(n~3)time and O(n~2)space,our algorithm reduce space complexity from O(n~2)to O(n),and the experimental results show that its sensitivity is improved form 80%to 87.8%,and specificity is increased from 53.7%to 75.9%.(3)We establish a new model to express pseudoknots including semiextensible structures and kstems,design a dynamic programming algorithm to predict arbitrary planar pseudoknots and simple nonplanar pseudoknots.The predicted sensitivity is 83.6%and the predicted specificity is 76.6%for PseudoBase data base.Compared with the best RE algorithm to predict arbitrary planar pseudoknots and partial nonplanar pseudoknots,the length of the given sequence computed by our algorithm is improved from 140 bp to 800 bp,the time complexity is reduced from O(n~6)to O(n~4)and the space complexity is droped from O(n~4)to O(n~3);for PKNOTS tested sequences,the sensitivity is increased from 82.8%to 84.8%and the specificity is raised from 78.9%to 80.7%.

Related Dissertations
 Research on Diagnosis Methods of Breast Masses Based on Reference Images,TP391.41
 Study on Power System Voltage and Reactive Power Control Method,TM761.1
 Research on Approaches of the Subjective Automated Assessment,TP391.1
 Multiobjective Optimal Operation for Reservoirs,TV697.1
 Based on conditional random RNA secondary structure prediction algorithm,R346
 Real estate based on dynamic programming optimization decision multiproject development,F293.3
 Study on Realtime Rhythmic Information Retrieval of Musical Audio Signal and the System Implementation,TN912.3
 Research and Implementation of Human Resources Dispatch in Software Enterprises,TP311.52
 Research and Implementation of Applications Oriented System with DAG Data Dependence,TP311.1
 The Single Machine Parallelbatching Scheduling Problem with Familyjobs,O223
 Study on the Cost Management of Transmission Line Construction Project,F426.61
 Research of Integrated Maintenance Scheduling System for Passenger Dedicated Line,U2939
 Rand Paul · M · Church philosophy neuroscience research,B842.1
 Research on Income Distribution of Virtual Enterprise,F270.7
 The Optimal Investment Strategy for DefinedContribution Pension Plans,F842.6;F272
 Biological multiple sequence alignment study algorithm,Q73
 Research on Emergency Resources Transportation Scheduling Based on Dynamic Programming,O221.3
 Storage Model Study of Mold Stsandard Parts and Acessories,F426.6;F273.4
 3M Automobile New Products Diffusion Model and Inventory Optimization,F426.471
 Study on Optimum of Cutoff Grade in Underground Mine Based on Dynamic Programming Method,TD862.1
 Transmission Network Expansion Planning Based on Ecology Evolutionary Algorithm of Food Chain,TM715
CLC: > Industrial Technology > Automation technology,computer technology > Computing technology,computer technology > General issues > Theories, methods > Algorithm Theory
© 2012 www.DissertationTopic.Net Mobile
