
Association Analysis and Population Structure Theory Based on Next-Generation Sequencing Technology

Author: XiaoZuoChang
Advisor: JinLi; XiongMoZuo
School: Fudan University
Course: Bioinformatics
Keywords: Whole Genome Sequencing; Association Analysis; Linkage Disequilibrium; Population Structure; Locally Linear Embedding; Principal Component Analysis; Dimensionality Reduction; LASSO
CLC: Q347
Type: PhD thesis
Year: 2011


Next-generation sequencing technologies can effectively detect the entire spectrum of genomic variation and provide a powerful tool for systematically exploring common, low-frequency, and rare variants across the entire genome. The 1000 Genomes Project (1000G) is one such endeavor: it characterizes the pattern of human genetic variation down to the MAF = 1% level as a foundation for association studies, and provides data including SNPs, indels, and CNVs. However, the current paradigm for genome-wide association studies (GWAS) is to catalogue and genotype common variants (MAF > 5%). The methods and study designs for testing association of low-frequency (0.5% < MAF ≤ 5%) and rare (MAF ≤ 0.5%) variation have not been thoroughly investigated. Here, we explored different strategies and study designs for near-future GWAS in the post-era, based on both the 1000 Genomes low coverage pilot data and the exon pilot data.

We investigated the linkage disequilibrium (LD) pattern among common and low-frequency SNPs and its implications for association studies. We found that LD between pairs of low-frequency alleles, and between low-frequency and common alleles, is much weaker than LD between pairs of common alleles. We examined various tagging designs, with and without statistical imputation, and compared their power against de novo resequencing in mapping causal variants under various disease models. We used the low coverage pilot data, which contains ~14M SNPs, as a hypothetical genotype-array platform (Pilot 14M) to assess its impact on the selection of tag SNPs, mapping coverage, and the power of association tests. Even after imputation, 45.4% of low-frequency SNPs remained untaggable, and only 67.7% of low-frequency variation was covered by the Pilot 14M array.
This suggests that GWAS based on SNP arrays would be ill-suited for association studies of low-frequency variation.

The dimension of the population genetics data produced by next-generation sequencing platforms is extremely high. However, the "intrinsic dimensionality" of the sequence data, which determines the structure of populations, is much lower. This motivated us to use locally linear embedding (LLE), which projects high-dimensional genomic data into a low-dimensional, neighborhood-preserving embedding, as a general framework for population structure and historical inference. To facilitate the application of LLE to population genetic analysis, we systematically investigated several important properties of LLE and revealed the connection between LLE and principal component analysis (PCA). Identifying a set of markers and genomic regions that can be used for population structure analysis will provide invaluable information for population genetics and association studies. In addition to identifying LLE-correlated and PCA-correlated structure-informative markers, we developed a new statistic that integrates the information content of a genomic region in order to study its collective association with population structure, together with a LASSO algorithm to search for such regions across the genome. We applied the developed methodologies to a low coverage pilot dataset from the 1000 Genomes Project and a HapMap Phase III Mexico dataset. Our results demonstrated that LLE outperforms PCA for population structure analysis. We observed that 25.1%, 44.9%, and 21.4% of the common variants and 89.2%, 92.4%, and 75.1% of the rare variants were LLE-correlated markers in CEU, YRI, and ASI, respectively. This showed that rare variants, which are often private to specific populations, have much higher power to identify population substructure than common variants.
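
To make the LLE and PCA projections discussed above concrete, the following is a minimal sketch of both embeddings applied to a genotype matrix (rows are individuals, columns are SNPs coded 0/1/2). It is not the thesis implementation: the function names `pca_embed` and `lle_embed`, the neighbor count, the regularization constant, and the simulated two-population data are all assumptions made for illustration.

```python
import numpy as np

def pca_embed(X, n_components=2):
    """Project samples onto the top principal components via SVD."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each SNP column
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def lle_embed(X, n_neighbors=10, n_components=2, reg=1e-3):
    """Standard locally linear embedding (illustrative, O(n^2) memory)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Squared Euclidean distances between all pairs of samples.
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)                 # a sample is not its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]

    # Step 1: weights that best reconstruct each sample from its neighbors.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                    # neighbors relative to sample i
        C = Z @ Z.T                              # local Gram matrix
        C += np.eye(n_neighbors) * (reg * np.trace(C) + 1e-12)  # regularize
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, nbrs[i]] = w / w.sum()              # weights sum to one

    # Step 2: low-dimensional coordinates that preserve those weights.
    I_W = np.eye(n) - W
    M = I_W.T @ I_W
    _, vecs = np.linalg.eigh(M)                  # eigenvalues in ascending order
    return vecs[:, 1:n_components + 1]           # skip the bottom eigenvector

# Simulated data (an assumption, not 1000 Genomes data): two hypothetical
# populations whose allele frequencies differ by random drift-like noise.
rng = np.random.default_rng(0)
n_per_pop, n_snps = 30, 200
freq_a = rng.uniform(0.05, 0.5, size=n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.15, size=n_snps), 0.01, 0.99)
G = np.vstack([
    rng.binomial(2, freq_a, size=(n_per_pop, n_snps)),
    rng.binomial(2, freq_b, size=(n_per_pop, n_snps)),
])

Y_pca = pca_embed(G)
Y_lle = lle_embed(G, n_neighbors=10)
```

Plotting the two columns of `Y_pca` and `Y_lle` with points colored by population is one way to compare the embeddings visually. For real data, a maintained implementation such as scikit-learn's `LocallyLinearEmbedding` is likely a better choice than this dense sketch, which does not scale to sequencing-sized marker sets.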
The preliminary results demonstrated that next-generation sequencing offers a rich resource and that LLE provides a powerful tool for population structure analysis.
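
The frequency classes and the LD comparison described in the abstract can be illustrated with a short sketch. It assumes phased haplotypes coded as 0/1 arrays and uses the conventional r² statistic as the LD measure; the abstract does not state which measure the thesis uses, and the helper names `maf`, `maf_class`, and `r_squared` are hypothetical. The MAF thresholds are those given in the abstract.

```python
import numpy as np

def maf(hap):
    """Minor allele frequency of one biallelic SNP from 0/1 haplotypes."""
    p = float(np.mean(hap))
    return min(p, 1.0 - p)

def maf_class(f):
    """Frequency class using the thresholds quoted in the abstract."""
    if f > 0.05:
        return "common"          # MAF > 5%
    if f > 0.005:
        return "low frequency"   # 0.5% < MAF <= 5%
    return "rare"                # MAF <= 0.5%

def r_squared(hap_a, hap_b):
    """Conventional r^2 LD measure between two SNPs on phased haplotypes."""
    a = np.asarray(hap_a, dtype=float)
    b = np.asarray(hap_b, dtype=float)
    p_a, p_b = a.mean(), b.mean()
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        return float("nan")      # undefined when either SNP is monomorphic
    d = np.mean(a * b) - p_a * p_b   # D = p_AB - p_A * p_B
    return d * d / denom

# Hypothetical example: 100 haplotypes, one low-frequency SNP (1 carrier)
# nested inside a common SNP (50 carriers, including that same haplotype).
low = np.zeros(100, dtype=int)
low[0] = 1
common = np.zeros(100, dtype=int)
common[:50] = 1

r2_low_common = r_squared(low, common)
r2_common_common = r_squared(common, common)
```

In this constructed example the low-frequency allele always occurs on the background of the common allele, yet `r2_low_common` is only about 0.01, because r² is bounded by how closely the two allele frequencies match. This is one general reason why common tag SNPs capture low-frequency variants poorly; it is offered as an illustration and not as the thesis's own analysis.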


CLC: Biological Sciences > Genetics > Genetics subdiscipline > Population genetics