Statistical Methods to Understand the Genetic Architecture of Complex Traits

Title Statistical Methods to Understand the Genetic Architecture of Complex Traits
Author Farhad Hormozdiari
Publisher
Pages 239
Release 2016
Genre
ISBN

Genome-wide association studies (GWAS) have successfully identified thousands of risk loci for complex traits. Identifying these variants requires annotating all possible variations between any two individuals and then detecting the variants that affect disease status or traits. Advances in high-throughput sequencing (HTS) have made it possible to sequence cohorts of individuals efficiently in terms of both cost and time. However, HTS technologies have raised many computational challenges. I first propose an efficient method to recover dense genotype data by leveraging low-coverage sequencing and imputation techniques. Then, I introduce a novel statistical method (CNVeM) to identify copy-number variation (CNV) loci using HTS data. CNVeM was the first method to incorporate multi-mapped reads, which all earlier methods discarded. Unfortunately, only a handful of GWAS variants have been successfully validated as biologically causal. Identifying causal variants can help us understand the biological mechanisms of traits or diseases. However, detecting the causal variants is challenging because of linkage disequilibrium (LD) and because some loci contain more than one causal variant. In my thesis, I introduce CAVIAR (CAusal Variants Identification in Associated Regions), a new statistical method for fine mapping. The main advantage of CAVIAR is that it predicts, for each locus, a set of variants that contains all of the true causal variants with a high confidence level (e.g. 95%), even when the locus contains multiple causal variants. Next, I aim to understand the underlying mechanism of GWAS risk loci. A standard approach to uncovering this mechanism is to integrate the results of GWAS and expression quantitative trait loci (eQTL) studies: we attempt to identify whether a significant GWAS variant also influences the expression of a nearby gene in a specific tissue.
However, detecting whether the same variant is causal in both GWAS and eQTL studies is challenging because of complex LD structure. I introduce eCAVIAR (eQTL and GWAS CAusal Variants Identification in Associated Regions), a statistical method that computes the probability that the same variant is responsible for both the GWAS and eQTL signals while accounting for complex LD structure. We integrate a meta-analysis of glucose and insulin-related traits with GTEx data to detect the target genes and the most relevant tissues. Interestingly, we observe that most loci do not colocalize between GWAS and eQTL. Lastly, I propose an approach called phenotype imputation, which allows one to perform GWAS on a phenotype that is difficult to collect. In this approach, we leverage the correlation structure between multiple phenotypes to impute the uncollected phenotype. I demonstrate that the statistical power of an association test that uses the imputed phenotype can be calculated analytically, which can be helpful for study design.
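
The phenotype imputation idea in the abstract above can be illustrated with a small sketch. It is not the thesis's estimator, which the abstract does not spell out. It assumes standardized phenotypes (mean 0, variance 1), exactly two collected phenotypes, and a best linear predictor whose weights are the inverse of the observed correlation matrix times the vector of correlations with the uncollected phenotype. All function names and numbers are hypothetical.

```python
def solve_2x2(matrix, rhs):
    """Solve the 2x2 linear system matrix @ w = rhs with Cramer's rule."""
    det = matrix[0][0] * matrix[1][1] - matrix[0][1] * matrix[1][0]
    if det == 0:
        raise ValueError("observed phenotypes are perfectly collinear")
    w0 = (rhs[0] * matrix[1][1] - matrix[0][1] * rhs[1]) / det
    w1 = (matrix[0][0] * rhs[1] - matrix[1][0] * rhs[0]) / det
    return [w0, w1]


def imputation_weights(corr_observed, corr_with_target):
    """Best-linear-predictor weights for a standardized target phenotype.

    corr_observed: 2x2 correlation matrix of the collected phenotypes.
    corr_with_target: correlation of each collected phenotype with the
    uncollected one (assumed known, e.g. from a reference cohort).
    """
    return solve_2x2(corr_observed, corr_with_target)


def impute(observed_values, weights):
    """Impute the uncollected phenotype for one individual."""
    return sum(w * x for w, x in zip(weights, observed_values))


def imputation_r2(weights, corr_with_target):
    """Squared correlation between imputed and true phenotype under this model."""
    return sum(w * c for w, c in zip(weights, corr_with_target))


# Hypothetical inputs: two collected phenotypes correlated at 0.5,
# with correlations 0.6 and 0.3 to the uncollected phenotype.
corr_observed = [[1.0, 0.5], [0.5, 1.0]]
corr_with_target = [0.6, 0.3]
weights = imputation_weights(corr_observed, corr_with_target)
imputed = impute([1.0, 2.0], weights)
r2 = imputation_r2(weights, corr_with_target)
```

Under these assumptions, `r2` measures how much of the uncollected phenotype the collected ones recover, which is the kind of quantity a study designer would inspect before relying on an imputed phenotype.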

Computational Approaches to Understanding the Genetic Architecture of Complex Traits

Title Computational Approaches to Understanding the Genetic Architecture of Complex Traits
Author Brielin C. Brown
Publisher
Pages 90
Release 2016
Genre
ISBN

Advances in DNA sequencing technology have made it possible to generate genetic data at costs unimaginable even ten years ago. This has resulted in a tremendous amount of data, with large studies providing genotypes of hundreds of thousands of individuals at millions of genetic locations. This rapid increase in the scale of genetic data necessitates the development of computational methods that can analyze this data rapidly without sacrificing statistical rigor. The low cost of DNA sequencing also provides an opportunity to tailor medical care to an individual's unique genetic signature. However, this type of precision medicine is limited by our understanding of how genetic variation shapes disease. Our understanding of so-called complex diseases is particularly poor, and most identified variants explain only a tiny fraction of the variance in the disease that is expected to be due to genetics. This is further complicated by the fact that most studies of complex disease go directly from genotype to phenotype, ignoring the complex biological processes that take place in between. Herein, we discuss several advances in the field of complex trait genetics. We begin with a review of computational and statistical methods for working with genotype and phenotype data, as well as a discussion of methods for analyzing RNA-seq data in an effort to bridge the gap between genotype and phenotype. We then describe our methods for 1) improving power to detect common variants associated with disease, 2) determining the extent to which different world populations share similar disease genetics, and 3) identifying genes that show differential expression between the two haplotypes of a single individual. Finally, we discuss opportunities for future investigation in this field.

Statistical Methods for Integrative Analysis of Genomic Data

Title Statistical Methods for Integrative Analysis of Genomic Data
Author Jingsi Ming
Publisher
Pages 141
Release 2018
Genre Electronic books
ISBN

Thousands of risk variants underlying complex phenotypes (quantitative traits and diseases) have been identified in genome-wide association studies (GWAS). However, several challenges remain in deepening our understanding of the genetic architectures of complex phenotypes. First, the majority of GWAS hits are in non-coding regions, and their biological interpretation is still unclear. Second, most complex traits are suggested to be highly polygenic, i.e., they are affected by a vast number of risk variants with individually small or moderate effects, and a large proportion of the risk variants with small effects remain unknown. Third, accumulating evidence from GWAS suggests the pervasiveness of pleiotropy, a phenomenon in which some genetic variants are associated with multiple traits, but there is no unified, scalable framework that reveals the relationships among a large number of traits and simultaneously prioritizes genetic variants with functional annotations integrated. In this thesis, we propose two statistical methods to address these challenges using integrative analysis of GWAS summary statistics and functional annotations. In the first part, we propose a latent sparse mixed model (LSMM) to integrate functional annotations with GWAS data. It not only increases the statistical power to identify risk variants but also offers more biological insight by detecting relevant functional annotations. To make LSMM scalable to millions of variants and hundreds of functional annotations, we developed an efficient variational expectation-maximization (EM) algorithm for model parameter estimation and statistical inference. We first conducted comprehensive simulation studies to evaluate the performance of LSMM. Then we applied it to analyze 30 GWASs of complex phenotypes integrated with nine genic category annotations and 127 cell-type-specific functional annotations from the Roadmap project.
The results demonstrate that our method has more statistical power than conventional methods and can help researchers achieve a deeper understanding of the genetic architecture of these complex phenotypes. In the second part, we propose a latent probit model (LPM) that combines summary statistics from multiple GWASs with functional annotations to characterize the relationships among traits and to increase the statistical power to identify risk variants. LPM can also perform hypothesis testing for pleiotropy and annotation enrichment. To keep LPM scalable as the number of GWASs increases, we developed an efficient parameter-expanded EM (PX-EM) algorithm that can run in parallel. We first validated the performance of LPM through comprehensive simulations, then applied it to analyze 44 GWASs with nine genic category annotations. The results demonstrate the benefits of LPM and can offer new insights into disease etiology.

Efficient Methods for Understanding the Genetic Architecture of Complex Traits

Title Efficient Methods for Understanding the Genetic Architecture of Complex Traits
Author Yue Wu
Publisher
Pages 0
Release 2022
Genre
ISBN

Understanding the genetic architecture of complex traits is a central goal of modern human genetics. Recent efforts to build large-scale biobanks that collect genetic and trait data on large numbers of individuals present exciting opportunities for understanding genetic architecture. However, these datasets also pose several statistical and computational challenges. In this dissertation, we consider a series of statistical models that allow us to infer aspects of the genetic architecture of single and multiple traits. Inference in these models is computationally challenging because of the size of the genetic data, which consists of millions of genetic variants measured across hundreds of thousands of individuals. We propose a series of scalable computational methods that perform efficient inference in these models, and we apply these methods to data from the UK Biobank to showcase their utility.

The Fundamentals of Modern Statistical Genetics

Title The Fundamentals of Modern Statistical Genetics
Author Nan M. Laird
Publisher Springer Science & Business Media
Pages 226
Release 2010-12-13
Genre Medical
ISBN 1441973389

This book covers the statistical models and methods that are used to understand human genetics, following the historical and recent developments of the field. From Mendel’s first experiments to genome-wide association studies, the book describes how genetic information can be incorporated into statistical models to discover disease genes. All commonly used approaches in statistical genetics (e.g. aggregation analysis, segregation analysis, and linkage analysis) are covered, but the focus of the book is modern approaches to association analysis. Numerous examples of both Mendelian and complex genetic disorders illustrate key points throughout the text. The intended audience is statisticians, biostatisticians, epidemiologists, and quantitatively oriented geneticists and health scientists who want to learn about statistical methods for genetic analysis, whether to better analyze genetic data or to pursue research in methodology. A background in intermediate-level statistical methods is required. The authors include few mathematical derivations, and the exercises provide problems for students with a broad range of skill levels. No background in genetics is assumed.

Statistical Methods and Analysis for Genome-wide Association Studies

Title Statistical Methods and Analysis for Genome-wide Association Studies
Author Lin Li
Publisher
Pages 0
Release 2010
Genre
ISBN

Genome-wide association (GWA) studies use a large number of genetic variants, usually single nucleotide polymorphisms (SNPs), across the entire genome to identify the genetic basis of disease susceptibility or of phenotypic variation in a trait of interest. A commonly used analysis tool is single marker analysis (SMA), which tests one SNP at a time. Although it has been successful in identifying some causal loci, further improvements are possible with multi-locus methods that investigate a large number of SNPs simultaneously. One difficulty in doing so is high dimensionality, i.e. the large number of SNPs, which makes this a challenging statistical problem. My first project addresses this problem in case-control GWA studies. Both the logistic and probit models are considered for binary traits, and three-component mixture priors are assumed to model the fact that only a few SNPs have non-negligible effects. To estimate posterior distributions, I propose three Markov chain Monte Carlo techniques. Specifically, an adaptive independence sampler is proposed for the logistic model, and data augmentation methods are developed for both the logistic and probit models. Simulations suggest that they nearly always outperform SMA. The second project deals with GWA studies of quantitative traits that are confounded by population structure. A linear mixed model is used to account for cryptic relatedness between individuals in the sample. I propose an algorithm that is based on least angle regression and can efficiently select a small number of SNPs that are likely to be associated with the trait. Simulations show that the proposed algorithm tends to yield higher ranks for causal loci than least angle regression applied directly, and that both outperform SMA. My third project is part of the so-called CanMap project.
More than 1,000 domestic dogs from different breeds, wild canids, and village dogs were genotyped on a dense SNP array, and my responsibility was to carry out a GWA analysis of body weight and other morphological traits, including height and shape, in the domestic dog. The GWA results enrich our understanding of the impact of strong directional selection on the genetic architecture of complex traits known to be under selection.
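
As a point of reference for the single marker analysis (SMA) baseline mentioned in the abstract above, the sketch below tests one SNP at a time against a quantitative trait. It assumes additive genotype coding (0, 1, or 2 copies of an allele), no covariates, and no correction for population structure. The statistic is the ordinary t-statistic of a simple linear regression, computed from the Pearson correlation. The function names and the toy data are hypothetical and are not taken from the dissertation.

```python
import math


def single_marker_t(genotypes, trait):
    """t-statistic for the slope of trait ~ genotype, or None if undefined."""
    n = len(trait)
    if n != len(genotypes) or n < 3:
        raise ValueError("need matching genotype/trait vectors with n >= 3")
    mean_g = sum(genotypes) / n
    mean_y = sum(trait) / n
    sgg = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((y - mean_y) ** 2 for y in trait)
    if sgg == 0 or syy == 0:
        return None  # monomorphic SNP or constant trait: nothing to test
    sgy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, trait))
    r = sgy / math.sqrt(sgg * syy)
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)  # perfect fit (or rounding past 1)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def single_marker_scan(snps, trait):
    """Test each SNP separately and rank by absolute t-statistic."""
    results = []
    for name, genotypes in snps.items():
        t_stat = single_marker_t(genotypes, trait)
        if t_stat is not None:
            results.append((name, t_stat))
    results.sort(key=lambda item: abs(item[1]), reverse=True)
    return results


# Toy data (hypothetical): six individuals, three SNPs.
trait = [1.0, 2.1, 2.9, 1.2, 1.9, 3.1]
snps = {
    "snp_a": [0, 1, 2, 0, 1, 2],  # tracks the trait closely
    "snp_b": [1, 0, 1, 0, 1, 0],  # nearly unrelated to the trait
    "mono": [0, 0, 0, 0, 0, 0],   # monomorphic, skipped
}
ranking = single_marker_scan(snps, trait)
```

Because each SNP is tested in isolation, a scan like this ignores correlation between nearby SNPs, which is the limitation that motivates the multi-locus methods described in the abstract.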

Understanding the Genetic Architecture of Complex Traits Through Meta-analysis

Title Understanding the Genetic Architecture of Complex Traits Through Meta-analysis
Author Kodi Taraszka
Publisher
Pages 0
Release 2022
Genre
ISBN

Exploring how genetic architecture shapes complex traits and diseases is a central premise of human genetics. Over the years, genome-wide association studies (GWAS) have enabled the discovery of numerous genetic variants associated with a variety of complex traits. In addition to the growing array of traits analyzed, GWAS in diverse ancestral populations have seen a significant increase in sample sizes. These efforts have led to tens of thousands of publicly available GWAS summary statistics whose known correlation structure can be leveraged for further discovery. In this dissertation, I present two novel methods for the meta-analysis of GWAS summary statistics and conduct a pan-cancer meta-analysis of somatic variant burden. For the first method, I present a likelihood ratio test for the joint analysis of genetically correlated traits and provide a per-trait interpretation framework for the omnibus association. For the second method, I present a Bayesian framework that improves fine mapping of significant associations for one trait by leveraging the complementary information from distinct ancestral backgrounds. In addition to these methods, I analyze how clinical and polygenic germline features influence somatic variant burden within and across cancer types.
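
For readers unfamiliar with meta-analysis of GWAS summary statistics, the sketch below shows the standard fixed-effects, inverse-variance-weighted combination of per-study effect estimates for a single variant. This is textbook background, not either of the dissertation's methods: it assumes independent studies that estimate one shared effect, and the function name and input values are hypothetical.

```python
import math


def fixed_effects_meta(betas, standard_errors):
    """Inverse-variance-weighted meta-analysis for one variant.

    Returns (combined effect, combined standard error, z-score).
    Assumes the studies are independent and share a single true effect.
    """
    if not betas or len(betas) != len(standard_errors):
        raise ValueError("need one standard error per effect estimate")
    if any(se <= 0 for se in standard_errors):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / (se * se) for se in standard_errors]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    return beta, se, beta / se


# Hypothetical summary statistics for one variant from two studies.
beta, se, z = fixed_effects_meta([0.10, 0.20], [0.05, 0.05])
```

When traits or cohorts are correlated, the independence assumption here no longer holds, which is one reason joint tests such as the likelihood ratio test described in the abstract are needed.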