Penalized Variable Selection for Gene-environment Interactions

Title Penalized Variable Selection for Gene-environment Interactions
Author Yinhao Du
Release 2021

Gene-environment (GxE) interaction is critical for understanding the genetic basis of complex disease beyond genetic and environmental main effects. In addition to existing tools for interaction studies, penalized variable selection has emerged as a promising alternative for dissecting GxE interactions. Despite this success, variable selection is limited in the following respects. First, multidimensional measurements have not been fully taken into account in interaction studies. Published variable selection methods cannot accommodate structured sparsity when integrating multi-omics data for disease outcomes. Second, in the big data context, no variable selection method has yet been developed to conduct tailored interaction analysis. Third, no solution is yet available for case-control GxE association studies with high-dimensional genomic variants in the big data context. In this dissertation, we tackle these challenges arising from GxE interaction studies in the modern era through the following projects. In the first project, we have developed a novel variable selection method to integrate multi-omics measurements in GxE interaction studies. Extensive studies have already revealed that analyzing omics data across multiple platforms is not only biologically sensible but also results in improved identification and prediction performance. Our integrative model can efficiently pinpoint important regulators of gene expression through sparse dimensionality reduction and link the disease outcomes to multiple effects in the integrative GxE studies by accommodating a sparse bi-level structure. Simulation studies show that the integrative model identifies GxE interactions and regulators better than the alternative methods. In two GxE lung cancer studies with high-dimensional multi-omics data, the integrative model leads to improved prediction and findings with important biological implications.
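One common way to encode a bi-level structure in GxE models is to group each genetic factor's main effect with its interactions with every environmental factor, so that selection can act on the whole group and on its individual terms. The following is a minimal sketch of that column layout only, not the dissertation's integrative model; the function name build_gxe_design and the column ordering are assumptions made for illustration.

```python
def build_gxe_design(G, E):
    """Build a GxE design matrix and per-gene column groups.

    G: n x p list of rows of genetic measurements.
    E: n x q list of rows of environmental measurements.
    Column order (an assumption): E main effects first, then for each
    gene j its main effect followed by its q interactions G_j * E_k.
    """
    p, q = len(G[0]), len(E[0])
    rows = []
    for g_row, e_row in zip(G, E):
        row = list(e_row)                        # environmental main effects
        for gj in g_row:
            row.append(gj)                       # genetic main effect
            row.extend(gj * ek for ek in e_row)  # its GxE interactions
        rows.append(row)
    # Group j holds the column indices of gene j's main effect and interactions.
    groups = [list(range(q + j * (q + 1), q + (j + 1) * (q + 1))) for j in range(p)]
    return rows, groups


# Toy input: one sample, two genes, two environmental factors.
G = [[1.0, 2.0]]
E = [[3.0, 4.0]]
design, groups = build_gxe_design(G, E)
print(design)
print(groups)
```

A group-level penalty applied to each index list in groups decides whether a gene matters at all, while an individual-level penalty decides which of its main or interaction terms are retained.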
In the second project, we propose to conduct interaction studies in the big data context by adopting the divide-and-conquer strategy. In particular, sparse group variable selection for important GxE effects has been developed within the framework of the alternating direction method of multipliers (ADMM). To accommodate large-scale data in terms of either samples or features, we have developed two novel parallel ADMM-based variable selection methods across samples and features, respectively. The corresponding parallel algorithms can be efficiently implemented on distributed computing platforms. Simulation studies demonstrate that the parallel ADMM-based penalization methods significantly improve the computational speed for analyzing large-scale data from GxE interaction studies with satisfactory identification and prediction performance.
In the third project, we extend the proposed parallel ADMM-based variable selection for GxE interactions to the case-control association study of type 2 diabetes. Within the parallel computation framework, we have developed a penalized logistic regression model accommodating the bi-level selection tailored for the case-control GxE interaction study. The advantage of the proposed parallel penalization method has been fully illustrated in the distributed learning scenario. Simulation studies show that the proposed method dramatically reduces the computational time while maintaining performance competitive with its non-parallel counterparts. In the case study of type 2 diabetes with environmental factors and high-dimensional SNP measurements, the proposed parallel penalization method leads to the identification of biologically important interaction effects.
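The sample-splitting idea in the second project can be illustrated with consensus ADMM applied to a sparse group penalty. The sketch below assumes a squared-error loss and the standard sparse group lasso penalty (an L1 term plus unweighted group L2 norms); the abstract does not give the exact penalty or update rules, so the function names, the penalty form, and the tuning values are all hypothetical.

```python
import math


def soft_threshold(x, t):
    """Scalar soft-thresholding: shrink x toward zero by t."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def sparse_group_prox(v, groups, t1, t2):
    """Proximal operator of t1*||b||_1 + t2*sum_g ||b_g||_2."""
    out = [soft_threshold(x, t1) for x in v]       # individual-level shrinkage
    for idx in groups:                             # group-level shrinkage
        norm = math.sqrt(sum(out[j] ** 2 for j in idx))
        scale = max(0.0, 1.0 - t2 / norm) if norm > 0 else 0.0
        for j in idx:
            out[j] *= scale
    return out


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def consensus_admm(blocks, groups, lam1, lam2, rho=1.0, iters=200):
    """blocks: list of (X_i, y_i) sample chunks. Returns the consensus estimate z."""
    p, N = len(blocks[0][0][0]), len(blocks)
    local = []
    for X, y in blocks:  # precompute X_i^T X_i + rho*I and X_i^T y_i per block
        gram = [[sum(row[a] * row[c] for row in X) + (rho if a == c else 0.0)
                 for c in range(p)] for a in range(p)]
        xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
        local.append((gram, xty))
    z = [0.0] * p
    U = [[0.0] * p for _ in range(N)]
    for _ in range(iters):
        # Local updates use only one block's data, so they could run in parallel.
        B = [solve(gram, [xty[a] + rho * (z[a] - u[a]) for a in range(p)])
             for (gram, xty), u in zip(local, U)]
        # Global update: average the local views, then apply the penalty's prox.
        avg = [sum(B[i][a] + U[i][a] for i in range(N)) / N for a in range(p)]
        z = sparse_group_prox(avg, groups, lam1 / (rho * N), lam2 / (rho * N))
        # Dual updates track each block's disagreement with the consensus.
        U = [[U[i][a] + B[i][a] - z[a] for a in range(p)] for i in range(N)]
    return z


# Toy data: two sample blocks, four features in two groups.
blocks = [
    ([[1.0, 0.5, 0.0, 0.2], [0.3, 1.0, 0.1, 0.0], [0.0, 0.2, 1.0, 0.4]], [1.2, 1.1, 0.3]),
    ([[0.9, 0.1, 0.3, 0.0], [0.2, 0.8, 0.0, 0.5], [0.1, 0.0, 0.4, 1.0]], [0.9, 0.8, 0.1]),
]
groups = [[0, 1], [2, 3]]
print(consensus_admm(blocks, groups, lam1=0.05, lam2=0.05))
```

Only the averaging step needs communication between workers. Splitting across features instead of samples, or swapping in a logistic loss for the case-control setting, changes the local update but keeps the same three-step pattern.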

High-dimensional Variable Selection in Longitudinal and Nonlinear Gene-environment Interaction Studies

Title High-dimensional Variable Selection in Longitudinal and Nonlinear Gene-environment Interaction Studies
Author Fei Zhou
Release 2021

Variable selection from both the frequentist and Bayesian frameworks has gained increasing popularity in the analysis of high-dimensional genomic data. Despite the success of existing studies, challenges remain because tailored methods for sparse interaction structures are not available when the response variables are repeatedly measured and/or have heavy-tailed distributions. These challenges have motivated the development of the novel variable selection methods proposed in the following projects. Meanwhile, powerful software packages from these projects are publicly available to facilitate fast and reliable computation, as well as reproducible research.
In the first project, we have developed a novel penalized variable selection method to identify important lipid-environment interactions in a longitudinal lipidomics study, where the environment factors refer to a group of dummy variables corresponding to a four-level treatment factor. An efficient Newton-Raphson based algorithm was proposed within the generalized estimating equation (GEE) framework. Simulation studies have demonstrated the superior performance of our method over alternatives, in terms of both identification accuracy and prediction performance. Analysis of the high-dimensional lipid datasets collected using mice from the skin cancer prevention study identified meaningful markers that provide fresh insight into the underlying mechanism of cancer preventive effects.
In the second project, we have proposed a sparse group penalization method for the bi-level GxE interaction study under the repeatedly measured phenotype to accommodate more general environment factors. Within the quadratic inference function (QIF) framework, the proposed method can achieve simultaneous identification of main and interaction effects at both the group and individual levels. We conducted simulation studies to establish the advantage of the proposed regularization methods. In the case study, the environment factors include age, gender and treatment, which are either continuous or categorical. Our method leads to improved prediction and identification of main and interaction effects with important implications.
In the third project, a sparse Bayesian quantile varying coefficient model has been developed for non-linear GxE studies. The proposed model can accommodate heavy-tailed errors and outliers from the disease phenotypes while pinpointing important non-linear interactions through Bayesian variable selection based on spike-and-slab priors. Fast computation has been facilitated by the efficient Gibbs sampler. Simulation studies and real data analysis with age as the univariate environment factor have been performed to show the superiority of the proposed method over multiple competing alternatives. The open-source R packages with C++ implementations of all the methods under comparison have been provided along with this dissertation. The R packages interep and springer, for the first two projects respectively, are available on CRAN. The R package for the last project on the Bayesian regularized quantile varying coefficient model will be released to the public soon.
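Quantile regression, on which the third project builds, replaces squared error with the check loss rho_tau(u) = u * (tau - I(u < 0)), which is what gives robustness to heavy tails and outliers. The sketch below is only a frequentist illustration of that loss on a constant-only model, not the Bayesian varying coefficient model or its Gibbs sampler; the function names are hypothetical.

```python
def check_loss(u, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def fit_constant_quantile(y, tau):
    """Return the observed value that minimizes the total check loss.

    For a constant-only model a minimizer can always be found among the
    observations themselves, so a search over y is enough.
    """
    return min(y, key=lambda c: sum(check_loss(v - c, tau) for v in y))


# One gross outlier pulls the mean far more than the tau = 0.5 fit.
y = [1.0, 2.0, 3.0, 4.0, 100.0]
print("mean:", sum(y) / len(y))
print("median via check loss:", fit_constant_quantile(y, 0.5))
```

With tau = 0.5 the check loss is half the absolute error, so the fit is the median; other values of tau target other quantiles of the phenotype distribution.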

Variable Selection for Data Aggregated from Different Sources with Group of Variable Structure

Title Variable Selection for Data Aggregated from Different Sources with Group of Variable Structure
Author Camilo Broc
Release 2019

During the last decades, the amount of available genetic data on populations has grown drastically. On the one hand, refinements in chemical technologies have made it possible to extract the human genome of individuals at an accessible cost. On the other hand, consortia of institutions and laboratories around the world have permitted the collection of data on a variety of individuals and populations. This amount of data has raised hopes for our ability to understand the deepest mechanisms involved in the functioning of our cells. Notably, genetic epidemiology is a field that studies the relation between genetic features and the onset of a disease. Specific statistical methods have been necessary for those analyses, especially due to the dimensions of the available data: in genetics, information is contained in a high number of variables compared to the number of observations.
In this dissertation, two contributions are presented. The first project, called PIGE (Pathway-Interaction Gene Environment), deals with gene-environment interaction assessments. The second one aims at developing variable selection methods for data that have group structures in both the variables and the observations.
The document is divided into six chapters. The first chapter sets the background of this work, where both biological and mathematical notations and concepts are presented, and gives a history of the motivation behind genetics and genetic epidemiology. The second chapter presents an overview of the statistical methods currently in use for genetic epidemiology. The third chapter deals with the identification of gene-environment interactions. It includes a presentation of existing approaches for this problem and a contribution of the thesis. The fourth chapter takes up the problem of meta-analysis. A definition of the problem and an overview of the existing approaches are presented. Then, a new approach is introduced. The fifth chapter explains pleiotropy studies and how the method presented in the previous chapter is suited for this kind of analysis. The last chapter compiles conclusions and lines of research for the future.

Sparse Simultaneous Penalized Variable Selection in Hierarchical Structure Genetic Data Analysis

Title Sparse Simultaneous Penalized Variable Selection in Hierarchical Structure Genetic Data Analysis
Author Shuang Huang
Release 2017

The sparse simultaneous penalized variable selection method for data with hierarchical structure is proposed to identify the quantitative trait loci and expression traits that are related to a certain clinical trait in genetic data analysis. This method is developed for data sets in which the dependency is linear and, among a large number of candidate gene loci and expression traits, relatively few are important to the clinical trait of interest. The method focuses on identifying the candidate gene loci and expression traits that are significantly related to the clinical observation via the hierarchical dependence structure. A penalized linear model is used to reduce the number of parameters, using a novel computational algorithm that can handle the unknowns simultaneously. A data-adaptive tuning procedure based on cross-validation acts as a parameter selector. Simulation studies are conducted to check the performance of the proposed method and to compare it with some well-established methods, including several penalized methods and the stepwise AIC method. The real data application uses a data set from an obesity study. The data set contains 541 mice, and for each individual, over 1,000 expression traits and around 1,000 gene loci are recorded. We compare the findings of our method with previous studies on the same species of mice and discuss the similarities and differences in the outcomes.
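The cross-validation tuning step can be illustrated with a plain lasso fitted by coordinate descent and a K-fold loop over a grid of penalty values. The lasso stands in for the proposed hierarchical penalty, whose exact form is not given in this abstract, and the names lasso_cd and cv_select, the fold assignment, the toy data, and the grid are all assumptions.

```python
import math


def lasso_cd(X, y, lam, iters=200):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - X b, kept up to date as b changes
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(iters):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Covariance of column j with the partial residual (b_j removed).
            r_j = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = math.copysign(max(abs(r_j) - lam, 0.0), r_j) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new
    return b


def cv_select(X, y, lambdas, k=3):
    """Return the penalty in lambdas with the smallest K-fold squared error."""
    n = len(X)
    folds = [list(range(f, n, k)) for f in range(k)]  # interleaved folds
    best_lam, best_err = None, float("inf")
    for lam in lambdas:
        err = 0.0
        for hold in folds:
            held = set(hold)
            train = [i for i in range(n) if i not in held]
            b = lasso_cd([X[i] for i in train], [y[i] for i in train], lam)
            err += sum((y[i] - sum(x * c for x, c in zip(X[i], b))) ** 2 for i in hold)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam


# Toy data: the response depends on the first column only, plus a small wiggle.
X = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.2], [1.0, 1.0, 0.1],
     [2.0, 0.0, 0.3], [0.0, 2.0, 0.9], [1.0, 2.0, 0.4],
     [2.0, 1.0, 0.7], [2.0, 2.0, 0.6], [0.0, 0.0, 0.8]]
y = [2.0 * row[0] + 0.1 * ((i % 3) - 1) for i, row in enumerate(X)]
lam = cv_select(X, y, [0.01, 0.1, 1.0])
print("selected lambda:", lam)
print("coefficients:", lasso_cd(X, y, lam))
```

The same loop applies to any penalized fit: only the call to lasso_cd needs to be replaced by the estimator being tuned.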

Variable Selection in Varying Multi-Index Coefficient Models with Applications to Gene-Environmental Interactions

Title Variable Selection in Varying Multi-Index Coefficient Models with Applications to Gene-Environmental Interactions
Author Shunjie Guan
Pages 119
Release 2017
Genre Electronic dissertations
ISBN 9780355086553

Insights in Statistical Genetics and Methodology: 2022

Title Insights in Statistical Genetics and Methodology: 2022
Author Simon Charles Heath
Publisher Frontiers Media SA
Pages 172
Release 2023-10-24
Genre Science
ISBN 283253645X

This Research Topic is part of the Insights in Frontiers in Genetics series.

Epistasis

Title Epistasis
Author Ka-Chun Wong
Pages 402
Release 2021
Genre Electronic books
ISBN 9781071609477