Penalized Regression Methods for Interaction and Mixed-effects Models with Applications to Genomic and Brain Imaging Data

Author Sahir Bhatnagar
Release 2019

"In high-dimensional (HD) data, where the number of covariates (p) greatly exceeds the number of observations (n), estimation can benefit from the bet-on-sparsity principle, i.e., only a small number of predictors are relevant in the response. This assumption can lead to more interpretable models, improved predictive accuracy, and algorithms that are computationally efficient. In genomic and brain imaging studies, where the sample sizes are particularly small due to high data collection costs, we must often assume a sparse model because there isn't enough information to estimate p parameters. For these reasons, penalized regression methods such as the lasso and group-lasso have generated substantial interest since they can set model coefficients exactly to zero. In the penalized regression framework, many approaches have been developed for main effects. However, there is a need for developing interaction and mixed-effects models. Indeed, accurate capture of interactions may hold the potential to better understand biological phenomena and improve prediction accuracy since they may reflect important modulation of a biological system by an external factor. Furthermore, penalized mixed-effects models that account for correlations due to groupings of observations can improve sensitivity and specificity. This thesis is composed primarily of three manuscripts. In the first manuscript, we propose a method called sail for detecting non-linear interactions that automatically enforces the strong heredity property using both the l1 and l2 penalty functions. We describe a blockwise coordinate descent procedure for solving the objective function and provide performance metrics on both simulated and real data. The second manuscript develops a general penalized mixed effects model framework to account for correlations in genetic data due to relatedness called ggmix. Our method can accommodate several sparsity-inducing penalties such as the lasso, elastic net and group lasso and also readily handles prior annotation information in the form of weights. Our algorithm has theoretical guarantees of convergence and we again assess its performance in both simulated and real data. The third manuscript describes a novel strategy called eclust for dimension reduction that leverages the effects of an exposure variable with broad impact on HD measures. With eclust, we found improved prediction and variable selection performance compared to methods that do not consider the exposure in the clustering step, or to methods that use the original data as features. We further illustrate this modeling framework through the analysis of three data sets from very different fields, each with HD data, a binary exposure, and a phenotype of interest. We provide efficient implementations of all our algorithms in freely available and open source software."
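
The abstract's remark that the lasso can set coefficients exactly to zero can be illustrated with the soft-thresholding operator. When the predictors are orthonormal, the lasso estimate is the least-squares estimate passed through this operator. The sketch below is only an illustration under that assumption; it is not the sail, ggmix, or eclust implementation, and the function name, the coefficient values, and the penalty value are hypothetical.

```python
def soft_threshold(z: float, lam: float) -> float:
    """Return sign(z) * max(|z| - lam, 0): shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # anything within [-lam, lam] is set exactly to zero


# Hypothetical least-squares estimates for five predictors (illustrative values).
ols_estimates = [2.5, -0.3, 0.1, -1.8, 0.05]
lam = 0.5  # penalty strength, assumed for illustration

lasso_estimates = [soft_threshold(b, lam) for b in ols_estimates]

# Indices of predictors that survive the penalty (non-zero coefficients).
selected = [j for j, b in enumerate(lasso_estimates) if b != 0.0]
```

Small estimates are removed entirely rather than merely shrunk, which is what makes the resulting model sparse; larger values of `lam` remove more predictors.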

Modeling Biological Processes in Genome-wide Association Studies Using Regularized Regression

Author Gabriel Hoffman
Pages 336
Release 2013

Genome-wide association studies (GWAS) have become a widely adopted approach to identify genetic variation that produces variation in complex phenotypes. Standard statistical methods are able to identify strong associations in these datasets, but more sophisticated statistical methods that model complex aspects of the biological data can identify weaker associations and further elucidate the underlying molecular biology. We develop and apply statistical methods that explicitly model two aspects of GWAS data using two complementary forms of regularized regression. First, we model the polygenic architecture of complex phenotypes using feature selection methods in a penalized regression framework. We propose novel algorithmic, computational and heuristic approaches in order to produce a method that scales to high dimensional GWAS data and increases power to detect weak associations that are not detectable by standard tests. Second, we model the covariance between individuals due to kinship and population structure using a linear mixed model that regularizes the statistical contribution of a metric of ancestry. Linear mixed models have been widely adopted for analysis of GWAS data, but their theoretical properties have not been examined in this context. We formalize the statistical properties of the linear mixed model, develop a novel interpretation in relation to population genetics, and propose a novel low rank linear mixed model that learns the dimensionality of the correction for kinship and population structure from the data. Finally, we combine these two complementary regularized regression models into a penalized linear mixed model. We develop a unified model incorporating a novel algorithm with novel approaches to tuning nonconvex penalties and determining the optimal stopping point in the regularization path. Leveraging recent work on assessing significance of selected features, we produce a well-principled and scalable statistical method applicable to feature selection, hypothesis testing and prediction in many contexts.

Variable Selection Via Penalized Regression and the Genetic Algorithm Using Information Complexity, with Applications for High-dimensional -omics Data

Author Tyler J. Massaro
Pages 360
Release 2016
Genre Algorithms

This dissertation is a collection of examples, algorithms, and techniques for researchers interested in selecting influential variables from statistical regression models. Chapters 1, 2, and 3 provide background information that will be used throughout the remaining chapters, on topics including but not limited to information complexity, model selection, covariance estimation, stepwise variable selection, penalized regression, and especially the genetic algorithm (GA) approach to variable subsetting. In chapter 4, we fully develop the framework for performing GA subset selection in logistic regression models. We present advantages of this approach against stepwise and elastic net regularized regression in selecting variables from a classical set of ICU data. We further compare these results to an entirely new procedure for variable selection developed explicitly for this dissertation, called the post hoc adjustment of measured effects (PHAME). In chapter 5, we reproduce many of the same results from chapter 4 for the first time in a multinomial logistic regression setting. The utility and convenience of the PHAME procedure is demonstrated on a set of cancer genomic data. Chapter 6 marks a departure from supervised learning problems as we shift our focus to unsupervised problems involving mixture distributions of count data from epidemiologic fields. We start off by reintroducing Minimum Hellinger Distance estimation alongside model selection techniques as a worthy alternative to the EM algorithm for generating mixtures of Poisson distributions. We also create for the first time a GA that derives mixtures of negative binomial distributions. The work from chapter 6 is incorporated into chapters 7 and 8, where we conclude the dissertation with a novel analysis of mixtures of count data regression models. We provide algorithms based on single and multi-target genetic algorithms which solve the mixture of penalized count data regression models problem, and we demonstrate the usefulness of this technique on HIV count data that were used in a previous study published by Gray, Massaro et al. (2015) as well as on time-to-event data taken from the cancer genomic data sets introduced earlier.
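
Minimum Hellinger Distance estimation, mentioned above for Poisson mixtures, chooses model parameters that minimize the Hellinger distance between the model and the empirical distribution. The sketch below only computes that distance for two Poisson distributions and does not reproduce the dissertation's estimator or its genetic algorithms; the function names, the rates 2.0 and 5.0, and the truncation of the support at 60 counts are illustrative assumptions.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of observing count k under a Poisson distribution."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def hellinger_distance(p: list[float], q: list[float]) -> float:
    """Hellinger distance between two discrete distributions on a shared support."""
    # Bhattacharyya coefficient: sum of sqrt(p_k * q_k) over the support.
    bc = sum(math.sqrt(pk * qk) for pk, qk in zip(p, q))
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, 1.0 - bc))


# Truncated support; the probability mass beyond 59 is negligible for these rates.
support = range(60)
p = [poisson_pmf(k, 2.0) for k in support]
q = [poisson_pmf(k, 5.0) for k in support]

d_same = hellinger_distance(p, p)  # close to 0 for identical distributions
d_diff = hellinger_distance(p, q)  # strictly between 0 and 1 for different rates
```

The distance is bounded between 0 and 1, so it stays finite even when the two distributions place mass on different counts.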

Using Mixed Effects Models to Integrate High-dimensional, Genomic Data and an Array-based Analysis of the Evolution of Brain Aging

Author Patrick Michael Loerch
Pages 268
Release 2008

This dissertation contributes to scientific knowledge in two ways: it develops statistical methodology for integrating diverse genomic data sets, and it advances the understanding of the conservation of post-reproductive brain aging in mammals. Broadly speaking, this work is divided into two complementary sections. The first section describes the development of a mixed effects modeling approach for integrating high-dimensional genomic data sets. Microarray-based technologies allow researchers to monitor gene expression, transcription factor binding and alternative splicing on a genome-wide scale. The current challenge is to develop methods that analyze and integrate these vast data sets in a manner that produces biologically meaningful results, which can then be followed up experimentally in the lab. The approach is described in terms of analyzing array data in the context of biologically related groups of genes. A key component of this approach is the development of a novel model selection strategy that utilizes the biological information contained in the Gene Ontology graph in order to balance biological specificity against model parsimony.

A Bayesian Group Sparse Multi-task Regression Model for Imaging Genomics

Author Keelin Greenlaw
Release 2015

Recent advances in technology for brain imaging and high-throughput genotyping have motivated studies examining the influence of genetic variation on brain structure. In this setting, high-dimensional regression for multi-SNP association analysis is challenging as the brain imaging phenotypes are multivariate and there is a desire to incorporate a biological group structure among SNPs based on the genes to which they belong. Wang et al. (Bioinformatics, 2012) have recently developed an approach for simultaneous estimation and SNP selection based on penalized regression with a novel group l_{2,1}-norm penalty, which encourages sparsity at the gene level. A problem with the proposed approach is that it only provides a point estimate. We solve this problem by developing a corresponding Bayesian formulation based on a three-level hierarchical model that allows for full posterior inference using Gibbs sampling. For the selection of tuning parameters, we consider techniques based on: (i) a fully Bayes approach with hyperpriors, (ii) empirical Bayes with implementation based on a Monte Carlo EM algorithm, and (iii) cross-validation (CV).
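
To make the gene-level sparsity idea concrete, the sketch below evaluates a group l_{2,1}-type penalty on a small coefficient matrix. It assumes the common definition in which the penalty sums, over genes, the Frobenius norm of the coefficient rows of the SNPs in that gene; the exact form used by Wang et al. or in the Bayesian model may differ. The function name, the matrix values, and the SNP-to-gene grouping are hypothetical.

```python
import math


def group_l21_norm(W: list[list[float]], groups: list[list[int]]) -> float:
    """Sum over groups of the Frobenius norm of the rows of W in each group."""
    total = 0.0
    for rows in groups:
        total += math.sqrt(sum(w * w for i in rows for w in W[i]))
    return total


# Hypothetical coefficients: 4 SNPs (rows) by 2 imaging phenotypes (columns).
W = [
    [3.0, 4.0],  # SNP 0, gene A
    [0.0, 0.0],  # SNP 1, gene A
    [0.0, 0.0],  # SNP 2, gene B
    [0.0, 0.0],  # SNP 3, gene B
]
genes = [[0, 1], [2, 3]]  # assumed grouping of SNP row indices by gene

penalty = group_l21_norm(W, genes)

# A gene is active if any of its SNPs has a non-zero coefficient.
active_genes = [
    g for g, rows in enumerate(genes)
    if any(w != 0.0 for i in rows for w in W[i])
]
```

Because each gene contributes an unsquared norm, the penalty behaves like a lasso across genes: a gene's whole block of coefficients can be driven to zero together, as with gene B here.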

Penalized Regression Methods with Application to Generalized Linear Models, Generalized Additive Models, and Smoothing

Author Sri Utami Zuliana
Release 2017

LISREL Approaches to Interaction Effects in Multiple Regression

Author James Jaccard
Publisher SAGE Publications, Incorporated
Pages 118
Release 1996-03-21
Genre Mathematics

With detailed examples, this book demonstrates the use of the computer program LISREL and how it can be applied to the analysis of interactions in regression frameworks. The authors consider a wide range of applications, including qualitative moderator variables, longitudinal designs, and product term analysis. They describe different types of measurement error and then present a discussion of latent variable representations of measurement error, which serves as the foundation for the analyses described in later chapters. Finally, they offer a brief introduction to LISREL and show how it can be used to execute the analyses. Readers can use this book without any prior training in LISREL and will find it an excellent introduction to analytic methods that deal with the problem of measurement error in the analysis of interactions.