Latent Structure in Linear Prediction and Corpora Comparison

Title Latent Structure in Linear Prediction and Corpora Comparison PDF eBook
Author Seth Colin-Bear Strimas-Mackey
Publisher
Pages 0
Release 2022
Genre
ISBN

This work first studies the finite-sample properties of the risk of the minimum-norm interpolating predictor in high-dimensional regression models. If the effective rank of the covariance matrix of the p regression features is much larger than the sample size n, we show that the min-norm interpolating predictor is not desirable, as its risk approaches the risk of trivially predicting the response by 0. However, our detailed finite-sample analysis reveals, surprisingly, that this behavior is not present when the regression response and the features are jointly low-dimensional, following a widely used factor regression model. Within this popular model class, and when the effective rank of the covariance matrix is smaller than n, while still allowing for p ≫ n, both the bias and the variance terms of the excess risk can be controlled, and the risk of the minimum-norm interpolating predictor approaches optimal benchmarks. Moreover, through a detailed analysis of the bias term, we exhibit model classes under which our upper bound on the excess risk approaches zero, while the corresponding upper bound in the recent work [arXiv:1906.11300] diverges. Furthermore, we show that the minimum-norm interpolating predictor analyzed under the factor regression model, despite being model-agnostic and devoid of tuning parameters, can have similar risk to predictors based on principal components regression and ridge regression, and can improve over LASSO-based predictors, in the high-dimensional regime.

The second part of this work extends the analysis of the minimum-norm interpolating predictor to a larger class of linear predictors of a real-valued response Y. Our primary contribution is in establishing finite-sample risk bounds for prediction with the ubiquitous Principal Component Regression (PCR) method, under the factor regression model, with the number of principal components adaptively selected from the data, a form of theoretical guarantee that is surprisingly lacking from the PCR literature. To accomplish this, we prove a master theorem that establishes a risk bound for a large class of predictors, including the PCR predictor as a special case. This approach has the benefit of providing a unified framework for the analysis of a wide range of linear prediction methods, under the factor regression setting. In particular, we use our main theorem to recover the risk bounds for the minimum-norm interpolating predictor, and a prediction method tailored to a subclass of factor regression models with identifiable parameters. This model-tailored method can be interpreted as prediction via clusters with latent centers. To address the problem of selecting among a set of candidate predictors, we analyze a simple model selection procedure based on data-splitting, providing an oracle inequality under the factor model to prove that the performance of the selected predictor is close to the optimal candidate.

In the third part of this work, we shift from the latent factor model to developing methodology in the context of topic models, which also rely on latent structure. We provide a new, principled construction of a distance between two ensembles of independent, but not identically distributed, discrete samples, when each ensemble follows a topic model. Our proposal is a hierarchical Wasserstein distance that can be used for the comparison of corpora of documents, or any other data sets following topic models. We define the distance by representing a corpus as a discrete measure theta over a set of clusters corresponding to topics. To a cluster we associate its center, which is itself a discrete measure over topics. This allows for summarizing both the relative weight of each topic in the corpus (represented by the components of theta) and the topic heterogeneity within the corpus in a single probabilistic representation. The distance between two corpora then follows naturally as a hierarchical Wasserstein distance between the probabilistic representations of the two corpora. We demonstrate that this distance captures differences in the content of the topics between two corpora and their relative coverage. We provide computationally tractable estimates of the distance, as well as accompanying finite-sample error bounds relative to their population counterparts. We demonstrate the usage of the distance with an application to the comparison of news sources.
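
As a rough illustration of the first part of the abstract above, the minimum-norm interpolating predictor can be sketched with a pseudoinverse on data simulated from a simple factor regression model. This is only a sketch under stated assumptions: the function names, the dimensions (n = 50, p = 500, K = 3), the Gaussian factors, and the noise levels are all hypothetical choices made for the example and are not taken from the work itself.

```python
import numpy as np


def min_norm_interpolator(X, y):
    """Minimum l2-norm solution of X @ alpha = y, via the Moore-Penrose pseudoinverse."""
    return np.linalg.pinv(X) @ y


def make_factor_model(p, K, rng, noise_x=1.0, noise_y=0.5):
    """Return a sampler for a toy factor regression model (assumed form):
    X = Z A + E and y = Z beta + eps, with K latent factors Z."""
    A = rng.normal(size=(K, p))      # loadings linking factors to features
    beta = rng.normal(size=K)        # regression coefficients on the factors

    def draw(m):
        Z = rng.normal(size=(m, K))  # latent factors, never observed in practice
        X = Z @ A + noise_x * rng.normal(size=(m, p))
        y = Z @ beta + noise_y * rng.normal(size=m)
        return X, y

    return draw


rng = np.random.default_rng(0)
draw = make_factor_model(p=500, K=3, rng=rng)
X_train, y_train = draw(50)          # n = 50 samples, p = 500 features (p >> n)
X_test, y_test = draw(2000)

alpha_hat = min_norm_interpolator(X_train, y_train)

# The predictor fits the training data exactly (up to floating-point error).
train_residual = np.max(np.abs(X_train @ alpha_hat - y_train))

# Compare out-of-sample error with the trivial predictor that always outputs 0.
test_mse = np.mean((X_test @ alpha_hat - y_test) ** 2)
null_mse = np.mean(y_test ** 2)

print(f"max training residual: {train_residual:.2e}")
print(f"test MSE (min-norm interpolator): {test_mse:.3f}")
print(f"test MSE (predict 0): {null_mse:.3f}")
```

The pseudoinverse is used here because, when p > n and the design has full row rank, it picks out the interpolating solution with the smallest Euclidean norm; no tuning parameter is involved.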

Partial Least Squares Structural Equation Modeling (PLS-SEM) Using R

Title Partial Least Squares Structural Equation Modeling (PLS-SEM) Using R PDF eBook
Author Joseph F. Hair Jr.
Publisher Springer Nature
Pages 208
Release 2021-11-03
Genre Business & Economics
ISBN 3030805190

Partial least squares structural equation modeling (PLS-SEM) has become a standard approach for analyzing complex inter-relationships between observed and latent variables. Researchers appreciate the many advantages of PLS-SEM such as the possibility to estimate very complex models and the method’s flexibility in terms of data requirements and measurement specification. This practical open access guide provides a step-by-step treatment of the major choices in analyzing PLS path models using R, a free software environment for statistical computing, which runs on Windows, macOS, and UNIX computer platforms. Adopting the R software’s SEMinR package, which brings a friendly syntax to creating and estimating structural equation models, each chapter offers a concise overview of relevant topics and metrics, followed by an in-depth description of a case study. Simple instructions give readers the “how-tos” of using SEMinR to obtain solutions and document their results. Rules of thumb in every chapter provide guidance on best practices in the application and interpretation of PLS-SEM.

Linguistic Structure Prediction

Title Linguistic Structure Prediction PDF eBook
Author Noah A. Smith
Publisher Springer Nature
Pages 248
Release 2022-05-31
Genre Computers
ISBN 3031021436

A major part of natural language processing now depends on the use of text data to build linguistic analyzers. We consider statistical, computational approaches to modeling linguistic structure. We seek to unify across many approaches and many kinds of linguistic structures. Assuming a basic understanding of natural language processing and/or machine learning, we seek to bridge the gap between the two fields. Approaches to decoding (i.e., carrying out linguistic structure prediction) and supervised and unsupervised learning of models that predict discrete structures as outputs are the focus. We also survey natural language processing problems to which these methods are being applied, and we address related topics in probabilistic inference, optimization, and experimental methodology. Table of Contents: Representations and Linguistic Data / Decoding: Making Predictions / Learning Structure from Annotated Data / Learning Structure from Incomplete Data / Beyond Decoding: Inference

Psychometrics

Title Psychometrics PDF eBook
Author C.R. Rao
Publisher Elsevier
Pages 1191
Release 2007
Genre Mathematics
ISBN 0444521038

This volume, a compilation of authoritative reviews on a multitude of uses of statistics in epidemiology and medical statistics written by internationally renowned experts, is addressed to statisticians working in biomedical and epidemiological fields who use statistical and quantitative methods in their work. While the use of statistics in these fields has a long and rich history, the explosive growth of science in general, and of the clinical and epidemiological sciences in particular, has brought about a sea change, spawning the development of new methods and innovative adaptations of standard methods. Since the literature is highly scattered, the Editors have undertaken this humble exercise to document a representative collection of topics of broad interest to diverse users. The volume spans a cross section of standard topics oriented toward users in the current evolving field, as well as much-needed special topics of more recent origin. This volume was prepared especially with applied statisticians in mind, emphasizing applications-oriented methods and techniques, including references to appropriate software when relevant. The contributors are internationally renowned experts in their respective areas. This volume addresses emerging statistical challenges in epidemiological, biomedical, and pharmaceutical research. It features: methods for assessing biomarkers and analysis of competing risks; clinical trials, including sequential and group sequential, crossover, cluster randomized, and adaptive designs; and structural equation modelling and longitudinal data analysis.

Handbook of Latent Variable and Related Models

Title Handbook of Latent Variable and Related Models PDF eBook
Author
Publisher Elsevier
Pages 458
Release 2011-08-11
Genre Mathematics
ISBN 0080471269

This Handbook covers latent variable models, which are a flexible class of models for modeling multivariate data to explore relationships among observed and latent variables.
- Covers a wide class of important models
- Models and statistical methods described provide tools for analyzing a wide spectrum of complicated data
- Includes illustrative examples with real data sets from business, education, medicine, public health and sociology
- Demonstrates the use of a wide variety of statistical, computational, and mathematical techniques

Sociological Abstracts

Title Sociological Abstracts PDF eBook
Author Leo P. Chall
Publisher
Pages 728
Release 1993
Genre Sociology
ISBN

Multiple Regression and Beyond

Title Multiple Regression and Beyond PDF eBook
Author Timothy Z. Keith
Publisher Routledge
Pages 640
Release 2019-01-14
Genre Education
ISBN 1351667939

Companion Website materials: https://tzkeith.com/

Multiple Regression and Beyond offers a conceptually-oriented introduction to multiple regression (MR) analysis and structural equation modeling (SEM), along with analyses that flow naturally from those methods. By focusing on the concepts and purposes of MR and related methods, rather than the derivation and calculation of formulae, this book introduces material to students more clearly, and in a less threatening way. In addition to illuminating content necessary for coursework, the accessibility of this approach means students are more likely to be able to conduct research using MR or SEM, and more likely to use the methods wisely.

This book:
• Covers both MR and SEM, while explaining their relevance to one another
• Includes path analysis, confirmatory factor analysis, and latent growth modeling
• Makes extensive use of real-world research examples in the chapters and in the end-of-chapter exercises
• Makes extensive use of figures and tables that provide examples and illustrate key concepts and techniques

New to this edition:
• New chapter on mediation, moderation, and common cause
• New chapter on the analysis of interactions with latent variables and multilevel SEM
• Expanded coverage of advanced SEM techniques in chapters 18 through 22
• International case studies and examples
• Updated instructor and student online resources