Transportability of Principal Causal Effects

Authors: Justin M. Clark, Kollin W. Rott, James S. Hodges, Jared D. Huling

Abstract: Recent research in causal inference has made important progress in addressing challenges to the external validity of trial findings. Such methods weight trial participant data to more closely resemble the distribution of effect-modifying covariates in a well-defined target population. In the presence of participant non-adherence to study medication, these methods effectively transport an intention-to-treat effect that averages over heterogeneous compliance behaviors. In this paper, we develop a principal stratification framework to identify causal effects conditioning on both compliance behavior and membership in the target population. We also develop non-parametric efficiency theory for and construct efficient estimators of such "transported" principal causal effects and characterize their finite-sample performance in simulation experiments. While this work focuses on treatment non-adherence, the framework is applicable to a broad class of estimands that target effects in clinically-relevant, possibly latent subsets of a target population.

Submitted 16 July, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

arXiv:2311.01538 [pdf]

doi 10.1214/23-AOAS1767

A reluctant additive model framework for interpretable nonlinear individualized treatment rules

Authors: Jacob M. Maronge, Jared D. Huling, Guanhua Chen

Abstract: Individualized treatment rules (ITRs) for treatment recommendation are an important topic for precision medicine as not all beneficial treatments work well for all individuals. Interpretability is a desirable property of ITRs, as it helps practitioners make sense of treatment decisions, yet there is a need for ITRs to be flexible to effectively model complex biomedical data for treatment decision making. Many ITR approaches either focus on linear ITRs, which may perform poorly when true optimal ITRs are nonlinear, or black-box nonlinear ITRs, which may be hard to interpret and can be overly complex. This dilemma indicates a tension between interpretability and accuracy of treatment decisions. Here we propose an additive model-based nonlinear ITR learning method that balances interpretability and flexibility of the ITR. Our approach aims to strike this balance by allowing both linear and nonlinear terms of the covariates in the final ITR. Our approach is parsimonious in that the nonlinear term is included in the final ITR only when it substantially improves the ITR performance. To prevent overfitting, we combine cross-fitting and a specialized information criterion for model selection. Through extensive simulations, we show that our methods are data-adaptive to the degree of nonlinearity and can favorably balance ITR interpretability and flexibility. We further demonstrate the robust performance of our methods with an application to a cancer drug sensitivity study.

Submitted 2 November, 2023; originally announced November 2023.

Journal ref: Ann. Appl. Stat. 17 (4) 3384 - 3402, 2023

arXiv:2310.19988 [pdf, other]

Counterfactual fairness for small subgroups

Authors: Solvejg Wastvedt, Jared D Huling, Julian Wolfson

Abstract: While methods for measuring and correcting differential performance in risk prediction models have proliferated in recent years, most existing techniques can only be used to assess fairness across relatively large subgroups. The purpose of algorithmic fairness efforts is often to redress discrimination against groups that are both marginalized and small, so this sample size limitation often prevents existing techniques from accomplishing their main aim. We take a three-pronged approach to address the problem of quantifying fairness with small subgroups. First, we propose new estimands built on the "counterfactual fairness" framework that leverage information across groups. Second, we estimate these quantities using a larger volume of data than existing techniques. Finally, we propose a novel data borrowing approach to incorporate "external data" that lacks outcomes and predictions but contains covariate and group membership information. This less stringent requirement on the external data allows for more possibilities for external data sources. We demonstrate practical application of our estimators to a risk prediction model used by a major Midwestern health system during the COVID-19 pandemic.

Submitted 26 January, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

arXiv:2310.11620 [pdf, other]

Enhancing modified treatment policy effect estimation with weighted energy distance

Authors: Ziren Jiang, Jared D. Huling

Abstract: The effects of continuous treatments are often characterized through the average dose response function, which is challenging to estimate from observational data due to confounding and positivity violations. Modified treatment policies (MTPs) are an alternative approach that aims to assess the effect of a modification to observed treatment values and works under relaxed assumptions. Estimators for MTPs generally focus on estimating the conditional density of treatment given covariates and using it to construct weights. However, weighting using conditional density models has well-documented challenges. Further, MTPs with larger treatment modifications have stronger confounding and no tools exist to help choose an appropriate modification magnitude. This paper investigates the role of weights for MTPs, showing that to control confounding, weights should balance the weighted data to an unobserved hypothetical target population that can be characterized with observed data. Leveraging this insight, we present a versatile set of tools to enhance estimation for MTPs. We introduce a distance that measures imbalance of covariate distributions under the MTP and use it to develop new weighting methods and tools to aid in the estimation of MTPs. We illustrate our methods through an example studying the effect of mechanical power of ventilation on in-hospital mortality.

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2306.17478 [pdf, other]

Leveraging Observational Data for Efficient CATE Estimation in Randomized Controlled Trials

Authors: Amir Asiaee, Chiara Di Gravio, Yuting Mei, Jared D. Huling

Abstract: Randomized controlled trials (RCTs) are the gold standard for causal inference, but they are often powered only for average effects, making estimation of heterogeneous treatment effects (HTEs) challenging. Conversely, large-scale observational studies (OS) offer a wealth of data but suffer from confounding bias. Our paper presents a novel framework to leverage OS data for enhancing the efficiency in estimating conditional average treatment effects (CATEs) from RCTs while mitigating common biases. We propose an innovative approach to combine RCTs and OS data, expanding the traditionally used control arms from external sources. The framework relaxes the typical assumption of CATE invariance across populations, acknowledging the often unaccounted systematic differences between RCT and OS participants. We demonstrate this through the special case of a linear outcome model, where the CATE is sparsely different between the two populations. The core of our framework relies on learning potential outcome means from OS data and using them as a nuisance parameter in CATE estimation from RCT data. We further illustrate through experiments that using OS findings reduces the variance of the estimated CATE from RCTs and can decrease the required sample size for detecting HTEs.

Submitted 30 June, 2023; originally announced June 2023.

arXiv:2302.11098 [pdf, other]

Doubly structured sparsity for grouped multivariate responses with application to functional outcome score modeling

Authors: Jared D. Huling, Jennifer P. Lundine, Julie C. Leonard

Abstract: This work is motivated by the need to accurately model a vector of responses related to pediatric functional status using administrative health data from inpatient rehabilitation visits. The components of the responses have known and structured interrelationships. To make use of these relationships in modeling, we develop a two-pronged regularization approach to borrow information across the responses. The first component of our approach encourages joint selection of the effects of each variable across possibly overlapping groups of related responses and the second component encourages shrinkage of effects towards each other for related responses. As the responses in our motivating study are not normally-distributed, our approach does not rely on an assumption of multivariate normality of the responses. We show that with an adaptive version of our penalty, our approach results in the same asymptotic distribution of estimates as if we had known in advance which variables were non-zero and which variables have the same effects across some outcomes. We demonstrate the performance of our method in extensive numerical studies and in an application in the prediction of functional status of pediatric patients using administrative health data in a population of children with neurological injury or illness at a large children's hospital.

Submitted 21 February, 2023; originally announced February 2023.

arXiv:2302.07840 [pdf]

Causally-interpretable meta-analysis: clearly-defined causal effects and two case studies

Authors: Kollin W. Rott, Gert Bronfort, Haitao Chu, Jared D. Huling, Brent Leininger, Mohammad Hassan Murad, Zhen Wang, James S. Hodges

Abstract: Meta-analysis is commonly used to combine results from multiple clinical trials, but traditional meta-analysis methods do not refer explicitly to a population of individuals to whom the results apply and it is not clear how to use their results to assess a treatment's effect for a population of interest. We describe recently-introduced causally-interpretable meta-analysis methods and apply their treatment effect estimators to two individual-participant data sets. These estimators transport estimated treatment effects from studies in the meta-analysis to a specified target population using individuals' potentially effect-modifying covariates. We consider different regression and weighting methods within this approach and compare the results to traditional aggregated-data meta-analysis methods. In our applications, certain versions of the causally-interpretable methods performed somewhat better than the traditional methods, but the latter generally did well. The causally-interpretable methods offer the most promise when covariates modify treatment effects and our results suggest that traditional methods work well when there is little effect heterogeneity. The causally-interpretable approach gives meta-analysis an appealing theoretical framework by relating an estimator directly to a specific population and lays a solid foundation for future developments.

Submitted 15 February, 2023; originally announced February 2023.

Comments: 31 pages, 2 figures. Submitted to Research Synthesis Methods

arXiv:2302.03544 [pdf, other]

Causally-Interpretable Random-Effects Meta-Analysis

Authors: Justin M. Clark, Kollin W. Rott, James S. Hodges, Jared D. Huling

Abstract: Recent work has made important contributions in the development of causally-interpretable meta-analysis. These methods transport treatment effects estimated in a collection of randomized trials to a target population of interest. Ideally, estimates targeted toward a specific population are more interpretable and relevant to policy-makers and clinicians. However, between-study heterogeneity not arising from differences in the distribution of treatment effect modifiers can raise difficulties in synthesizing estimates across trials. The existence of such heterogeneity, including variations in treatment modality, also complicates the interpretation of transported estimates as a generic effect in the target population. We propose a conceptual framework and estimation procedures that attempt to account for such heterogeneity, and develop inferential techniques that aim to capture the accompanying excess variability in causal estimates. This framework also seeks to clarify the kind of treatment effects that are amenable to the techniques of generalizability and transportability.

Submitted 7 February, 2023; originally announced February 2023.

arXiv:2212.12394 [pdf, other]

doi 10.1111/biom.13546

Sufficient Dimension Reduction for Populations with Structured Heterogeneity

Authors: Jared D. Huling, Menggang Yu

Abstract: A key challenge in building effective regression models for large and diverse populations is accounting for patient heterogeneity. An example of such heterogeneity is in health system risk modeling efforts where different combinations of comorbidities fundamentally alter the relationship between covariates and health outcomes. Accounting for heterogeneity arising from combinations of factors can yield more accurate and interpretable regression models. Yet, in the presence of high dimensional covariates, accounting for this type of heterogeneity can exacerbate estimation difficulties even with large sample sizes. To handle these issues, we propose a flexible and interpretable risk modeling approach based on semiparametric sufficient dimension reduction. The approach accounts for patient heterogeneity, borrows strength in estimation across related subpopulations to improve both estimation efficiency and interpretability, and can serve as a useful exploratory tool or as a powerful predictive model. In simulated examples, we show that our approach often improves estimation performance in the presence of heterogeneity and is quite robust to deviations from its key underlying assumptions. We demonstrate our approach in an analysis of hospital admission risk for a large health system and demonstrate its predictive power when tested on further follow-up data.

Submitted 23 December, 2022; originally announced December 2022.

arXiv:2211.15476 [pdf, other]

Meta-analysis of individualized treatment rules via sign-coherency

Authors: Jay Jojo Cheng, Jared D. Huling, Guanhua Chen

Abstract: Medical treatments tailored to a patient's baseline characteristics hold the potential of improving patient outcomes while reducing negative side effects. Learning individualized treatment rules (ITRs) often requires aggregation of multiple datasets (sites); however, current ITR methodology does not take between-site heterogeneity into account, which can hurt model generalizability when deploying back to each site. To address this problem, we develop a method for individual-level meta-analysis of ITRs, which jointly learns site-specific ITRs while borrowing information about feature sign-coherency via a scientifically-motivated directionality principle. We also develop an adaptive procedure for model tuning, using information criteria tailored to the ITR learning problem. We study the proposed methods through numerical experiments to understand their performance under different levels of between-site heterogeneity and apply the methodology to estimate ITRs in a large multi-center database of electronic health records. This work extends several popular methodologies for estimating ITRs (A-learning, weighted learning) to the multiple-sites setting.

Submitted 28 November, 2022; originally announced November 2022.

Comments: Machine Learning for Health (ML4H), 2022

Journal ref: Proceedings of the 2nd Machine Learning for Health symposium, PMLR 193:171-198, 2022

arXiv:2206.06444 [pdf]

A method for comparing multiple imputation techniques: a case study on the U.S. National COVID Cohort Collaborative

Authors: Elena Casiraghi, Rachel Wong, Margaret Hall, Ben Coleman, Marco Notaro, Michael D. Evans, Jena S. Tronieri, Hannah Blau, Bryan Laraway, Tiffany J. Callahan, Lauren E. Chan, Carolyn T. Bramante, John B. Buse, Richard A. Moffitt, Til Sturmer, Steven G. Johnson, Yu Raymond Shao, Justin Reese, Peter N. Robinson, Alberto Paccanaro, Giorgio Valentini, Jared D. Huling, Kenneth Wilkins, Tell Bennet, et al. (12 additional authors not shown)

Abstract: Healthcare datasets obtained from Electronic Health Records have proven to be extremely useful to assess associations between patients' predictors and outcomes of interest. However, these datasets often suffer from missing values in a high proportion of cases and the simple removal of these cases may introduce severe bias. For these reasons, several multiple imputation algorithms have been proposed to attempt to recover the missing information. Each algorithm presents strengths and weaknesses, and there is currently no consensus on which multiple imputation algorithm works best in a given scenario. Furthermore, the selection of each algorithm's parameters and data-related modelling choices are also both crucial and challenging. In this paper, we propose a novel framework to numerically evaluate strategies for handling missing data in the context of statistical analysis, with a particular focus on multiple imputation techniques. We demonstrate the feasibility of our approach on a large cohort of type-2 diabetes patients provided by the National COVID Cohort Collaborative (N3C) Enclave, where we explored the influence of various patient characteristics on outcomes related to COVID-19. Our analysis included classic multiple imputation techniques as well as simple complete-case Inverse Probability Weighted models. The experiments presented here show that our approach could effectively highlight the most valid and performant missing-data handling strategy for our case study. Moreover, our methodology allowed us to gain an understanding of the behavior of the different models and of how it changed as we modified their parameters. Our method is general and can be applied to different research fields and on datasets containing heterogeneous types.

Submitted 25 September, 2022; v1 submitted 13 June, 2022; originally announced June 2022.

arXiv:2107.07086 [pdf, other]

doi 10.1080/01621459.2023.2213485

Independence weights for causal inference with continuous treatments

Authors: Jared D. Huling, Noah Greifer, Guanhua Chen

Abstract: Studying causal effects of continuous treatments is important for gaining a deeper understanding of many interventions, policies, or medications, yet researchers are often left with observational studies for doing so. In the observational setting, confounding is a barrier to the estimation of causal effects. Weighting approaches seek to control for confounding by reweighting samples so that confounders are comparable across different treatment values. Yet, for continuous treatments, weighting methods are highly sensitive to model misspecification. In this paper we elucidate the key property that makes weights effective in estimating causal quantities involving continuous treatments. We show that to eliminate confounding, weights should make treatment and confounders independent on the weighted scale. We develop a measure that characterizes the degree to which a set of weights induces such independence. Further, we propose a new model-free method for weight estimation by optimizing our measure. We study the theoretical properties of our measure and our weights, and prove that our weights can explicitly mitigate treatment-confounder dependence. The empirical effectiveness of our approach is demonstrated in a suite of challenging numerical experiments, where we find that our weights are quite robust and work well under a broad range of settings.

Submitted 5 April, 2023; v1 submitted 14 July, 2021; originally announced July 2021.

Journal ref: Journal of the American Statistical Association, 2023

arXiv:2105.00581 [pdf, other]

doi 10.1093/biomet/asad038

Robust Sample Weighting to Facilitate Individualized Treatment Rule Learning for a Target Population

Authors: Rui Chen, Jared D. Huling, Guanhua Chen, Menggang Yu

Abstract: Learning individualized treatment rules (ITRs) is an important topic in precision medicine. Current literature mainly focuses on deriving ITRs from a single source population. We consider the observational data setting when the source population differs from a target population of interest. Compared with causal generalization for the average treatment effect which is a scalar quantity, ITR generalization poses new challenges due to the need to model and generalize the rules based on a prespecified class of functions which may not contain the unrestricted true optimal ITR. The aim of this paper is to develop a weighting framework to mitigate the impact of such misspecification and thus facilitate the generalizability of optimal ITRs from a source population to a target population. Our method seeks covariate balance over a non-parametric function class characterized by a reproducing kernel Hilbert space and can improve many ITR learning methods that rely on weights. We show that the proposed method encompasses importance weights and overlap weights as two extreme cases, allowing for a better bias-variance trade-off in between. Numerical examples demonstrate that the use of our weighting method can greatly improve ITR estimation for the target population compared with other weighting methods.

Submitted 14 June, 2023; v1 submitted 2 May, 2021; originally announced May 2021.

Comments: Biometrika, in press

arXiv:2004.13962 [pdf, other]

Energy Balancing of Covariate Distributions

Authors: Jared D. Huling, Simon Mak

Abstract: Bias in causal comparisons has a direct correspondence with distributional imbalance of covariates between treatment groups. Weighting strategies such as inverse propensity score weighting attempt to mitigate bias by either modeling the treatment assignment mechanism or balancing specified covariate moments. This paper introduces a new weighting method, called energy balancing, which instead aims to balance weighted covariate distributions. By directly targeting distributional imbalance, the proposed weighting strategy can be flexibly utilized in a wide variety of causal analyses, including the estimation of average treatment effects and individualized treatment rules. Our energy balancing weights (EBW) approach has several advantages over existing weighting techniques. First, it offers a model-free and robust approach for obtaining covariate balance that does not require tuning parameters, obviating the need for modeling decisions of secondary nature to the scientific question at hand. Second, since this approach is based on a genuine measure of distributional balance, it provides a means for assessing the balance induced by a given set of weights for a given dataset. Finally, the proposed method is computationally efficient and has desirable theoretical guarantees under mild conditions. We demonstrate the effectiveness of this EBW approach in a suite of simulation experiments, and in studies on the safety of right heart catheterization and the effect of indwelling arterial catheters.

Submitted 11 March, 2022; v1 submitted 29 April, 2020; originally announced April 2020.

arXiv:1809.07905 [pdf, other]

Subgroup Identification Using the personalized Package

Authors: Jared D. Huling, Menggang Yu

Abstract: A plethora of disparate statistical methods have been proposed for subgroup identification to help tailor treatment decisions for patients. However, a majority of them do not have corresponding R packages and the few that do pertain to particular statistical methods or provide little means of evaluating whether meaningful subgroups have been found. Recently, the work of Chen, Tian, Cai, and Yu (2017) unified many of these subgroup identification methods into one general, consistent framework. The goal of the personalized package is to provide a corresponding unified software framework for subgroup identification analyses that provides not only estimation of subgroups, but evaluation of treatment effects within estimated subgroups. The personalized package allows for a variety of subgroup identification methods for many types of outcomes commonly encountered in medical settings. The package is built to incorporate the entire subgroup identification analysis pipeline including propensity score diagnostics, subgroup estimation, analysis of the treatment effects within subgroups, and evaluation of identified subgroups. In this framework, different methods can be accessed with little change in the analysis code. Similarly, new methods can easily be incorporated into the package. Besides familiar statistical models, the package also allows flexible machine learning tools to be leveraged in subgroup identification. Further estimation improvements can be obtained via efficiency augmentation.

Submitted 13 November, 2018; v1 submitted 20 September, 2018; originally announced September 2018.

arXiv:1806.01936 [pdf, other]

Selection and Estimation Optimality in High Dimensions with the TWIN Penalty

Authors: Xiaowu Dai, Jared D. Huling

Abstract: We introduce a novel class of variable selection penalties called TWIN, which provides sensible data-adaptive penalization. Under a linear sparsity regime and random Gaussian designs we show that penalties in the TWIN class have a high probability of selecting the correct model and furthermore result in minimax optimal estimators. The general shape of penalty functions in the TWIN class is the key ingredient to its desirable properties and results in improved theoretical and empirical performance over existing penalties. In this work we introduce two examples of TWIN penalties that admit simple and efficient coordinate descent algorithms, making TWIN practical in large data settings. We demonstrate in challenging and realistic simulation settings with high correlations between active and inactive variables that TWIN has high power in variable selection while controlling the number of false discoveries, outperforming standard penalties.

Submitted 5 June, 2018; originally announced June 2018.

arXiv:1801.09661 [pdf, other]

Fast Penalized Regression and Cross Validation for Tall Data with the oem Package

Authors: Jared D. Huling, Peter Z. G. Qian

Abstract: A large body of research has focused on theory and computation for variable selection techniques for high dimensional data. There has been substantially less work in the big tall data paradigm, where the number of variables may be large, but the number of observations is much larger. The orthogonalizing expectation maximization (OEM) algorithm is one approach for computation of penalized models which excels in the big tall data regime. The oem package is an efficient implementation of the OEM algorithm which provides a multitude of computation routines with a focus on big tall data, such as a function for out-of-memory computation, for large-scale parallel computation of penalized regression models. Furthermore, in this paper we propose a specialized implementation of the OEM algorithm for cross validation, dramatically reducing the computing time for cross validation over a naive implementation.

Submitted 29 January, 2018; originally announced January 2018.

Showing 1–17 of 17 results for author: Huling, J D