Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01b5644v56k
Title: Factor-Adjusted Regularized Model Selection for Logistic Regression in the Presence of Missing Data
Authors: Chen, Tony
Advisors: Fan, Jianqing
Department: Operations Research and Financial Engineering
Certificate Program: Center for Statistics and Machine Learning
Class Year: 2020
Abstract: The rise of high-dimensional and big data has made modern data more complicated, often presenting statistical challenges in the form of strong dependence and correlation structures. Many novel methods have been developed to select important variables and factors from high-dimensional data in a wide array of disciplines, such as biomedical research. Clinical data also often contain missing values, which require imputation methods that approximate them as closely as possible. Factor-adjusted regularized model selection (FarmSelect) is a promising high-dimensional variable selection method, but it has not been fully studied under conditions of missing data. This thesis evaluates the performance of FarmSelect in the presence of missing data by comparing a variety of imputation methods to determine the most practical approach to this problem. Since many biomedical studies focus on binary outcomes, these methods are studied in the context of high-dimensional logistic regression. After sufficient simulation analysis, a few promising imputation techniques are applied to real data from neuroblastoma and lymphoma studies. These analyses test the considered methods under less ideal statistical conditions and further inform which imputation methods are most appropriate in these situations. Overall, this thesis lays the groundwork for continued methodological research to address the statistical challenges of missingness in high-dimensional data analysis.
URI: http://arks.princeton.edu/ark:/88435/dsp01b5644v56k
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Operations Research and Financial Engineering, 2000-2020

Files in This Item:
File: CHEN-TONY-THESIS.pdf
Size: 2.01 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.