Factor-Adjusted Regularized Model Selection for Logistic Regression in the Presence of Missing Data

Chen, Tony

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01b5644v56k

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Fan, Jianqing
dc.contributor.author	Chen, Tony
dc.date.accessioned	2020-09-30T14:18:18Z	-
dc.date.available	2020-09-30T14:18:18Z	-
dc.date.created	2020-05-03
dc.date.issued	2020-09-30	-
dc.identifier.uri	http://arks.princeton.edu/ark:/88435/dsp01b5644v56k	-
dc.description.abstract	Modern data sets have become more complex with the rise of high-dimensional and big data, which often present statistical challenges such as strong dependence and correlation structures. Many novel methods have been developed to select important variables and factors from high-dimensional data in a wide array of disciplines, such as biomedical research. In addition, clinical data often contain missing values, which require imputation methods that approximate those values as closely as possible. Factor-adjusted regularized model selection (FarmSelect) is a promising high-dimensional variable selection method, but it has not been fully studied under conditions of missing data. The analyses in this thesis evaluate the performance of FarmSelect in the presence of missing data by comparing a variety of imputation methods to determine the most practical approach to this problem. Since many biomedical studies focus on binary outcomes, these methods are studied in the context of high-dimensional logistic regression. Following the simulation analysis, a few promising imputation techniques are applied to real data from neuroblastoma and lymphoma studies. These analyses test the considered methods under less ideal statistical conditions and further inform which imputation methods are most appropriate in these situations. Overall, this thesis lays the groundwork for continued methodological research to address the statistical challenges of missingness in high-dimensional data analysis.
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.title	Factor-Adjusted Regularized Model Selection for Logistic Regression in the Presence of Missing Data
dc.type	Princeton University Senior Theses
pu.date.classyear	2020
pu.department	Operations Research and Financial Engineering
pu.pdf.coverpage	SeniorThesisCoverPage
pu.contributor.authorid	961149640
pu.certificate	Center for Statistics and Machine Learning
Appears in Collections:	Operations Research and Financial Engineering, 2000-2020

Files in This Item:

File	Description	Size	Format
CHEN-TONY-THESIS.pdf		2.01 MB	Adobe PDF
