Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01b5644v56k
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Fan, Jianqing |
dc.contributor.author | Chen, Tony |
dc.date.accessioned | 2020-09-30T14:18:18Z | -
dc.date.available | 2020-09-30T14:18:18Z | -
dc.date.created | 2020-05-03 |
dc.date.issued | 2020-09-30 | -
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp01b5644v56k | -
dc.description.abstract | Modern data have become more complicated with the rise of high-dimensional and big data, which often present statistical challenges of strong dependence and correlation structures. Many novel methods have been developed to select important variables and factors from high-dimensional data in a wide array of disciplines, such as biomedical research. In addition, clinical data often contain missing values, which call for imputation methods that approximate those values as closely as possible. Factor-adjusted regularized model selection (FarmSelect) is a promising high-dimensional variable selection method, but it has not been fully studied under conditions of missing data. The analyses in this thesis evaluate the performance of FarmSelect in the presence of missing data by comparing a variety of imputation methods to determine the most practical approach to this problem. Since many biomedical studies focus on binary outcomes, these methods are studied in the context of high-dimensional logistic regression. After simulation analysis, a few promising imputation techniques are applied to real data from neuroblastoma and lymphoma studies. These analyses test the considered methods under less ideal statistical conditions and further inform which imputation methods are most appropriate in these situations. Overall, this thesis lays the groundwork for continued methodological research to address the statistical challenges of missingness in high-dimensional data analysis. |
dc.format.mimetype | application/pdf |
dc.language.iso | en |
dc.title | Factor-Adjusted Regularized Model Selection for Logistic Regression in the Presence of Missing Data |
dc.type | Princeton University Senior Theses |
pu.date.classyear | 2020 |
pu.department | Operations Research and Financial Engineering |
pu.pdf.coverpage | SeniorThesisCoverPage |
pu.contributor.authorid | 961149640 |
pu.certificate | Center for Statistics and Machine Learning |
Appears in Collections: Operations Research and Financial Engineering, 2000-2020

Files in This Item:
File | Description | Size | Format
CHEN-TONY-THESIS.pdf | | 2.01 MB | Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.