Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp01b5644v56k
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Fan, Jianqing | |
dc.contributor.author | Chen, Tony | |
dc.date.accessioned | 2020-09-30T14:18:18Z | - |
dc.date.available | 2020-09-30T14:18:18Z | - |
dc.date.created | 2020-05-03 | |
dc.date.issued | 2020-09-30 | - |
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp01b5644v56k | - |
dc.description.abstract | Modern data has become more complicated with the rise of high-dimensional and big data, which often present statistical challenges such as strong dependence and correlation structures. Many novel methods have been developed to select important variables and factors from high-dimensional data in a wide array of disciplines, such as biomedical research. In addition, clinical data also often contain missing values, which require imputation methods that approximate those missing values as closely as possible. Factor-adjusted regularized model selection (FarmSelect) is a promising high-dimensional variable selection method, but it has not been fully studied in the presence of missing data. The analyses in this thesis evaluate the performance of FarmSelect in the presence of missing data by comparing a variety of imputation methods to determine the most practical approach to this problem. Since many biomedical studies focus on binary outcomes, these methods are studied in the context of high-dimensional logistic regression. After sufficient simulation analysis, a few promising imputation techniques are applied to real data from neuroblastoma and lymphoma studies. These analyses test the considered methods under less ideal statistical conditions and further inform which imputation methods are most appropriate in these situations. Overall, this thesis lays the groundwork for continued methodological research to address the statistical challenges of missingness in high-dimensional data analysis. | |
dc.format.mimetype | application/pdf | |
dc.language.iso | en | |
dc.title | Factor-Adjusted Regularized Model Selection for Logistic Regression in the Presence of Missing Data | |
dc.type | Princeton University Senior Theses | |
pu.date.classyear | 2020 | |
pu.department | Operations Research and Financial Engineering | |
pu.pdf.coverpage | SeniorThesisCoverPage | |
pu.contributor.authorid | 961149640 | |
pu.certificate | Center for Statistics and Machine Learning | |
Appears in Collections: | Operations Research and Financial Engineering, 2000-2020 |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
CHEN-TONY-THESIS.pdf | | 2.01 MB | Adobe PDF |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.