Estimating False Discovery Proportion under Covariance Dependence

Gu, Weijie

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp019019s2505

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Fan, Jianqing	en_US
dc.contributor.author	Gu, Weijie	en_US
dc.contributor.other	Operations Research and Financial Engineering Department	en_US
dc.date.accessioned	2012-11-15T23:54:20Z	-
dc.date.available	2012-11-15T23:54:20Z	-
dc.date.issued	2012	en_US
dc.identifier.uri	http://arks.princeton.edu/ark:/88435/dsp019019s2505	-
dc.description.abstract	Multiple hypothesis testing is a fundamental problem in high dimensional inference, with wide applications in many scientific fields. In genome-wide association studies, tens of thousands of tests are performed simultaneously to find whether any genes are associated with given traits, and those tests are correlated. In finance, thousands of correlated tests are performed to see which fund managers have winning ability. When test statistics are correlated, false discovery control becomes very challenging under arbitrary dependence. In the first part of this work, we propose a new methodology based on principal factor approximation (PFA), which successfully subtracts the common dependence and significantly weakens the correlation structure, to deal with a known but arbitrary dependence structure. We derive the theoretical distribution of the false discovery proportion (FDP) in large scale multiple testing when a common threshold is used and provide a consistent estimate of FDP. This result has important applications in controlling FDR and FDP. Our estimate of FDP compares favorably with the approach of Efron (2007), as demonstrated by simulated examples. Our approach is further illustrated by some real data applications. We also propose a factor-adjusted procedure, which is shown in simulation studies to be more powerful than the fixed threshold procedure. In the second part of the work, we further investigate the cases where the covariance matrix of the test statistics is unknown, which are more challenging and of wider applicability. In such cases, the dependence information needs to be estimated before estimating FDP, and the estimation accuracy may greatly affect the convergence result of FDP or even violate its consistency. We first develop requirements for estimates of eigenvalues and eigenvectors of the covariance matrix such that a consistent estimate of FDP can be obtained. 
We then provide sufficient conditions on the dependence structure for the estimate of FDP to be consistent and suggest that an approximate factor model structure might be a good candidate. We conclude by proposing the Principal Orthogonal complEment Thresholding (POET)-PFA procedure to consistently estimate FDP. The performance of our procedure is evaluated by simulation studies and real data analysis.	en_US
dc.language.iso	en	en_US
dc.publisher	Princeton, NJ : Princeton University	en_US
dc.relation.isformatof	The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog (http://catalog.princeton.edu).	en_US
dc.subject	approximate factor model	en_US
dc.subject	covariance dependence	en_US
dc.subject	false discovery proportion	en_US
dc.subject	high dimensionality	en_US
dc.subject	multiple hypothesis testing	en_US
dc.subject.classification	Statistics	en_US
dc.subject.classification	Operations research	en_US
dc.title	Estimating False Discovery Proportion under Covariance Dependence	en_US
dc.type	Academic dissertations (Ph.D.)	en_US
pu.projectgrantnumber	690-2143	en_US
Appears in Collections:	Operations Research and Financial Engineering

Files in This Item:

File	Description	Size	Format
Gu_princeton_0181D_10326.pdf		1.79 MB	Adobe PDF
