Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp019019s2505
Title: Estimating False Discovery Proportion under Covariance Dependence
Authors: Gu, Weijie
Advisors: Fan, Jianqing
Contributors: Operations Research and Financial Engineering Department
Keywords: approximate factor model
covariance dependence
false discovery proportion
high dimensionality
multiple hypothesis testing
Subjects: Statistics
Operations research
Issue Date: 2012
Publisher: Princeton, NJ : Princeton University
Abstract: Multiple hypothesis testing is a fundamental problem in high dimensional inference, with wide applications in many scientific fields. In genome-wide association studies, tens of thousands of tests are performed simultaneously to determine whether any genes are associated with given traits, and those tests are correlated. In finance, thousands of correlated tests are performed to see which fund managers have winning ability. When test statistics are correlated, false discovery control becomes very challenging under arbitrary dependence. In the first part of this work, we propose a new methodology based on principal factor approximation (PFA), which successfully subtracts the common dependence and significantly weakens the correlation structure, to deal with a known but arbitrary dependence structure. We derive the theoretical distribution for false discovery proportion (FDP) in large scale multiple testing when a common threshold is used and provide a consistent estimate of FDP. This result has important applications in controlling FDR and FDP. Our estimate of FDP compares favorably with the approach of Efron (2007), as demonstrated by simulated examples. Our approach is further illustrated by some real data applications. We also propose a factor-adjusted procedure, which is shown in simulation studies to be more powerful than the fixed threshold procedure. In the second part of the work, we further investigate the cases where the covariance matrix of the test statistics is unknown, which are more challenging and of wider applicability. In such cases, the dependence information needs to be estimated before estimating FDP, and the estimation accuracy may greatly affect the convergence result of FDP or even violate its consistency. We first develop requirements for estimates of eigenvalues and eigenvectors of the covariance matrix such that a consistent estimate of FDP can be obtained. 
We then provide sufficient conditions on the dependence structure for the estimate of FDP to be consistent and suggest that an approximate factor model structure might be a good candidate. We conclude by proposing the Principal Orthogonal complEment Thresholding (POET)-PFA procedure to consistently estimate FDP. The performance of our procedure is evaluated by simulation studies and real data analysis.
URI: http://arks.princeton.edu/ark:/88435/dsp019019s2505
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Operations Research and Financial Engineering

Files in This Item:
File: Gu_princeton_0181D_10326.pdf
Size: 1.79 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.