Skip navigation
Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01nk322h04t
Title: Combinatorial Inference for Large-Scale Data Analysis
Authors: Lu, Junwei
Advisors: Liu, Han
Fan, Jianqing
Contributors: Operations Research and Financial Engineering Department
Keywords: Combinatorial inference
Hypothesis testing
Property test
Universality
Subjects: Statistics
Operations research
Artificial intelligence
Issue Date: 2018
Publisher: Princeton, NJ : Princeton University
Abstract: Problems of inferring the combinatorial structures of networks arise in many real applications ranging from genomic regulatory networks, brain networks to social networks. This poses new and challenging problems on the uncertainty assessment and replicability analysis for statistical methods inferring these network topological structures. This thesis develops {\it combinatorial inference}, a new inferential framework for networks, to conduct hypothesis tests and variable selection for combinatorial structures in large-scale graphs. In the first part of the thesis, we propose a unified inferential method to test hypotheses on the global combinatorial properties of graphical models. We showed that my method works for general monotone graph properties that can be preserved under edge deletion including bipartite, planar, k-colorable, etc. We develop a new combinatorial minimax theory to justify the optimality of the skip-down algorithm. We introduce a new notion of graph packing entropy to sharply characterize the complexity of combinatorial inference problems for graphical models. It can be viewed as the combinatorial counterpart of the famous metric entropy theory on parametric minimax lower bounds. In the second part of the thesis, we generalize the combinatorial inference for larger family of graphical models. We propose a novel class of dynamic nonparanormal graphical models, which allows us to model high dimensional heavy-tailed systems and the evolution of their latent network structures. Under this model we develop statistical tests for presence of edges both locally at a fixed index value and globally over a range of values. The tests are developed for a high-dimensional regime, are robust to model selection mistakes and do not require commonly assumed minimum signal strength. The testing procedures are based on a high dimensional, debiasing-free moment estimator, which uses a novel kernel smoothed Kendall's tau correlation matrix as an input statistic.
URI: http://arks.princeton.edu/ark:/88435/dsp01nk322h04t
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections:Operations Research and Financial Engineering

Files in This Item:
File Description SizeFormat 
Lu_princeton_0181D_12511.pdf6.26 MBAdobe PDFView/Download


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.