Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01rb68xf782
Full metadata record
DC Field: Value
dc.contributor.advisor: Fan, Jianqing
dc.contributor.author: Wang, Kaizheng
dc.contributor.other: Operations Research and Financial Engineering Department
dc.date.accessioned: 2020-07-13T03:32:32Z
dc.date.available: 2020-07-13T03:32:32Z
dc.date.issued: 2020
dc.identifier.uri: http://arks.princeton.edu/ark:/88435/dsp01rb68xf782
dc.description.abstract: Latent variable models lay the statistical foundation for data science problems with unstructured, incomplete, and heterogeneous information. The significant computational and memory challenges call for efficient estimation procedures that faithfully output high-quality solutions without much fine-tuning. A comprehensive understanding of such procedures helps build consolidated toolkits for complex problems in the future. This thesis is devoted to the development of principled methods that provably tackle latent variable models of different forms. We first establish theoretical guarantees for several spectral methods that recover latent structures through eigendecomposition of suitable data matrices. For ranking from pairwise comparisons, we investigate a vanilla spectral estimator through an $\ell_{\infty}$ perturbation analysis of eigenvectors, which justifies its statistical optimality for identifying the top $K$ items. Next, we develop a general $\ell_p$ theory for PCA in Hilbert spaces with weak signals, establishing the optimality of spectral clustering in sub-Gaussian mixture models. Building on that, we propose a convenient spectral algorithm for contextual community detection, where one seeks to recover communities in a network given additional node attributes; it is shown to achieve the information threshold for exact recovery. Beyond spectral methods, we develop an optimization-based framework for simultaneous dimension reduction and clustering that transforms the data into a low-dimensional point cloud with well-separated clusters. It is very flexible and handles data that incapacitate many existing procedures. We name the method Clustering via Uncoupled REgression, or CURE for short. For a linear version of the method under a mixture model, we prove that a perturbed gradient descent algorithm achieves near-optimal statistical precision within a reasonable amount of time, even in the absence of good initialization.
dc.language.iso: en
dc.publisher: Princeton, NJ : Princeton University
dc.relation.isformatof: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: http://catalog.princeton.edu
dc.subject: clustering
dc.subject: dimension reduction
dc.subject: latent variable models
dc.subject: network analysis
dc.subject: non-convex optimization
dc.subject: spectral methods
dc.subject.classification: Statistics
dc.subject.classification: Operations research
dc.title: Latent Variable Models: Spectral Methods and Non-convex Optimization
dc.type: Academic dissertations (Ph.D.)
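
The abstract above refers to the generic spectral-clustering recipe for sub-Gaussian mixture models: embed the data using the top principal directions, then cluster the low-dimensional embedding. The snippet below is a minimal illustrative sketch of that recipe, not the thesis's exact procedure; the simulated Gaussian mixture, the choice of K - 1 principal directions, and the use of scikit-learn's KMeans are assumptions made here for illustration.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Illustrative sub-Gaussian mixture: K Gaussian components in d dimensions.
K, n, d = 3, 300, 50
means = rng.normal(scale=3.0, size=(K, d))          # well-separated centers
labels_true = rng.integers(0, K, size=n)
X = means[labels_true] + rng.normal(size=(n, d))    # data = center + noise

# PCA / spectral step: project onto the top (K - 1) right singular vectors
# of the centered data (equivalently, top eigenvectors of the Gram matrix).
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
embedding = Xc @ Vt[: K - 1].T                      # low-dimensional point cloud

# Cluster the embedded points; with well-separated centers this recovers
# the latent labels up to a permutation.
labels_hat = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(embedding)

With sufficient separation between centers, the projected points form K tight clusters, so k-means on the embedding identifies the latent assignments up to a relabeling.
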
Appears in Collections: Operations Research and Financial Engineering

Files in This Item:
File: Wang_princeton_0181D_13328.pdf
Size: 1.1 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.