Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp01zg64tp85c
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Hazan, Elad | - |
dc.contributor.author | Bullins, Brian Anderson | - |
dc.contributor.other | Computer Science Department | - |
dc.date.accessioned | 2020-07-13T02:19:04Z | - |
dc.date.available | 2020-07-13T02:19:04Z | - |
dc.date.issued | 2019 | - |
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp01zg64tp85c | - |
dc.description.abstract | In recent years, stochastic gradient descent (SGD) has taken center stage for training large-scale models in machine learning. Although some higher-order methods have improved iteration complexity in theory, the per-iteration costs render them unusable when faced with millions of parameters and training examples. In this thesis, I will present several works which enable higher-order optimization to be as scalable as first-order methods. The first method is a stochastic second-order algorithm for convex optimization, called LiSSA, which uses Hessian information to construct an unbiased Newton step in time linear in the problem dimension. To bypass the typical efficiency barriers for second-order methods, we harness the ERM structure in standard machine learning tasks. While convex problems allow for global convergence, recent state-of-the-art models, such as deep neural networks, highlight the importance of developing a better understanding of non-convex guarantees. In order to handle this challenging setting, I will present FastCubic, a Hessian-based method which provably converges to first-order critical points faster than gradient descent, while additionally converging to second-order critical points. Finally, I will establish how to leverage even higher-order derivative information by means of the FastQuartic algorithm, which achieves faster convergence for both smooth and non-smooth convex optimization problems by combining efficient tensor methods with near-optimal higher-order acceleration. | - |
dc.language.iso | en | - |
dc.publisher | Princeton, NJ : Princeton University | - |
dc.relation.isformatof | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: [catalog.princeton.edu](http://catalog.princeton.edu) | - |
dc.subject | convex optimization | - |
dc.subject | higher-order | - |
dc.subject | machine learning | - |
dc.subject | non-convex optimization | - |
dc.subject | second-order | - |
dc.subject.classification | Computer science | - |
dc.title | Efficient Higher-Order Optimization for Machine Learning | - |
dc.type | Academic dissertations (Ph.D.) | - |
Appears in Collections: | Computer Science |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Bullins_princeton_0181D_13113.pdf | | 2.59 MB | Adobe PDF | View/Download |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.
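The abstract notes that LiSSA builds an unbiased Newton step in time linear in the problem dimension, relying only on Hessian-vector products. A minimal deterministic sketch of the underlying idea (not the thesis's full stochastic algorithm) is the truncated Neumann-series recursion x_j = g + (I - H) x_{j-1}, which converges to H^{-1} g when the Hessian's eigenvalues lie in (0, 1); the matrix `H`, vector `g`, and the scaling constants below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy positive-definite "Hessian" with spectrum inside (0, 1),
# so the Neumann series for H^{-1} converges.
d = 5
A = rng.standard_normal((d, d))
H = A @ A.T
H /= np.linalg.norm(H, 2) * 1.5   # spectral norm now about 0.67
H += 0.1 * np.eye(d)              # eigenvalues in roughly [0.1, 0.77]
g = rng.standard_normal(d)

def neumann_newton_step(H, g, depth=200):
    """Estimate H^{-1} g via the recursion x_j = g + (I - H) x_{j-1}.

    Each iteration needs only a Hessian-vector product (H @ x), which is
    what keeps the per-step cost linear in the dimension in LiSSA-style
    methods; there, H @ x is replaced by sampled per-example Hessians.
    """
    x = g.copy()
    for _ in range(depth):
        x = g + x - H @ x
    return x

step = neumann_newton_step(H, g)
exact = np.linalg.solve(H, g)
print(np.linalg.norm(step - exact))  # near zero after enough iterations
```

In the stochastic setting of the thesis, the exact product `H @ x` is swapped for an unbiased single-sample estimate, which is what makes the step both unbiased and cheap under the ERM structure the abstract mentions.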