Title: REINFORCEMENT LEARNING WITHOUT REWARDS: SIGNAL-FREE EXPLORATION WITH THE MAXENT AGENT
Authors: Van Soest, Abby
Advisors: Hazan, Elad
Department: Computer Science
Class Year: 2019
Abstract: In order to learn, we must be able to explore. Creative, open-ended exploration of the world is central to the acquisition of general knowledge. This paper is grounded in the insight that the same is true in machine learning: an intelligent agent will have an inherent sense of curiosity and an intrinsic ability to explore its environment. As such, we seek to determine what an agent can learn to accomplish in an unknown environment without external reward signals. This can be considered a form of unsupervised reinforcement learning, since it removes the influence of reward "labels" from the learning process. Our solution, which we term the MaxEnt algorithm, is an iterative approach to entropy maximization based on the conditional gradient (Frank-Wolfe) algorithm. This paper explains and experimentally evaluates the approach on two classic control tasks and five robotic locomotion tasks. In the absence of rewards, MaxEnt agents learn a variety of novel exploratory behaviors. In future work, our maximum entropy approach can serve as the exploration component of a policy gradient algorithm in the presence of rewards.
URI: http://arks.princeton.edu/ark:/88435/dsp01c821gn62v
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Computer Science, 1988-2020
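
The abstract describes MaxEnt as iterative entropy maximization via the conditional gradient (Frank-Wolfe) method: each round computes the gradient of the entropy of the current policy mixture's state distribution, treats that gradient as an intrinsic reward, and mixes in a policy optimized for it. The sketch below illustrates that loop on an assumed toy tabular MDP; the random transition kernel, exact value-iteration planning oracle, horizon, smoothing constant, and 2/(t+1) step size are all illustrative assumptions, not the thesis's actual implementation.

```python
# Minimal illustrative sketch of Frank-Wolfe entropy maximization over state
# distributions. The toy MDP, planning oracle, and step sizes are assumptions
# for illustration, not the thesis's implementation.
import numpy as np

rng = np.random.default_rng(0)

S, A, H = 8, 2, 20                           # states, actions, rollout horizon
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] is a distribution over s'

def state_distribution(policy):
    """Average state distribution over an H-step horizon from state 0."""
    P_pi = np.einsum('sa,sat->st', policy, P)  # policy's state-to-state kernel
    d = np.zeros(S); d[0] = 1.0
    total = d.copy()
    for _ in range(H - 1):
        d = d @ P_pi                           # one-step pushforward of d
        total += d
    return total / H

def planning_oracle(reward):
    """Finite-horizon value iteration for a state-based reward."""
    V = np.zeros(S)
    for _ in range(H):
        Q = reward[:, None] + P @ V            # Q[s, a]
        V = Q.max(axis=1)
    policy = np.zeros((S, A))
    policy[np.arange(S), Q.argmax(axis=1)] = 1.0
    return policy

# Frank-Wolfe loop: climb the entropy objective H(d) = -sum_s d(s) log d(s).
# The intrinsic reward each round is the gradient of H at the current
# mixture's state distribution, -log d(s) - 1.
policies, weights = [], []
eps = 1e-6                                     # smoothing keeps the log finite
for t in range(1, 51):
    if policies:
        d_mix = sum(w * state_distribution(p) for w, p in zip(weights, policies))
    else:
        d_mix = np.full(S, 1.0 / S)            # assumed uniform starting point
    intrinsic_reward = -np.log(d_mix + eps) - 1.0
    pi_t = planning_oracle(intrinsic_reward)
    eta = 2.0 / (t + 1)                        # classic Frank-Wolfe step size
    weights = [w * (1 - eta) for w in weights] + [eta]
    policies.append(pi_t)

d_final = sum(w * state_distribution(p) for w, p in zip(weights, policies))
print("entropy of mixture:", -(d_final * np.log(d_final + eps)).sum())
```

Because each round reduces to a standard RL problem under the intrinsic reward, the exact planning oracle assumed here could be swapped for any approximate RL solver, which is what connects this sketch to the abstract's point about pairing the approach with policy gradient methods.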
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
VANSOEST-ABBY-THESIS.pdf | | 880.25 kB | Adobe PDF | Request a copy |
Items in DataSpace are protected by copyright, with all rights reserved, unless otherwise indicated.