Title: REINFORCEMENT LEARNING WITHOUT REWARDS: SIGNAL-FREE EXPLORATION WITH THE MAXENT AGENT
Authors: Van Soest, Abby
Advisors: Hazan, Elad
Department: Computer Science
Class Year: 2019
Abstract: In order to learn, we must be able to explore. Creative, open-ended exploration of the world is central to the acquisition of general knowledge. This paper is grounded in the insight that the same is true in machine learning: an intelligent agent will have an inherent sense of curiosity and an intrinsic ability to explore its environment. As such, we seek to determine what an agent can learn to accomplish in an unknown environment without external reward signals. This can be considered a form of unsupervised reinforcement learning, since it removes the influence of reward "labels" from the learning process. Our solution, which we term the MaxEnt algorithm, is an iterative approach to entropy maximization based on the conditional gradient (Frank-Wolfe) algorithm. This paper explains and experimentally evaluates the approach on two classic control tasks and five robotic locomotion tasks. In the absence of rewards, MaxEnt agents learn a variety of novel exploratory behaviors. In future work, our maximum entropy approach can serve as the exploration component of a policy gradient algorithm in the presence of rewards.
URI: http://arks.princeton.edu/ark:/88435/dsp01c821gn62v
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Computer Science, 1988-2020
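
The abstract describes MaxEnt as iterative entropy maximization via the conditional gradient (Frank-Wolfe) method: each round computes the gradient of the entropy of the current policy mixture's state distribution, treats that gradient as an intrinsic reward, and mixes in a policy optimized for it. The sketch below illustrates that loop on an assumed toy tabular MDP; the random transition kernel, exact value-iteration planning oracle, horizon, smoothing constant, and 2/(t+1) step size are all illustrative assumptions, not the thesis's actual implementation.

```python
# Minimal illustrative sketch of Frank-Wolfe entropy maximization over state
# distributions. The toy MDP, planning oracle, and step sizes are assumptions
# for illustration, not the thesis's implementation.
import numpy as np

rng = np.random.default_rng(0)

S, A, H = 8, 2, 20                           # states, actions, rollout horizon
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] is a distribution over s'

def state_distribution(policy):
    """Average state distribution over an H-step horizon from state 0."""
    P_pi = np.einsum('sa,sat->st', policy, P)  # policy's state-to-state kernel
    d = np.zeros(S); d[0] = 1.0
    total = d.copy()
    for _ in range(H - 1):
        d = d @ P_pi                           # one-step pushforward of d
        total += d
    return total / H

def planning_oracle(reward):
    """Finite-horizon value iteration for a state-based reward."""
    V = np.zeros(S)
    for _ in range(H):
        Q = reward[:, None] + P @ V            # Q[s, a]
        V = Q.max(axis=1)
    policy = np.zeros((S, A))
    policy[np.arange(S), Q.argmax(axis=1)] = 1.0
    return policy

# Frank-Wolfe loop: climb the entropy objective H(d) = -sum_s d(s) log d(s).
# The intrinsic reward each round is the gradient of H at the current
# mixture's state distribution, -log d(s) - 1.
policies, weights = [], []
eps = 1e-6                                     # smoothing keeps the log finite
for t in range(1, 51):
    if policies:
        d_mix = sum(w * state_distribution(p) for w, p in zip(weights, policies))
    else:
        d_mix = np.full(S, 1.0 / S)            # assumed uniform starting point
    intrinsic_reward = -np.log(d_mix + eps) - 1.0
    pi_t = planning_oracle(intrinsic_reward)
    eta = 2.0 / (t + 1)                        # classic Frank-Wolfe step size
    weights = [w * (1 - eta) for w in weights] + [eta]
    policies.append(pi_t)

d_final = sum(w * state_distribution(p) for w, p in zip(weights, policies))
print("entropy of mixture:", -(d_final * np.log(d_final + eps)).sum())
```

Because each round reduces to a standard RL problem under the intrinsic reward, the exact planning oracle assumed here could be swapped for any approximate RL solver, which is what connects this sketch to the abstract's point about pairing the approach with policy gradient methods.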
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
VANSOEST-ABBY-THESIS.pdf | | 880.25 kB | Adobe PDF | Request a copy |
Items in DataSpace are protected by copyright, with all rights reserved, unless otherwise indicated.