Exploring multi-armed bandit decision-making strategies in an underwater vehicle testbed

Valverde Lizano, Jonathan

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01ww72bd967

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Leonard, Naomi	-
dc.contributor.author	Valverde Lizano, Jonathan	-
dc.date.accessioned	2016-07-13T14:09:36Z	-
dc.date.available	2016-07-13T14:09:36Z	-
dc.date.created	2016-04-28	-
dc.date.issued	2016-07-13	-
dc.identifier.uri	http://arks.princeton.edu/ark:/88435/dsp01ww72bd967	-
dc.description.abstract	The problem of field estimation, or finding the spatial distribution of a resource in space, has many applications in problems of robotic search. This thesis approaches the problem in the framework of a Gaussian Multi-Armed Bandit (MAB) task, a problem in which an agent must learn about an unknown environment while maximizing expected reward. The arms correspond to discretized points in space, and the smoothness of the field is modeled as spatial correlation between the arms. Upper Confidence Limit (UCL), an algorithm developed by Reverdy et al. in 2014 for Gaussian MAB problems with correlated arms and prior knowledge, is then applied to this problem. The smoothness of the field is measured by a parameter known as the length scale. In real world applications, the agent can only have an estimate of this length scale. This thesis explores the performance of UCL with correlation in comparison to other algorithms when the estimate of the length scale is correct. The effect of overestimates and underestimates in the length scale is then explored for fields of different smoothness. The search task is finally implemented in a testbed with a robot, giving additional metrics of performance. The simulations showed that knowledge of the spatial correlation of the arms can result in improvements in performance using UCL when compared to other algorithms that do not account for correlation. In addition, it is shown that, in the cases studied, best performance is not actually obtained for the correct length scale estimate, but for some particular overestimate. This effect must be studied further, but the results suggest that an agent performing this search should use an overestimate and not an underestimate of the length scale that describes the field, provided this overestimate is not grossly inaccurate.	en_US
dc.format.extent	109 pages	*
dc.language.iso	en_US	en_US
dc.title	Exploring multi-armed bandit decision-making strategies in an underwater vehicle testbed	en_US
dc.type	Princeton University Senior Theses	-
pu.date.classyear	2016	en_US
pu.department	Mechanical and Aerospace Engineering	en_US
pu.pdf.coverpage	SeniorThesisCoverPage	-
Appears in Collections:	Mechanical and Aerospace Engineering, 1924-2020

Files in This Item:

File	Size	Format
null	1.5 MB	Adobe PDF
