Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp0176537377f
Title: | Sequential Decision-Making Problems: Online Learning for Optimization over Networks |
Authors: | Zhou, Angela |
Advisors: | Powell, Warren |
Contributors: | Bou, Haitham |
Department: | Operations Research and Financial Engineering |
Class Year: | 2016 |
Abstract: | The granularity of big data allows marketers to target customers more effectively with personalized offers. We model the resulting learning and optimization problem of allocating coupons on a social network where consumer response models are unknown. Uncertainty in these model parameters leads to a natural "exploration-exploitation" tradeoff in choosing which discounts to offer, while choosing which customers to include in a marketing campaign is a stochastic subset selection problem. We adapt the Knowledge Gradient with Discrete Priors from optimal learning and find a statistically significant increase in revenue and general robustness to moderate amounts of sampling noise. We then incorporate social network information about the connectivity of users and optimally learn the underlying customer segment parameters, while accounting for network effects on revenue. Our sampling method under uncertainty is competitive with realizations of the policy which makes the optimal decision with perfect information; however, realized revenue is lower than the expected revenue. We also consider the task assignment problem in crowdsourcing settings, such as on a platform like Amazon's Mechanical Turk, where we are uncertain both of users' reliabilities and the true labels of the questions. We derive an approximation scheme for a dynamic task assignment framework which uses intermediate estimates of the question answers to estimate the increase in mutual information we receive by querying each user. This dynamic assignment framework improves upon the final label accuracy of random sampling by up to 35% for small sample regimes, achieves comparable performance with one-shot dynamic assignment, and better performance when the ratio of questions to users increases. These models and results suggest a general framework using ideas from optimal learning to learn parameters of customer response models while optimizing for revenue. Given these estimates, for submodular objective functions, greedy approximation schemes suffice to construct cardinality-constrained subsets of users to target. |
Extent: | 137 pages |
URI: | http://arks.princeton.edu/ark:/88435/dsp0176537377f |
Type of Material: | Princeton University Senior Theses |
Language: | en_US |
Appears in Collections: | Operations Research and Financial Engineering, 2000-2020 |
Files in This Item:
File | Size | Format | |
---|---|---|---|
Zhou_Angela_final_thesis.pdf | 1.58 MB | Adobe PDF | Request a copy |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.