Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp01pr76f626t
Title: | The Wisdom of Crowds: A Natural Language Processing Approach to Forecasting Sports Betting Markets Using Social Media Fan Sentiment |
Authors: | Chen, Peter |
Advisors: | Carmona, Rene |
Department: | Operations Research and Financial Engineering |
Certificate Program: | Center for Statistics and Machine Learning |
Class Year: | 2019 |
Abstract: | The wisdom of crowds, the idea that the collective knowledge of a group of people can serve as an alternative to expert opinion, has repeatedly been shown to be an effective indicator of sporting outcomes. NFL betting is the largest sports betting market in the United States, and fan sentiment has become readily available and abundant with the rise of social media platforms such as Reddit, so we study the predictive relationship between social media output and NFL outcomes. In particular, we focus on the two most popular forms of per-game sports betting: wagering on which team will win against the point spread (WTS), a handicap applied to the team that bookmakers expect to win the game, and wagering on whether the combined score will fall above or below the over-under line, a prediction of the total score set by bookmakers. Popular natural language processing representations of Reddit text, including bag-of-words, term frequency-inverse document frequency (TF-IDF), and out-of-the-box sentiment scoring models used as a proxy for public sentiment, were shown to be successful regressors in several common machine learning models. Trained on games from the 2012-2018 seasons, discriminative models (logistic regression and linear support vector machines) using bag-of-words and TF-IDF representations and nearest-neighbor models using sentiment scoring algorithms (VADER and AFINN) were found to be most successful at this classification task, achieving out-of-sample testing accuracies of up to 54%, well above the 52.4% required for a profitable betting strategy. Further attempts at implementing an LSTM neural network showed similar success. |
URI: | http://arks.princeton.edu/ark:/88435/dsp01pr76f626t |
Type of Material: | Princeton University Senior Theses |
Language: | en |
Appears in Collections: | Operations Research and Financial Engineering, 2000-2020 |
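The 52.4% threshold cited in the abstract can be reproduced with a short calculation. The sketch below assumes the conventional -110 American odds on spread and over-under bets (risk 110 to win 100); the abstract does not state the odds, so that figure and the function names are illustrative assumptions rather than details taken from the thesis.

```python
def break_even_win_rate(american_odds: int = -110) -> float:
    """Win rate at which a flat bet at the given American odds has zero expected profit.

    The default of -110 is an assumption (a common price for spread and
    over-under bets), not a value stated in the abstract.
    """
    if american_odds < 0:
        risk, win = -american_odds, 100   # e.g. -110: risk 110 to win 100
    else:
        risk, win = 100, american_odds    # e.g. +150: risk 100 to win 150
    return risk / (risk + win)


def expected_profit_per_unit(accuracy: float, american_odds: int = -110) -> float:
    """Expected profit per unit staked for a bettor who wins with probability `accuracy`."""
    if american_odds < 0:
        payout_ratio = 100 / -american_odds
    else:
        payout_ratio = american_odds / 100
    # Win: gain payout_ratio per unit staked. Loss: lose the unit staked.
    return accuracy * payout_ratio - (1 - accuracy)


if __name__ == "__main__":
    print(f"break-even win rate at -110: {break_even_win_rate():.4f}")
    print(f"expected profit per unit at 54% accuracy: {expected_profit_per_unit(0.54):.4f}")
```

Under these assumed odds the break-even rate is 110 / 210, which rounds to 52.4%, so a 54% out-of-sample accuracy would correspond to a small positive expected return per unit staked.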
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
CHEN-PETER-THESIS.pdf | | 1.67 MB | Adobe PDF | Request a copy |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.