Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01n009w514c
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Stewart, Brandon | -
dc.contributor.advisor | Engelhardt, Barbara | -
dc.contributor.author | Zimmer, Jacob | -
dc.date.accessioned | 2019-09-04T17:53:12Z | -
dc.date.available | 2019-09-04T17:53:12Z | -
dc.date.created | 2019-05-06 | -
dc.date.issued | 2019-09-04 | -
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp01n009w514c | -
dc.description.abstract | Advances in machine learning algorithms have made the adage about the unreasonable effectiveness of data more true than ever, but as the size of data grows it becomes increasingly difficult to create a training set of sufficient size and quality to take full advantage of algorithmic gains. A variety of approaches using various levels of supervision have been proposed to bridge this divide, but the two most promising are Active Learning and Data Programming. In this work, the performance of these strategies and a hybrid approach which incorporates elements of both are compared. It is found that given access to enough data, Active Learning can outperform either Data Programming or the Hybrid approach, but in situations where labeling is expensive it can be worthwhile to use Data Programming. A case study analyzing the political content of emails from the Enron Corporation confirms this finding. | en_US
dc.format.mimetype | application/pdf | -
dc.language.iso | en | en_US
dc.title | Methods for Labeling Big Data: Active Learning and Data Programming | en_US
dc.type | Princeton University Senior Theses | -
pu.date.classyear | 2019 | en_US
pu.department | Computer Science | en_US
pu.pdf.coverpage | SeniorThesisCoverPage | -
pu.contributor.authorid | 961182969 | -
pu.certificate | Center for Statistics and Machine Learning | en_US
Appears in Collections: Computer Science, 1988-2020

Files in This Item:
File | Description | Size | Format
ZIMMER-JACOB-THESIS.pdf | - | 1.18 MB | Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.