Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp012n49t4439
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Salganik, Matthew | -
dc.contributor.author | Liu, David | -
dc.date.accessioned | 2018-08-14T16:09:50Z | -
dc.date.available | 2018-08-14T16:09:50Z | -
dc.date.created | 2018-05-08 | -
dc.date.issued | 2018-08-14 | -
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp012n49t4439 | -
dc.description.abstract | As the availability of social data and reliance on computational methods increase, there is a need to establish guidelines for computational reproducibility in the social sciences. The Fragile Families Challenge presented a unique case study in which interdisciplinary researchers developed social prediction models and then submitted papers for review. Based on our experience reproducing the results as part of a journal review process, we propose a set of guidelines that can improve the reproducibility of open-sourced code. These findings suggest that open sourcing data and code is a crucial first step towards computational reproducibility but leaves the replicator with the task of configuring an appropriate computing environment and parsing the code structure. By leveraging virtualization and pipeline design - tools and concepts from software engineering - we develop a set of guidelines that journal editors can adopt. In the case of Fragile Families, these guidelines are shown to be simple enough for adoption yet effective in rendering code more transparent. The rewards of reproducibility are further shown by developing an extension that boosts one of the Challenge's submissions, improving the model's mean squared error. | en_US
dc.format.mimetype | application/pdf | -
dc.language.iso | en | en_US
dc.title | Computational Reproducibility and the Fragile Families Challenge: Lessons Learned and Suggestions for the Future | en_US
dc.type | Princeton University Senior Theses | -
pu.date.classyear | 2018 | en_US
pu.department | Computer Science | en_US
pu.pdf.coverpage | SeniorThesisCoverPage | -
pu.contributor.authorid | 961074968 | -
pu.certificate | Center for Statistics and Machine Learning | en_US
Appears in Collections: Computer Science, 1988-2020

Files in This Item:
File | Description | Size | Format
LIU-DAVID-THESIS.pdf | | 858.88 kB | Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.