Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01f4752k784
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Li, Xiaoyan |
dc.contributor.author | Frazao, Paulo |
dc.date.accessioned | 2020-10-01T21:26:06Z | -
dc.date.available | 2020-10-01T21:26:06Z | -
dc.date.created | 2020-05-02 |
dc.date.issued | 2020-10-01 | -
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp01f4752k784 | -
dc.description.abstract | Despite ever-increasing demand for language-learning tools, especially those focused on second-language acquisition, there is a concerning lack of reading comprehension systems available on the market. Modern applications, such as Memrise and Duolingo, provide excellent services for students who are beginning and practicing a foreign language; however, the majority of these systems fail to offer a learning experience that encourages the development of reading comprehension skills while also maintaining an acceptable level of interactivity and personalization. The goal of this project is to implement a solution that addresses these concerns using modern machine learning and natural language processing techniques. Idioma is a Portuguese language-learning web-app that presents the user with articles tailored to their proficiency level and interests. The application uses a series of machine learning classifiers to label web-scraped content, and then employs a proprietary selection algorithm to offer the user content that is both engaging and appropriate given their present skills. The application grows with the user, tracking the content that they consume to dynamically refine the selection algorithm, ensuring consistent, up-to-date suggestions. Furthermore, the application boasts a variety of quality-of-life and gamification features with the intention of maximizing user entertainment and retention. Idioma is implemented using a combination of ReactJS, Flask, MongoDB, and Scrapy. The models underlying the application were trained and evaluated using scikit-learn libraries; the best-performing models achieved an accuracy of ~80% and comparable precision and recall, demonstrating a fair competency in their binary classification task. There exist a number of opportunities for future extension in this project, but it nonetheless offers a significant foundation towards the task of building a dedicated, exciting, and cutting-edge reading comprehension tool suite. |
dc.format.mimetype | application/pdf |
dc.language.iso | en |
dc.title | Idioma: A Document-Based Language-Learning Platform |
dc.type | Princeton University Senior Theses |
pu.date.classyear | 2020 |
pu.department | Computer Science |
pu.pdf.coverpage | SeniorThesisCoverPage |
pu.contributor.authorid | 920059057 |
Appears in Collections: Computer Science, 1988-2020

Files in This Item:
File | Description | Size | Format
FRAZAO-PAULO-THESIS.pdf | | 2.49 MB | Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.