Idioma: A Document-Based Language-Learning Platform

Frazao, Paulo

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01f4752k784

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Li, Xiaoyan
dc.contributor.author	Frazao, Paulo
dc.date.accessioned	2020-10-01T21:26:06Z	-
dc.date.available	2020-10-01T21:26:06Z	-
dc.date.created	2020-05-02
dc.date.issued	2020-10-01	-
dc.identifier.uri	http://arks.princeton.edu/ark:/88435/dsp01f4752k784	-
dc.description.abstract	Despite ever-increasing demand for language-learning tools, especially those focused on second-language acquisition, there is a concerning lack of reading comprehension systems available on the market. Modern applications, such as Memrise and Duolingo, provide excellent services for students who are beginning and practicing a foreign language; however, the majority of these systems fail to offer a learning experience that encourages the development of reading comprehension skills while also maintaining an acceptable level of interactivity and personalization. The goal of this project is to implement a solution that addresses these concerns using modern machine learning and natural language processing techniques. Idioma is a Portuguese language-learning web-app that presents the user with articles tailored to their proficiency level and interests. The application uses a series of machine learning classifiers to label web-scraped content, and then employs a proprietary selection algorithm to offer the user content that is both engaging and appropriate given their present skills. The application grows with the user, tracking the content that they consume to dynamically refine the selection algorithm, ensuring consistent, up-to-date suggestions. Furthermore, the application boasts a variety of quality-of-life and gamification features with the intention of maximizing user entertainment and retention. Idioma is implemented using a combination of ReactJS, Flask, MongoDB, and Scrapy. The models underlying the application were trained and evaluated using scikit-learn libraries; the best-performing models achieved an accuracy of ~80% and comparable precision and recall, demonstrating a fair competency in their binary classification task.
There exist a number of opportunities for future extension in this project, but it nonetheless offers a significant foundation towards the task of building a dedicated, exciting, and cutting-edge reading comprehension tool suite.
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.title	Idioma: A Document-Based Language-Learning Platform
dc.type	Princeton University Senior Theses
pu.date.classyear	2020
pu.department	Computer Science
pu.pdf.coverpage	SeniorThesisCoverPage
pu.contributor.authorid	920059057
Appears in Collections:	Computer Science, 1988-2020

Files in This Item:

File	Description	Size	Format
FRAZAO-PAULO-THESIS.pdf		2.49 MB	Adobe PDF