A MACHINE LEARNING APPROACH TO PRIVACY IN
AUDIO-ENABLED IoT DEVICES

Karuri, Vincent

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01g158bk91v

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Jha, Niraj K.	-
dc.contributor.author	Karuri, Vincent	-
dc.date.accessioned	2017-07-24T13:11:03Z	-
dc.date.available	2017-07-24T13:11:03Z	-
dc.date.created	2017-05-08	-
dc.date.issued	2017-05-08	-
dc.identifier.uri	http://arks.princeton.edu/ark:/88435/dsp01g158bk91v	-
dc.description.abstract	IoT (Internet of Things) devices have proliferated throughout our everyday lives. Some estimates put their number at over 30 billion by 2020, with over 200 billion intermittent connections [1]. IoT devices are no longer limited to the most tech-savvy or well-to-do populations; almost everyone today owns a smart device, be it a smartphone, camera, TV set or pair of headphones. With this realization comes the problem of privacy in our homes, offices and leisure spots, where we are surrounded by sensors that record and transmit our information without our knowledge. While it may be difficult to solve this problem completely when the devices do not belong to us, most people would like to feel in control of what their personal devices can and cannot do. In this thesis, we focus on IoT devices that have recording capability (audio-enabled devices): broadly, devices equipped with microphones, such as smartphones, smart TVs and voice assistants (e.g., Amazon Echo, Siri and Google Now). The challenge we address is that these devices may record private speech that is then inadvertently shared with the devices. The devices usually send such recordings to their servers, which store this sensitive information, again without the end user's knowledge. This violation of privacy is a serious problem and will grow as IoT devices become ubiquitous. We propose a system, placed between the device and its server, that addresses this problem of sensitive data leakage. The system filters predefined blacklisted words out of speech recordings before passing the recordings on to the IoT device application, which would then send the information to its servers. It uses robust audio feature extraction techniques, together with machine learning algorithms, to find the best feature set for classifying words as blacklisted or whitelisted.
Words identified as blacklisted are then cut out or zeroed out of the speech signal. On single-word sentences, the system achieved 89% accuracy in identifying blacklisted words and 96% accuracy in identifying whitelisted words. On multi-word sentences, it achieved 87% accuracy for blacklisted words and 88% for whitelisted words. Such a system would be a good foundation for implementing user-controlled privacy in IoT devices.	en_US
dc.language.iso	en_US	en_US
dc.title	A MACHINE LEARNING APPROACH TO PRIVACY IN AUDIO-ENABLED IoT DEVICES	en_US
dc.type	Princeton University Senior Theses	-
pu.date.classyear	2017	en_US
pu.department	Electrical Engineering	en_US
pu.pdf.coverpage	SeniorThesisCoverPage	-
pu.contributor.authorid	960888901	-
pu.contributor.advisorid	010000369	-
Appears in Collections:	Electrical Engineering, 1932-2020

Files in This Item:

File	Size	Format
senior_thesis_final_report.pdf	693.21 kB	Adobe PDF
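The abstract states that words classified as blacklisted are either cut out or zeroed out of the speech signal, but the record does not describe how that step is implemented. The following is only a minimal sketch under stated assumptions: the audio is a list of mono samples, and an upstream step (word segmentation plus the thesis's classifier, neither shown here) has already produced the (start, end) times, in seconds, of each blacklisted word. The function name `redact_segments` and its `mode` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple


def redact_segments(
    samples: Sequence[float],
    sample_rate: int,
    spans: Sequence[Tuple[float, float]],
    mode: str = "zero",
) -> List[float]:
    """Hypothetical helper: silence ("zero") or remove ("cut") flagged spans.

    Assumption: `spans` holds (start_s, end_s) times of blacklisted words,
    produced by an upstream segmenter/classifier that is not shown here.
    """
    if mode not in ("zero", "cut"):
        raise ValueError("mode must be 'zero' or 'cut'")
    n = len(samples)
    # Mark every sample index that falls inside any flagged span,
    # clamping span boundaries to the valid index range [0, n).
    flagged = [False] * n
    for start_s, end_s in spans:
        lo = max(0, round(start_s * sample_rate))
        hi = min(n, round(end_s * sample_rate))
        for i in range(lo, hi):
            flagged[i] = True
    if mode == "zero":
        # Keep the signal length (and timing) intact; flagged samples become silence.
        return [0.0 if f else float(s) for s, f in zip(samples, flagged)]
    # "cut": drop flagged samples entirely, shortening the signal.
    return [float(s) for s, f in zip(samples, flagged) if not f]


if __name__ == "__main__":
    # Ten samples at a toy rate of 10 Hz; flag 0.2 s to 0.5 s (indices 2, 3, 4).
    print(redact_segments(list(range(1, 11)), 10, [(0.2, 0.5)]))
    # prints [1.0, 2.0, 0.0, 0.0, 0.0, 6.0, 7.0, 8.0, 9.0, 10.0]
```

Zeroing keeps the clip's length, so timestamps in the rest of the recording stay aligned; cutting shortens the clip and may leave audible discontinuities at the splice points. A plain Python list is used only to keep the sketch dependency-free; a real filter would more likely operate on a numeric sample array.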
