Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp01hx11xf42z
Title: | An Intensive Analysis of Clustering and Dimensionality Reduction: Applying the Expectation-Maximization Algorithm and Principal Component Analysis to Classify Handwritten Digits |
Authors: | Charash, Philip |
Advisors: | Liu, Han |
Department: | Operations Research and Financial Engineering |
Class Year: | 2014 |
Abstract: | This thesis develops a detailed understanding of the Expectation-Maximization (EM) clustering algorithm under different assumptions and given different inputs. Three assumptions govern the values of the covariance matrix in the implementation of the EM algorithm for a mixture of multivariate Gaussian models. The performance of the EM algorithm under these three assumptions is closely examined using data from the Modified National Institute of Standards and Technology (MNIST) database. This database contains images of handwritten digits, 0-9. Each image belongs to one of these 10 groups, and the performance of the algorithm is judged by how well it can recover these unobserved labels and cluster the digits into groups. Furthermore, the dimensionality reduction technique Principal Component Analysis is applied at several points in the thesis to study the effects of its lossy compression and how the dimensionality of the data influences the performance of the EM algorithm. After a rigorous study of the behavior of the EM algorithm, significant results were established, including the choice of the spherical covariance matrix as the overall best-performing assumption. |
Extent: | 100 |
URI: | http://arks.princeton.edu/ark:/88435/dsp01hx11xf42z |
Type of Material: | Princeton University Senior Theses |
Language: | en_US |
Appears in Collections: | Operations Research and Financial Engineering, 2000-2020 |
Files in This Item:
File | Size | Format | |
---|---|---|---|
Charash,Philip final thesis.pdf | 965.48 kB | Adobe PDF | Request a copy |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.