Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01hx11xf42z
Title: An Intensive Analysis of Clustering and Dimensionality Reduction: Applying the Expectation-Maximization Algorithm and Principal Component Analysis to Classify Handwritten Digits
Authors: Charash, Philip
Advisors: Liu, Han
Department: Operations Research and Financial Engineering
Class Year: 2014
Abstract: This thesis seeks a deep understanding of the Expectation-Maximization (EM) clustering algorithm under different assumptions and given different inputs. Three assumptions govern the form of the covariance matrix when the EM algorithm is implemented for a mixture of multivariate Gaussian models. The performance of the EM algorithm under these three assumptions is closely examined using data from the Modified National Institute of Standards and Technology (MNIST) database. This database contains images of handwritten digits, 0-9. Each image is a member of one of these 10 groups, and the performance of the algorithm is judged by how well it can find these unobserved labels and cluster the digits into groups. Furthermore, the dimensionality reduction technique Principal Component Analysis is applied at times throughout the paper to study the effects of its lossy compression and how the dimensionality of the data influences the performance of the EM algorithm. After a rigorous study of the behavior of the EM algorithm, significant results were established, including the choice of the spherical covariance matrix as the overall best-performing assumption.
Extent: 100
URI: http://arks.princeton.edu/ark:/88435/dsp01hx11xf42z
Type of Material: Princeton University Senior Theses
Language: en_US
Appears in Collections: Operations Research and Financial Engineering, 2000-2020

Files in This Item:
File: Charash,Philip final thesis.pdf
Size: 965.48 kB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.