Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp010g354h462
Title: | Kernel Regression and Estimation: Learning Theory/Application and Errors-in-Variables Model Analysis |
Authors: | Wu, Peiyuan |
Advisors: | Kung, Sun Yuan |
Contributors: | Electrical Engineering Department |
Keywords: | cost effectiveness; errors-in-variables; Gauss-Markov model; kernel method; minimum mean square error; ridge regression |
Subjects: | Electrical engineering |
Issue Date: | 2015 |
Publisher: | Princeton, NJ : Princeton University |
Abstract: | This dissertation contains both application and theoretical topics in the field of kernel regression and estimation. The first part discusses kernel-based learning applications in (1) a large-scale active authentication prototype and (2) incomplete data analysis. Traditional kernel-based learning algorithms encounter scalability issues on large datasets: with N samples, the learning complexity is O(N^2) for the support vector machine (SVM) and O(N^3) for kernel ridge regression (KRR) with the default Gaussian RBF kernel. By approximating the RBF kernel with a truncated-RBF (TRBF) kernel, a fast KRR learning algorithm is adopted with O(N) training cost and constant prediction cost. The algorithm is applied to a large-scale active authentication prototype based on free-text keystroke analysis, where it shows both accuracy and computational advantages over SVM with the RBF kernel. The dissertation also explores the kernel approach to incomplete data analysis (KAIDA), where the data to be analyzed are highly incomplete due to controlled or unanticipated causes such as privacy and security concerns, as well as the cost, failure, or limited accessibility of data sensors. Two partial cosine (PC) kernels, denoted SM-PC and DM-PC, are proposed, and simulations show that KAIDA can deliver strong resilience against high data sparsity. (Illustrative sketches of the TRBF approximation and a partial cosine similarity appear below.) The second part discusses theoretical properties of the nonlinear regression problem under the errors-in-variables (EiV) model. The impact of input noise on nonlinear regression functions is examined through a spectral decomposition analysis, which shows that the minimum mean square error (MMSE) due to input noise decomposes into contributions from individual spectral components. Both numerical and analytical methodologies are proposed to construct the orthogonal basis of interest, with closed-form expressions for Gaussian and uniform input models. The dissertation also extends the Gauss-Markov theorem to the EiV model with stochastic regression coefficients. A weighted regularized least squares estimator is proposed to minimize the mean squared error (MSE) in the estimation of both the regression coefficients and the output, with analytical closed-form expressions derived for polynomial regression problems with Gaussian-distributed inputs. A notion of a least mean squares kernel (LMSK) is also proposed to minimize the MSE in the KRR learning model. |
URI: | http://arks.princeton.edu/ark:/88435/dsp010g354h462 |
Alternate format: | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog. |
Type of Material: | Academic dissertations (Ph.D.) |
Language: | en |
Appears in Collections: | Electrical Engineering |
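The abstract's O(N) training claim for TRBF-based KRR comes from replacing the implicit, infinite-dimensional RBF feature map with an explicit finite-dimensional one, so the ridge solution can be computed in the primal. The sketch below is a minimal illustration of that idea, assuming a degree-2 Taylor truncation of the Gaussian RBF kernel; the function names (`trbf_features`, `krr_primal_fit`) and the exact truncation order are illustrative assumptions, not the dissertation's construction.

```python
import numpy as np

def trbf_features(X, sigma=1.0):
    """Explicit feature map for a truncated-RBF (TRBF) kernel.

    Truncating the Taylor series of the Gaussian RBF kernel
    exp(-||x - x'||^2 / (2 sigma^2)) at degree 2 gives a finite
    feature map z(x) with z(x) . z(x') approximating the kernel,
    so KRR can be trained in the primal at cost linear in N.
    """
    Xs = X / sigma
    n, d = Xs.shape
    # Common prefactor exp(-||x||^2 / (2 sigma^2)) from the RBF kernel.
    scale = np.exp(-0.5 * np.sum(Xs**2, axis=1, keepdims=True))
    const = np.ones((n, 1))                  # degree-0 term
    lin = Xs                                 # degree-1 terms
    # Degree-2 terms x_i * x_j / sqrt(2!), so their inner product
    # reproduces the (x . x')^2 / 2 term of the Taylor expansion.
    quad = np.einsum('ni,nj->nij', Xs, Xs).reshape(n, d * d) / np.sqrt(2.0)
    return scale * np.hstack([const, lin, quad])

def krr_primal_fit(X, y, lam=1e-3, sigma=1.0):
    """Ridge regression on TRBF features: solves a D x D system,
    so training is O(N * D^2) rather than the O(N^3) of exact KRR."""
    Z = trbf_features(X, sigma)
    D = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(D), Z.T @ y)

def krr_predict(w, X_new, sigma=1.0):
    """Prediction cost depends only on D, not on the training set size."""
    return trbf_features(X_new, sigma) @ w
```

For a fixed input dimension the feature dimension D is a constant, so training scales linearly with N and each prediction costs O(D), consistent with the constant prediction cost quoted in the abstract.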
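The abstract does not define the SM-PC and DM-PC kernels. The following is a minimal sketch of one plausible reading of a "partial cosine" similarity for incomplete data, assuming missing entries are encoded as NaN and the similarity is taken over coordinates observed in both samples; the name `partial_cosine` and the NaN encoding are hypothetical, and the dissertation's SM-PC/DM-PC variants may differ (for instance, in how the norms are formed).

```python
import numpy as np

def partial_cosine(x, y):
    """Cosine similarity restricted to jointly observed coordinates.

    Hypothetical illustration: missing entries are NaN, and the
    similarity is the ordinary cosine over the coordinates that
    both vectors actually observe.
    """
    mask = ~np.isnan(x) & ~np.isnan(y)
    if not mask.any():
        return 0.0  # no shared coordinates: treat as uninformative
    xs, ys = x[mask], y[mask]
    denom = np.linalg.norm(xs) * np.linalg.norm(ys)
    return float(xs @ ys / denom) if denom > 0.0 else 0.0

# Example: two 5-dimensional samples, each with missing entries.
x = np.array([1.0, np.nan, 2.0, 0.5, np.nan])
y = np.array([0.8, 1.0, np.nan, 0.4, 2.0])
print(partial_cosine(x, y))  # cosine over coordinates 0 and 3 only
```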
Files in This Item:
File | Description | Size | Format
---|---|---|---
Wu_princeton_0181D_11255.pdf | | 9.4 MB | Adobe PDF