Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp01m039k7603
Title: | Computing on Large, Sparse Datasets and Error-Prone Fabrics |
Authors: | Golnari, Pareesa Ameneh |
Advisors: | Malik, Sharad |
Contributors: | Electrical Engineering Department |
Keywords: | CRS; error-tolerant computing; reliability; soft error; sparse formats; SpMM |
Subjects: | Electrical engineering; Computer engineering |
Issue Date: | 2018 |
Publisher: | Princeton, NJ : Princeton University |
Abstract: | In this dissertation we study problems arising from two trends: computing on large, sparse datasets and computing on error-prone fabrics. Dataset sizes grow every year, but many large datasets are sparse, i.e., the majority of their entries are zero, so skipping the zero elements can considerably accelerate computation on them. We focus on accelerating a common kernel of sparse computation, sparse matrix-matrix multiplication (SpMM), and propose a high-performance, scalable systolic accelerator that minimizes the required memory bandwidth and accelerates this operation 9-30x over the state of the art. We also study the sparse formats used to store sparse datasets; these formats reduce the required bandwidth and storage by storing only the non-zero elements. We modify the popular CRS sparse format and propose the InCRS format, which improves non-regular accesses. We show that this modification reduces the required memory accesses and consequently accelerates SpMM 5-12x. As transistor scaling continues, devices become less reliable, causing errors in the systems built from them. We provide a framework for comparing the error tolerance of different sparse data formats and choosing the most appropriate format for a given application. As case studies, we compare the performance of different formats for two machine learning applications, RBM and PCA, and a set of linear algebra operations. We also study error-tolerant processors built on error-prone fabrics that allow errors in the architectural state. We formalize the minimal requirements these processors must meet to still provide useful results: making progress, preventing error effects from accumulating over time, and executing the essential parts of the program.
We propose a framework that models the control flow of these processors, capturing the effects of errors and of protection mechanisms, and that verifies reliability properties on the resulting models. As case studies, we verify these properties on two recent error-tolerant processors, PPU and ERSA, and propose modifications to both designs so that they satisfy the minimal reliability requirements. |
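To make the storage scheme behind the abstract concrete, below is a minimal sketch of the standard CRS (compressed row storage) layout and a sparse matrix-vector product that skips zero entries. This illustrates only textbook CRS; InCRS is the dissertation's contribution and is not reproduced here, and the function names are illustrative rather than taken from the thesis.

```python
def dense_to_crs(dense):
    """Convert a dense row-major matrix (list of lists) to CRS arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:                  # store only the non-zero elements
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))     # offset where the next row starts
    return values, col_idx, row_ptr

def crs_matvec(values, col_idx, row_ptr, x):
    """Sparse matrix-vector product y = A @ x using the CRS arrays."""
    y = [0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        # iterate only over the stored (non-zero) entries of row i
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

A = [[5, 0, 0],
     [0, 0, 3],
     [2, 0, 1]]
vals, cols, ptrs = dense_to_crs(A)
print(vals, cols, ptrs)                          # [5, 3, 2, 1] [0, 2, 0, 2] [0, 1, 2, 4]
print(crs_matvec(vals, cols, ptrs, [1, 1, 1]))   # [5, 3, 3]
```

Because only non-zeros are stored and traversed, both the memory footprint and the number of memory accesses scale with the non-zero count rather than the full matrix size, which is the property the accelerator and format work in the abstract exploit.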
URI: | http://arks.princeton.edu/ark:/88435/dsp01m039k7603 |
Alternate format: | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu |
Type of Material: | Academic dissertations (Ph.D.) |
Language: | en |
Appears in Collections: | Electrical Engineering |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Golnari_princeton_0181D_12464.pdf | | 6.67 MB | Adobe PDF |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.