Heterogeneous Monolithic 3D and FinFET Architectures for Energy-efficient Computing

Yu, Ye

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01r781wj908

Title:	Heterogeneous Monolithic 3D and FinFET Architectures for Energy-efficient Computing
Authors:	Yu, Ye
Advisors:	Jha, Niraj K.
Contributors:	Electrical Engineering Department
Keywords:	Deep learning; FinFET; Heterogeneous architecture; Monolithic 3D integration; Neural network
Subjects:	Computer engineering; Electrical engineering
Issue Date:	2019
Publisher:	Princeton, NJ : Princeton University
Abstract:	More transistors are integrated within the same footprint area as the technology node shrinks, delivering higher performance. However, this comes with higher power density, which often exceeds what inexpensive cooling techniques can handle. This Power Wall prevents the chip from running at full speed with all of its devices powered on. Another major bottleneck in chip design is the gap between processor clock rate and memory access speed. This Memory Wall keeps the processor from fully utilizing its compute power. To address both the Power and Memory Walls, we propose several approaches and architectures. To tackle the Memory Wall, we develop an efficient memory interface for monolithic 3D-stacked non-volatile RAMs (NVRAMs). It takes advantage of the tremendous bandwidth made available by monolithic inter-tier vias (MIVs) to implement an on-chip memory bus that hides the latency of large data transfers. To tackle the Power Wall, we add a fine-grain dynamically reconfigurable (FDR) field-programmable gate array (FPGA) to our monolithic 3D architecture. It uses the concept of temporal logic folding to localize on-chip communication. We show that the architecture significantly reduces both power and energy while also improving performance for both memory- and compute-intensive applications. The second problem targeted in this work is the development of energy-efficient architectures for convolutional neural networks (CNNs). CNNs have been shown to outperform conventional machine-learning algorithms across a wide range of applications, e.g., object detection, image classification, and image segmentation. However, the high computational complexity of CNNs often necessitates extremely fast and efficient hardware. The problem worsens as neural networks grow exponentially in size. As a result, customized hardware accelerators have been developed to speed up CNN processing without sacrificing model accuracy.
However, previous accelerator design studies have not fully considered the characteristics of the target applications, which may lead to sub-optimal architecture designs. On the other hand, new CNN models have been developed for better accuracy, but their compatibility with the underlying hardware accelerator is usually overlooked. We propose an application-driven framework for architectural design space exploration of CNN accelerators. This framework is based on a hardware analytical model for individual CNN operations and formulates the accelerator design task as a multi-dimensional optimization problem. We demonstrate that it can be used effectively in application-driven accelerator architecture design. In addition, it can refine neural network models so that they best fit the underlying hardware resources. Most existing CNN accelerators focus on exploring various dataflow styles and computational parallelism designs. However, the potential performance improvement from sparsity (in activations and weights) remains largely unexploited. The amount of computation and the memory footprint of CNNs can be significantly reduced if sparsity is exploited during network evaluation. Using the design space exploration method discussed above, we develop SPRING, a sparsity-aware reduced-precision CNN accelerator architecture for both training and inference. We use a binary mask scheme to encode the sparsity of activations and weights, and adopt stochastic rounding to train CNNs with reduced precision without accuracy loss. We use the efficient monolithic 3D NVRAM interface described above to alleviate the memory bottleneck of CNN evaluation, especially in training. The last research direction of this thesis focuses on analyzing the timing, leakage power, and dynamic power of FinFET architectures under process, supply voltage, and temperature (PVT) variations.
We propose a statistical optimization framework that performs dual device-type assignment at the architecture level under PVT variations; it takes spatial correlations into account and leverages circuit-level statistical analysis techniques.
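The abstract does not describe the framework's hardware analytical model in detail. As a rough illustration of what a per-operation model can look like, the hypothetical sketch below estimates a convolution layer's latency roofline-style: the larger of its compute-bound cycle count (MACs divided by the number of MAC units) and its memory-bound cycle count (compulsory traffic divided by bytes transferred per cycle), then picks the best of a few candidate configurations. The names `ConvLayer`, `estimate_cycles`, and `best_config`, the 16-bit operand width, and the stride-1 "same"-padding assumption are all illustrative and are not taken from the thesis.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class ConvLayer:
    """Hypothetical description of one convolution layer (stride 1, 'same' padding)."""
    c_in: int
    c_out: int
    h_out: int
    w_out: int
    k: int                    # square kernel size
    bytes_per_elem: int = 2   # assumed 16-bit operands

    def macs(self) -> int:
        # One multiply-accumulate per (output element, input channel, kernel tap).
        return self.c_out * self.h_out * self.w_out * self.c_in * self.k * self.k

    def min_traffic_bytes(self) -> int:
        # Compulsory traffic only: each weight, input, and output is moved once.
        # With stride 1 and 'same' padding, input spatial size equals output size.
        weights = self.c_out * self.c_in * self.k * self.k
        inputs = self.c_in * self.h_out * self.w_out
        outputs = self.c_out * self.h_out * self.w_out
        return (weights + inputs + outputs) * self.bytes_per_elem


def estimate_cycles(layer: ConvLayer, mac_units: int, bytes_per_cycle: int) -> int:
    """Roofline-style bound: the layer is limited by compute or by memory bandwidth."""
    compute = math.ceil(layer.macs() / mac_units)
    memory = math.ceil(layer.min_traffic_bytes() / bytes_per_cycle)
    return max(compute, memory)


def best_config(layer: ConvLayer,
                configs: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Return the (mac_units, bytes_per_cycle) pair with the lowest estimated latency."""
    return min(configs, key=lambda cfg: estimate_cycles(layer, *cfg))


layer = ConvLayer(c_in=64, c_out=64, h_out=56, w_out=56, k=3)
print(layer.macs(), estimate_cycles(layer, 256, 64))        # prints: 115605504 451584
print(best_config(layer, [(256, 64), (1024, 16), (512, 32)]))  # prints: (1024, 16)
```

A realistic exploration would also weigh area and power budgets and on-chip buffer reuse, which this sketch deliberately ignores.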
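The abstract states that SPRING encodes activation and weight sparsity with a binary mask but does not give the exact format. The sketch below illustrates the general idea under simple assumptions: a dense vector is stored as a 0/1 mask plus its packed nonzero values, and a dot product performs a multiply-accumulate only where both masks are 1, so zero operands cost no MACs. The functions `encode`, `decode`, and `sparse_dot` are hypothetical, and plain Python lists stand in for hardware bit vectors.

```python
from typing import List, Tuple


def encode(values: List[float]) -> Tuple[List[int], List[float]]:
    """Split a dense vector into a 0/1 mask and the packed nonzero values."""
    mask = [1 if v != 0 else 0 for v in values]
    nonzeros = [v for v in values if v != 0]
    return mask, nonzeros


def decode(mask: List[int], nonzeros: List[float]) -> List[float]:
    """Rebuild the dense vector by consuming packed values where the mask is 1."""
    it = iter(nonzeros)
    return [next(it) if m else 0.0 for m in mask]


def sparse_dot(a_mask: List[int], a_vals: List[float],
               w_mask: List[int], w_vals: List[float]) -> Tuple[float, int]:
    """Dot product of two mask-encoded vectors; returns (result, MACs performed)."""
    ai = wi = 0          # read pointers into the packed value arrays
    acc = 0.0
    macs = 0
    for am, wm in zip(a_mask, w_mask):
        if am and wm:    # both operands nonzero: the only effectual MAC
            acc += a_vals[ai] * w_vals[wi]
            macs += 1
        ai += am         # advance each pointer past its own nonzero, if any
        wi += wm
    return acc, macs


a_mask, a_vals = encode([0.0, 1.5, 0.0, 2.0, 3.0])
w_mask, w_vals = encode([0.5, 2.0, 1.0, 0.0, 1.0])
print(sparse_dot(a_mask, a_vals, w_mask, w_vals))  # prints: (6.0, 2)
```

Here the dense dot product would take five MACs, while the mask intersection identifies only two effectual ones; the same idea also shrinks storage, since only nonzero values are kept.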
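Stochastic rounding maps a value to one of its two neighboring representable values, rounding up with probability equal to the distance from the lower neighbor divided by the grid spacing, so the rounded result equals the input in expectation. The minimal sketch below applies it to a fixed-point grid with `frac_bits` fractional bits; the function name `stochastic_round`, this parameterization, and the absence of overflow or saturation handling are assumptions for illustration, not SPRING's actual reduced-precision format.

```python
import math
import random


def stochastic_round(x: float, frac_bits: int, rng: random.Random) -> float:
    """Round x to a multiple of 2**-frac_bits without bias in expectation.

    No saturation is applied (assumption): the integer range is unbounded.
    """
    scale = 2 ** frac_bits
    scaled = x * scale
    lower = math.floor(scaled)
    p_up = scaled - lower            # fractional remainder in [0, 1)
    rounded = lower + (1 if rng.random() < p_up else 0)
    return rounded / scale


rng = random.Random(0)
samples = [stochastic_round(0.3, 2, rng) for _ in range(20000)]
print(sorted(set(samples)))          # only the two grid neighbors of 0.3
print(sum(samples) / len(samples))   # close to 0.3 on average
```

This is why the technique suits low-precision training: repeatedly adding an update of 0.01 to a weight on a 0.25 grid never moves it under round-to-nearest, whereas stochastic rounding moves it by 0.01 per step on average.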
URI:	http://arks.princeton.edu/ark:/88435/dsp01r781wj908
Alternate format:	The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu
Type of Material:	Academic dissertations (Ph.D.)
Language:	en
Appears in Collections:	Electrical Engineering

Files in This Item:

File	Description	Size	Format
Yu_princeton_0181D_13027.pdf		13.21 MB	Adobe PDF
