Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp011831cn73t
Title: | From Pixels to Scenes: Recovering 3D Geometry and Semantics for Indoor Environments |
Authors: | Zhang, Yinda |
Advisors: | Funkhouser, Thomas A |
Contributors: | Computer Science Department |
Keywords: | 3D geometry, Computer vision, Deep learning, Indoor environment, Scene understanding, Semantic |
Subjects: | Computer science |
Issue Date: | 2018 |
Publisher: | Princeton, NJ : Princeton University |
Abstract: | Understanding the 3D geometry and semantics of real environments is in high demand for many applications, such as autonomous driving, robotics, and augmented reality. However, it is extremely challenging due to imperfect and noisy measurements from real sensors, limited access to ground-truth data, and cluttered scenes exhibiting heavy occlusions and intervening objects. To address these issues, this thesis introduces a series of works that produce a geometric and semantic understanding of the scene in both pixel-wise and holistic 3D representations. Starting from estimating a depth map, which is a fundamental task in many approaches for reconstructing the 3D geometry of the scene, we introduce a learning-based active stereo system that is trained in a self-supervised fashion and reduces the disparity error to one tenth of that of other canonical stereo systems. To handle the more common case where only one input image is available for scene understanding, we create a high-quality synthetic dataset that facilitates pre-training of data-driven approaches, and demonstrate that we can improve surface normal estimation and raw depth measurements from commodity RGB-D sensors. Lastly, we pursue holistic 3D scene understanding by estimating a 3D representation of the scene, in which objects and room layout are represented using 3D bounding boxes and planar surfaces, respectively. We propose methods to produce such a representation from either a single color panorama or a depth image, leveraging scene context. On the whole, these methods produce an understanding of both 3D geometry and semantics from the finest-grained pixel level to the holistic scene scale, building foundations that support future work in 3D scene understanding. |
URI: | http://arks.princeton.edu/ark:/88435/dsp011831cn73t |
Alternate format: | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu |
Type of Material: | Academic dissertations (Ph.D.) |
Language: | en |
Appears in Collections: | Computer Science |
Files in This Item:
File | Description | Size | Format
---|---|---|---
Zhang_princeton_0181D_12803.pdf | | 79.02 MB | Adobe PDF
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.