Oral Session 4C: 3D Computer Vision
DORNet: A Degradation Oriented and Regularized Network for Blind Depth Super-Resolution
Zhengxue Wang · Zhiqiang Yan · Jinshan Pan · Guangwei Gao · Kai Zhang · Jian Yang
Recent RGB-guided depth super-resolution methods have achieved impressive performance under the assumption of fixed and known degradation (e.g., bicubic downsampling). However, in real-world scenarios, captured depth data often suffer from unconventional and unknown degradation due to sensor limitations and complex imaging environments (e.g., low-reflectance surfaces, varying illumination). Consequently, the performance of these methods declines significantly when real-world degradations deviate from their assumptions. In this paper, we propose the Degradation Oriented and Regularized Network (DORNet), a novel framework designed to adaptively address unknown degradation in real-world scenes through implicit degradation representations. Our approach begins with a self-supervised degradation learning strategy that models the degradation representations of low-resolution depth data using routing-selection-based degradation regularization. To facilitate effective RGB-D fusion, we further introduce a degradation-oriented feature transformation module that selectively propagates RGB content into the depth data based on the learned degradation priors. Extensive experimental results on both real and synthetic datasets demonstrate the superiority of DORNet in handling unknown degradations, outperforming existing methods.
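The abstract gives no code; the following is a minimal PyTorch-style sketch of one plausible reading of a degradation-oriented fusion step, in which a learned degradation embedding conditions how RGB features are propagated into the depth branch. The module and argument names (DegradationOrientedFusion, deg_embed, etc.) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): degradation-conditioned
# RGB-to-depth feature transfer. A degradation embedding gates which RGB
# content is propagated into the depth features before fusion.
import torch
import torch.nn as nn

class DegradationOrientedFusion(nn.Module):
    def __init__(self, feat_ch: int, deg_dim: int):
        super().__init__()
        # Map the implicit degradation representation to per-channel scale/shift.
        self.to_affine = nn.Linear(deg_dim, 2 * feat_ch)
        self.fuse = nn.Conv2d(2 * feat_ch, feat_ch, kernel_size=3, padding=1)

    def forward(self, depth_feat, rgb_feat, deg_embed):
        # deg_embed: (B, deg_dim) learned degradation representation.
        scale, shift = self.to_affine(deg_embed).chunk(2, dim=1)
        scale = scale.unsqueeze(-1).unsqueeze(-1)
        shift = shift.unsqueeze(-1).unsqueeze(-1)
        # Selectively pass RGB content conditioned on the degradation prior.
        guided_rgb = rgb_feat * torch.sigmoid(scale) + shift
        return self.fuse(torch.cat([depth_feat, guided_rgb], dim=1))

# Usage example with random tensors.
m = DegradationOrientedFusion(feat_ch=64, deg_dim=128)
out = m(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32), torch.randn(1, 128))
```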
Convex Relaxation for Robust Vanishing Point Estimation in Manhattan World
Bangyan Liao · Zhenjun Zhao · Haoang Li · Yi Zhou · Yingping Zeng · Hao Li · Peidong Liu
Determining the vanishing points (VPs) in a Manhattan world, a fundamental task in many 3D vision applications, consists of jointly inferring the line-VP association and locating each VP. Existing methods, however, either settle for sub-optimal solutions or pursue global optimality at a significant computational cost. In contrast to prior works, we introduce convex relaxation techniques to solve this task for the first time. Specifically, we employ a “soft” association scheme, realized via a truncated multi-selection error, that allows for joint estimation of VP locations and line-VP associations. This approach leads to a primal problem that can be reformulated as a quadratically constrained quadratic programming (QCQP) problem, which is then relaxed into a convex semidefinite programming (SDP) problem. To solve this SDP problem efficiently, we present a globally optimal, outlier-robust iterative solver (called GlobustVP), which in each iteration independently searches for one VP and its associated lines, treating other lines as outliers. After each independent update of all VPs, the mutual orthogonality between the three VPs in a Manhattan world is reinforced via local refinement. Extensive experiments on both synthetic and real-world data demonstrate that GlobustVP achieves a favorable balance between efficiency, robustness, and global optimality compared to previous works. We will release our code for further study.
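For readers unfamiliar with the relaxation step, the standard lifting from a QCQP to a convex SDP is sketched below; the paper's specific cost matrix $C$ and constraint matrices $A_i$ (built from the truncated multi-selection error) are not reproduced here.

```latex
% Generic QCQP-to-SDP lifting (standard form, not the paper's exact problem).
\begin{aligned}
\text{(QCQP)} \quad & \min_{x}\ x^\top C x
  \quad \text{s.t.}\quad x^\top A_i x = b_i,\ \ i = 1,\dots,m\\[2pt]
\text{(lift)} \quad & X = x x^\top
  \;\Longleftrightarrow\; X \succeq 0,\ \operatorname{rank}(X) = 1\\[2pt]
\text{(SDP)} \quad & \min_{X \succeq 0}\ \operatorname{tr}(C X)
  \quad \text{s.t.}\quad \operatorname{tr}(A_i X) = b_i,\ \ i = 1,\dots,m
\end{aligned}
```

Dropping the non-convex rank-1 constraint yields the convex SDP; whenever the optimal $X$ happens to have rank 1, the relaxation is tight and the original solution $x$ is recovered by factorizing $X$.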
Learned Binocular-Encoding Optics for RGBD Imaging Using Joint Stereo and Focus Cues
Yuhui Liu · Liangxun Ou · Qiang Fu · Hadi Amata · Wolfgang Heidrich · Yifan Peng
Extracting high-fidelity RGBD information from two-dimensional (2D) images is essential for various visual computing applications. Stereo imaging, a reliable passive technique for obtaining three-dimensional (3D) scene information, has benefited greatly from advances in deep learning. However, existing stereo depth estimation algorithms struggle to perceive high-frequency information and to resolve high-resolution depth maps in realistic camera settings with large depth variations. These algorithms commonly neglect the hardware parameter configuration, limiting the potential for achieving optimal solutions through software-only design strategies. This work presents a hardware-software co-designed RGBD imaging framework that leverages both stereo and focus cues to reconstruct texture-rich color images along with detailed depth maps over a wide depth range. A pair of rank-2 parameterized diffractive optical elements (DOEs) optically encodes perpendicular, complementary information during stereo acquisition. Additionally, we employ an IGEV-UNet-fused neural network tailored to the proposed rank-2 encoding for stereo matching and image reconstruction. With a prototype stereo camera equipped with customized DOEs, our deep stereo imaging paradigm demonstrates superior performance over existing monocular and stereo imaging systems, achieving a 2.96 dB gain in image PSNR and improved depth accuracy in high-frequency details across distances from 0.67 to 8 meters.
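As a purely illustrative aside, one common reading of a "rank-2 parameterized" 2D profile is a matrix constrained to rank 2, i.e., a sum of two outer products of 1D vectors; the paper's exact DOE parameterization may differ, so the NumPy sketch below is an assumption, not the authors' design.

```python
# Assumed reading of a rank-2 parameterization: the 2D phase/height map is the
# sum of two outer products of learnable 1D profiles, hence rank <= 2.
import numpy as np

def rank2_phase(u1, v1, u2, v2):
    """Build an H x W phase map from four 1D parameter vectors."""
    return np.outer(u1, v1) + np.outer(u2, v2)

H, W = 256, 256
rng = np.random.default_rng(0)
phase = rank2_phase(rng.standard_normal(H), rng.standard_normal(W),
                    rng.standard_normal(H), rng.standard_normal(W))
assert np.linalg.matrix_rank(phase) <= 2
```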
Camera Resection from Known Line Pencils and a Radially Distorted Scanline
Juan Carlos Dibene Simental · Enrique Dunn
We present a marker-based geometric estimation framework for the absolute pose of a camera by analyzing the 1D observations in a single radially distorted pixel scanline. We leverage a pair of known co-planar pencils of lines, along with lens distortion parameters, to propose an ensemble of solvers exploring the space of estimation strategies applicable to our setup. First, we present a minimal algebraic solver requiring only six measurements and yielding eight solutions, which relies on the intersection of two conics defined by one of the pencils of lines. Then, we present a unique closed-form geometric solver from seven measurements. Finally, we present a homography-based formulation amenable to linear least-squares from eight or more measurements. Our geometric framework constitutes a theoretical analysis of the minimum geometric context necessary to solve in closed form for the absolute pose of a single camera from a single radially distorted scanline.
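The linear least-squares step behind any homography-based (DLT-style) formulation reduces to a stacked homogeneous system A h = 0 solved via SVD; how each row of A is built from a distorted-scanline measurement is specific to the paper and not reproduced in this minimal sketch.

```python
# Generic DLT-style homogeneous least squares: the solution is the right
# singular vector of A associated with the smallest singular value.
import numpy as np

def solve_homogeneous_lstsq(A: np.ndarray) -> np.ndarray:
    """Return a unit-norm h minimizing ||A h||."""
    _, _, vt = np.linalg.svd(A)
    return vt[-1]

# Example with a synthetic 8 x 9 constraint matrix (8 measurements, 9 unknowns).
rng = np.random.default_rng(1)
A = rng.standard_normal((8, 9))
h = solve_homogeneous_lstsq(A)
print(np.linalg.norm(A @ h))  # small residual for the least-squares solution
```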
Opportunistic Single-Photon Time of Flight
Sotiris Nousias · Mian Wei · Howard Xiao · Maxx Wu · Shahmeer Athar · Kevin J Wang · Anagh Malik · David A. Barmherzig · David B. Lindell · Kiriakos Kutulakos
Scattered light from pulsed lasers is increasingly part of our ambient illumination, as many devices rely on them for active 3D sensing. In this work, we ask: can these “ambient” light signals be detected and leveraged for passive 3D vision? We show that pulsed lasers, despite being weak and fluctuating at MHz to GHz frequencies, leave a distinctive sinc comb pattern in the temporal frequency domain of incident flux that is specific to each laser and invariant to the scene. This enables their passive detection and analysis with a free-running SPAD camera, even when they are unknown, asynchronous, out of sight, and emitting concurrently. We show how to synchronize with such lasers computationally, characterize their pulse emissions, separate their contributions, and—if many are present—localize them in 3D and recover a depth map of the camera’s field of view. We use our camera prototype to demonstrate (1) a first-of-its-kind visualization of asynchronously propagating light pulses from multiple lasers through the same scene, (2) passive estimation of a laser’s MHz-scale pulse repetition frequency with mHz precision, and (3) mm-scale 3D imaging over room-scale distances by passively harvesting photons from two or more out-of-view lasers.
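The NumPy sketch below illustrates the underlying idea of detecting a pulse train's repetition frequency from unsynchronized photon timestamps via a point-process periodogram; it is an assumption-level illustration, not the authors' pipeline, and the variable names and synthetic data are invented for the example.

```python
# Illustrative sketch: a periodic pulse train leaves spectral peaks at
# multiples of its pulse repetition frequency (PRF) in the timestamp
# periodogram |sum_k exp(2*pi*i*f*t_k)|, so the PRF can be estimated by
# locating the peak over a grid of candidate frequencies.
import numpy as np

def periodogram_peak(timestamps: np.ndarray, freqs: np.ndarray) -> float:
    """Return the candidate frequency with the largest spectral magnitude."""
    phases = 2j * np.pi * np.outer(freqs, timestamps)
    power = np.abs(np.exp(phases).sum(axis=1))
    return freqs[np.argmax(power)]

# Synthetic example: a 10 MHz pulse train mixed with uniform ambient photons.
rng = np.random.default_rng(0)
prf, n_pulses = 10e6, 10000
pulse_times = np.arange(n_pulses) / prf + rng.normal(0, 50e-12, size=n_pulses)
laser_photons = rng.choice(pulse_times, size=1000, replace=False)
ambient_photons = rng.uniform(0, n_pulses / prf, size=1000)
t = np.sort(np.concatenate([laser_photons, ambient_photons]))

freqs = np.linspace(9.9e6, 10.1e6, 401)   # 500 Hz grid around the true PRF
print(periodogram_peak(t, freqs))          # expected to be close to 1e7
```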