Track: Orals 4B 3D Vision

Thu 20 June 13:00 - 13:18 PDT

Oral #1

SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes

Alexandros Delitzas · Ayça Takmaz · Federico Tombari · Robert Sumner · Marc Pollefeys · Francis Engelmann

Existing 3D scene understanding methods are heavily focused on 3D semantic and instance segmentation. However, identifying objects and their parts only constitutes an intermediate step towards a more fine-grained goal, which is effectively interacting with the functional interactive elements (e.g., handles, knobs, buttons) in the scene to accomplish diverse tasks. To this end, we introduce SceneFun3D, a large-scale dataset with more than 14.8k highly accurate interaction annotations for 710 high-resolution real-world 3D indoor scenes. We accompany the annotations with motion parameter information, describing how to interact with these elements, and a diverse set of natural language descriptions of tasks that involve manipulating them in the scene context. To showcase the value of our dataset, we introduce three novel tasks, namely functionality segmentation, task-driven affordance grounding and 3D motion estimation, and adapt existing state-of-the-art methods to tackle them. Our experiments show that solving these tasks in real 3D scenes remains challenging despite recent progress in closed-set and open-set 3D scene understanding methods.

Thu 20 June 13:18 - 13:36 PDT

Oral #2

SpiderMatch: 3D Shape Matching with Global Optimality and Geometric Consistency

Paul Roetzer · Florian Bernard

Finding shortest paths on product spaces is a popular approach to tackle numerous variants of matching problems, including the dynamic time warping method for matching signals, the matching of curves, or the matching of a curve to a 3D shape. While these approaches admit the computation of globally optimal solutions in polynomial time, their natural generalisation to 3D shape matching is widely known to be intractable. In this work we address this issue by proposing a novel path-based formalism for 3D shape matching. More specifically, we consider an alternative shape discretisation in which one of the 3D shapes (the source shape) is represented as a SpiderCurve, i.e. a long self-intersecting curve that traces the 3D shape surface. We then tackle the 3D shape matching problem as finding a shortest path in the product graph of the SpiderCurve and the target 3D shape. Our approach introduces a set of novel constraints that ensure a globally geometrically consistent matching. Overall, our formalism leads to an integer linear programming problem for which we experimentally show that it can efficiently be solved to global optimality. We demonstrate that our approach is competitive with recent state-of-the-art shape matching methods, while in addition guaranteeing geometric consistency.

Thu 20 June 13:36 - 13:54 PDT

Oral #3

PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness

Anh-Quan Cao · Angela Dai · Raoul de Charette

We propose the task of Panoptic Scene Completion~(PSC) which extends the recently popular Semantic Scene Completion (SSC) task with instance-level information to produce a richer understanding of the 3D scene. Our PSC proposal utilizes a hybrid mask-based technique on the non-empty voxels from sparse multi-scale completions. Whereas the SSC literature overlooks uncertainty which is critical for robotics applications, we instead propose an efficient ensembling to estimate both voxel-wise and instance-wise uncertainties along PSC. This is achieved by building on a multi-input multi-output (MIMO) strategy, while improving performance and yielding better uncertainty for little additional compute. Additionally, we introduce a technique to aggregate permutation-invariant mask predictions. Our experiments demonstrate that our method surpasses all baselines in both Panoptic Scene Completion and uncertainty estimation on three large-scale autonomous driving datasets. Our code and data will be made public.

Thu 20 June 13:54 - 14:12 PDT

Oral #4

PlatoNeRF: 3D Reconstruction in Plato's Cave via Single-View Two-Bounce Lidar

Tzofi Klinghoffer · Xiaoyu Xiang · Siddharth Somasundaram · Yuchen Fan · Christian Richardt · Ramesh Raskar · Rakesh Ranjan

3D reconstruction from a single-view is challenging because of the ambiguity from monocular cues and lack of information about occluded regions. Neural radiance fields (NeRF), while popular for view synthesis and 3D reconstruction, are typically reliant on multi-view images. Existing methods for single-view 3D reconstruction with NeRF rely on either data priors to hallucinate views of occluded regions, which may not be physically accurate, or shadows observed by RGB cameras, which are difficult to detect in ambient light and low albedo backgrounds. We propose using time-of-flight data captured by a single-photon avalanche diode to overcome these limitations. Our method models two-bounce optical paths with NeRF, using lidar transient data for supervision. By leveraging the advantages of both NeRF and two-bounce light measured by lidar, we demonstrate that we can reconstruct visible and occluded geometry without data priors or reliance on controlled ambient lighting or scene albedo. In addition, we demonstrate improved generalization under practical constraints on sensor spatial- and temporal-resolution. We believe our method is a promising direction as single-photon lidars become ubiquitous on consumer devices, such as phones, tablets, and headsets.

Thu 20 June 14:12 - 14:30 PDT

Oral #5

A Subspace-Constrained Tyler's Estimator and its Applications to Structure from Motion

Feng Yu · Teng Zhang · Gilad Lerman

We present the subspace-constrained Tyler's estimator (STE) designed for recovering a low-dimensional subspace within a dataset that may be highly corrupted with outliers. STE is a fusion of the Tyler's M-estimator (TME) and a variant of the fast median subspace, offering superior computational efficiency compared to TME. Our theoretical analysis suggests that, under a common inlier-outlier model, STE can effectively recover the underlying subspace, even when it contains a smaller fraction of inliers relative to other methods in the field of robust subspace recovery. We apply STE in the context of Structure from Motion (SfM) in two ways: for robust estimation of the fundamental matrix and for the removal of outlying cameras, enhancing the robustness and speed of the SfM pipeline. Numerical experiments confirm the state-of-the-art performance of our method in these applications. This research makes significant contributions to the field of robust subspace recovery, particularly in the context of computer vision and 3D reconstruction.

Main Navigation

Oral Session

Orals 4B 3D Vision

SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes

SpiderMatch: 3D Shape Matching with Global Optimality and Geometric Consistency

PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness

PlatoNeRF: 3D Reconstruction in Plato's Cave via Single-View Two-Bounce Lidar

A Subspace-Constrained Tyler's Estimator and its Applications to Structure from Motion