Timezone: America/Denver
 

TUE 2 JUN
2 p.m.

WED 3 JUN
7 a.m.
Break:
(ends 9:00 AM)
9 a.m.
Workshop:
(ends 1:00 PM)
Workshop:
(ends 1:00 PM)
Workshop:
(ends 1:00 PM)
Workshop:
(ends 1:00 PM)
Workshop:
(ends 1:00 PM)
Workshop:
(ends 1:00 PM)
Workshop:
(ends 1:00 PM)
Workshop:
(ends 1:00 PM)
Workshop:
(ends 6:00 PM)
Workshop:
(ends 6:00 PM)
Workshop:
(ends 6:00 PM)
Workshop:
(ends 6:00 PM)
Workshop:
(ends 6:00 PM)
10 a.m.
Break:
(ends 11:00 AM)
1 p.m.
Workshop:
(ends 6:00 PM)
Workshop:
(ends 6:00 PM)
Workshop:
(ends 6:00 PM)
Workshop:
(ends 6:00 PM)
Workshop:
(ends 6:00 PM)
3 p.m.
Break:
(ends 4:00 PM)

THU 4 JUN
7 a.m.
Break:
(ends 9:00 AM)
9 a.m.
Workshop:
(ends 1:00 PM)
Workshop:
(ends 1:00 PM)
Workshop:
(ends 1:00 PM)
Workshop:
(ends 1:00 PM)
Workshop:
(ends 6:00 PM)
Workshop:
(ends 6:00 PM)
Workshop:
(ends 6:00 PM)
10 a.m.
Break:
(ends 11:00 AM)
1 p.m.
Workshop:
(ends 6:00 PM)
Workshop:
(ends 6:00 PM)
Workshop:
(ends 6:00 PM)
Workshop:
(ends 6:00 PM)
Workshop:
(ends 6:00 PM)
3 p.m.
Break:
(ends 4:00 PM)

FRI 5 JUN
7 a.m.
Break:
(ends 9:00 AM)
8:30 a.m.
Remarks:
(ends 9:00 AM)
8:45 a.m.
Poster Setup:
(ends 9:15 AM)
9 a.m.
Break:
(ends 9:15 AM)
9:15 a.m.
Orals 9:15-10:30
[9:15] A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space
[9:30] ANTS: Adaptive Negative Textual Space Shaping for OOD Detection via Test-Time MLLM Understanding and Reasoning
[9:45] ARGUS: Defending Against Multimodal Indirect Prompt Injection via Steering Instruction-Following Behavior
[10:00] TEAR: Temporal-aware Automated Red-teaming for Text-to-Video Models
[10:15] ViT^3: Unlocking Test-Time Training in Vision
(ends 10:30 AM)
Orals 9:15-10:30
[9:15] Customized Fusion: A Closed-Loop Dynamic Network for Adaptive Multi-Task-Aware Infrared-Visible Image Fusion
[9:27] Dual Band Thermal Videography: Separating Time-Varying Reflection and Emission Near Ambient Conditions
[9:40] MetaSpectra+: A Compact Broadband Metasurface Camera for Snapshot Hyperspectral+ Imaging
[9:52] Spectrum from Defocus: Fast Spectral Imaging with Chromatic Focal Stack
[10:05] Towards Photorealistic and Efficient Bokeh Rendering via Diffusion Framework
[10:17] UnReflectAnything: RGB-Only Highlight Removal by Rendering Synthetic Specular Supervision
(ends 10:30 AM)
Orals 9:15-10:30
[9:15] Black-box Membership Inference Attacks on the Pre-training Data of Image-generation Models
[9:27] Data Leakage Detection and De-duplication in Large Scale Geospatial Image Datasets
[9:40] RAVEN: Erasing Invisible Watermarks via Novel View Synthesis
[9:52] LDP-Slicing: Local Differential Privacy for Images via Randomized Bit-Plane Slicing
[10:05] NOWA: Null-space Optical Watermark for Invisible Capture Fingerprinting and Tamper Localization
[10:17] Revisiting Geometric Obfuscation with Dual Convergent Lines for Privacy-Preserving Image Queries in Visual Localization
(ends 10:30 AM)
Orals 9:15-10:30
[9:15] Advancing Image Classification with Discrete Diffusion Classification Modeling
[9:27] Does YOLO Really Need to See Every Training Image in Every Epoch?
[9:40] Fine-grained Image Aesthetic Assessment: Learning Discriminative Scores from Relative Ranks
[9:52] NuWa: Deriving Lightweight Class-Specific Vision Transformers for Edge Devices
[10:05] Plant Taxonomy Meets Plant Counting: A Fine-Grained, Taxonomic Dataset for Counting Hundreds of Plant Species
[10:17] Rethinking Dataset Distillation: Hard Truths about Soft Labels
(ends 10:30 AM)
10:15 a.m.
Poster Setup:
(ends 10:45 AM)
10:45 a.m.
Posters 10:45-12:45
(ends 12:45 PM)
Demonstration:
(ends 12:45 PM)
Break:
(ends 11:30 AM)
1 p.m.
Orals 1:00-2:15
[1:00] 4D Primitive-Mâché: Glueing Primitives for Persistent 4D Scene Reconstruction
[1:12] Efficiently Reconstructing Dynamic Scenes One D4RT at a Time
[1:25] FUSER: Feed-Forward Multiview 3D Registration Transformer and SE(3)^N Diffusion Refinement
[1:37] Residual Primitive Fitting of 3D Shapes with SuperFrusta
[1:50] SmokeSVD: Smoke Reconstruction from A Single View via Progressive Novel View Synthesis and Refinement with Diffusion Models
[2:02] SparseWorld-TC: Trajectory-Conditioned Sparse Occupancy World Model
(ends 2:15 PM)
Orals 1:00-2:15
[1:00] 3DReflecNet: A Large-Scale Dataset for 3D Reconstruction of Reflective, Transparent, and Low-Texture Objects
[1:12] GLINT: Modeling Scene-Scale Transparency via Gaussian Radiance Transport
[1:25] Neural Field-Based 3D Surface Reconstruction of Microstructures from Multi-Detector Signals in Scanning Electron Microscopy
[1:37] PhyGaP: Physically-Grounded Gaussians with Polarization Cues
[1:50] PPISP: Physically-Plausible Compensation and Control of Photometric Variations in Radiance Field Reconstruction
[2:02] SeeGroup: Multi-Layer Depth Estimation of Transparent Surfaces via Self-Determined Grouping
(ends 2:15 PM)
Orals 1:00-2:15
[1:00] MAMMA: Markerless Accurate Multi-person Motion Acquisition
[1:12] Natural Human Motion Recovery by Aligning High-Order Temporal Dynamics from Monocular Videos
[1:25] PoseGAM: Robust Unseen Object Pose Estimation via Geometry-Aware Multi-View Reasoning
[1:37] SAM 3D Body: Robust Full-Body Human Mesh Recovery
[1:50] SAM 3D: 3Dfy Anything in Images
[2:02] SPARK: Sim-ready Part-level Articulated Reconstruction with VLM Knowledge
(ends 2:15 PM)
Orals 1:00-2:15
[1:00] Energy-GS: Image Energy-guided Pose Alignment Gaussian Splatting with redesigned pose gradient flow
[1:12] MeshSplatting: Differentiable Rendering with Opaque Meshes
[1:25] Proxy-GS: Unified Occlusion Priors for Training and Inference in Structured 3D Gaussian Splatting
[1:37] RetimeGS: Continuous-Time Reconstruction of 4D Gaussian Splatting
[1:50] Selfi: Self-improving Reconstruction Engine via 3D Geometric Feature Alignment
[2:02] Z-Order Transformer for Feed-Forward Gaussian Splatting
(ends 2:15 PM)
1:30 p.m.
(ends 2:30 PM)
2:15 p.m.
Break:
(ends 2:30 PM)
2:45 p.m.
3:30 p.m.
Poster Setup:
(ends 4:00 PM)
4 p.m.
Demonstration:
(ends 6:00 PM)
Posters 4:00-6:00
(ends 6:00 PM)

SAT 6 JUN
7:30 a.m.
Break:
(ends 9:00 AM)
9 a.m.
Orals 9:00-10:15
[9:00] Breaking Semantic Boundaries: Distribution-Guided Semantic Exploration for Creative Generation
[9:12] Guiding a Diffusion Model by Swapping Its Tokens
[9:25] PixelDiT: Pixel Diffusion Transformers for Image Generation
[9:37] SeaCache: Spectral-Evolution-Aware Cache for Accelerating Diffusion Models
[9:50] SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching
[10:02] Streaming Diffusion Model for Fast Infrared and Visible Video Fusion
(ends 10:15 AM)
Orals 9:00-10:15
[9:00] Differentiable Vector Quantization for Rate-Distortion Optimization of Generative Image Compression
[9:12] FINER: MLLMs Hallucinate under Fine-grained Negative Queries
[9:25] MDCS-MoAME: Multi-directional Composite Scanning with Mixture of Attention and Mamba Experts for Cancer Survival Prediction
[9:37] PAS: A Training-Free Stabilizer for Temporal Encoding in Video LLMs
[9:50] PAVAS: Physics-Aware Video-to-Audio Synthesis
[10:02] ProPhy: Progressive Physical Alignment for Dynamic World Simulation
(ends 10:15 AM)
Orals 9:00-10:15
[9:00] ComPose: A Unified Completion-Pose Framework for Robust Category-Level Object Pose Estimation
[9:12] CoSMo3D: Open-World Promptable 3D Semantic Segmentation through LLM-Guided Canonical Spatial Modeling
[9:25] GeoViS: Geospatially Rewarded Visual Search for Remote Sensing Visual Grounding
[9:37] RobotSeg: A Model and Dataset for Segmenting Robots in Image and Video
[9:50] S^2AM3D: Scale-controllable Part Segmentation of 3D Point Clouds
[10:02] Scalable Multi-View Subspace Clustering with Tensorized Anchor Guidance
(ends 10:15 AM)
Orals 9:00-10:15
[9:00] 3D-LATTE: Latent Space 3D Editing from Textual Instructions
[9:12] AnchorFlow: Training-Free 3D Editing via Latent Anchor-Aligned Flows
[9:25] ChordEdit: One-Step Low-Energy Transport for Image Editing
[9:37] Faithful Contouring: Near-Lossless 3D Voxel Representation Free from Iso-surface
[9:50] Native and Compact Structured Latents for 3D Generation
[10:02] SliderEdit: Continuous Image Editing with Fine-Grained Instruction Control
(ends 10:15 AM)
10:15 a.m.
Break:
(ends 10:30 AM)
10:30 a.m.
11:15 a.m.
Poster Setup:
(ends 11:45 AM)
11:45 a.m.
Posters 11:45-1:45
(ends 1:45 PM)
Demonstration:
(ends 1:45 PM)
Doctoral Consortium:
(ends 1:45 PM)
1:45 p.m.
(ends 2:45 PM)
2 p.m.
Orals 2:00-3:15
[2:00] INSID3: Training-Free In-Context Segmentation with DINOv3
[2:12] MARCO: Navigating the Unseen Space of Semantic Correspondence
[2:25] PR-MaGIC: Prompt Refinement Via Mask Decoder Gradient Flow For In-Context Segmentation
[2:37] R^2-Seg: Training-Free OOD Medical Tumor Segmentation via Anatomical Reasoning and Statistical Rejection
[2:50] The SA-FARI Dataset: Segment Anything in Footage of Animals for Recognition and Identification
[3:02] VGGT-Segmentor: Geometry-Enhanced Cross-View Segmentation
(ends 3:15 PM)
Orals 2:00-3:15
[2:00] Chorus: Multi-Teacher Pretraining for Holistic 3D Gaussian Scene Encoding
[2:12] Featurising Pixels from Dynamic 3D Scenes with Linear In-Context Learners
[2:25] From Pairs to Sequences: Track-Aware Policy Gradients for Keypoint Detection
[2:37] Linear Fundamental Matrix Estimation from 7 or 5 Points
[2:50] OccuFly: A 3D Vision Benchmark for Semantic Scene Completion from the Aerial Perspective
[3:02] VGGT-Ω
(ends 3:15 PM)
Orals 2:00-3:15
[2:00] CodeV: Code with Images for Faithful Visual Reasoning via Tool-Aware Policy Optimization
[2:12] NitroGen: An Open Foundation Model for Generalist Gaming Agents
[2:25] PAI-Bench: A Comprehensive Benchmark For Physical AI
[2:37] RefAV: Towards Planning-Centric Scenario Mining
[2:50] SoccerMaster: A Vision Foundation Model for Soccer Understanding
[3:02] VS-Bench: Evaluating VLMs for Strategic Abilities in Multi-Agent Environments
(ends 3:15 PM)
Orals 2:00-3:15
[2:00] Breaking the Scalability Limit of Multi-Projector Calibration with Embedded Cameras
[2:12] GaussianFluent: Gaussian Simulation for Dynamic Scenes with Mixed Materials
[2:25] InfiniBench: Infinite Benchmarking for Visual Spatial Reasoning with Customizable Scene Complexity
[2:37] MAGICIAN: Efficient Long-Term Planning with Imagined Gaussians for Active Mapping
[2:50] Memory-Augmented Scene Understanding and Exploration for Open-World Aerial Object-Goal Navigation
[3:02] Monocular Open Vocabulary Occupancy Prediction for Indoor Scenes
(ends 3:15 PM)
3:15 p.m.
Break:
(ends 3:30 PM)
4:15 p.m.
Poster Setup:
(ends 4:45 PM)
4:45 p.m.
Posters 4:45-6:45
(ends 6:45 PM)
Demonstration:
(ends 6:45 PM)
7 p.m.
Reception:
(ends 9:00 PM)

SUN 7 JUN
7:30 a.m.
Break:
(ends 9:00 AM)
9 a.m.
Orals 9:00-10:15
[9:00] AT-VLA: Adaptive Tactile Injection for Enhanced Feedback Reaction in Vision-Language-Action Models
[9:12] Learning Diffeomorphism for Medical Image Registration with Time-Embedded Architectures Using Semigroup Regularization
[9:25] QuadSync: Quadrifocal Tensor Synchronization via Tucker Decomposition
[9:37] SocialNav: Training Human-Inspired Foundation Model for Socially-Aware Embodied Navigation
[9:50] Structural Action Transformer for 3D Dexterous Manipulation
[10:02] TESO: Online Tracking of Essential Matrix by Stochastic Optimization
(ends 10:15 AM)
Orals 9:00-10:15
[9:00] Evidential Neural Radiance Fields
[9:12] Global-Aware Edge Prioritization for Pose Graph Initialization
[9:25] Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding
[9:37] Optical Flow Matching: Reframing Optical Flow as Continuous Transport Dynamics
[9:50] SEATrack: Simple, Efficient, and Adaptive Multimodal Tracker
[10:02] U^2Flow: Uncertainty-Aware Unsupervised Optical Flow Estimation
(ends 10:15 AM)
Orals 9:00-10:15
[9:00] AToken: A Unified Tokenizer for Vision
[9:12] Confusion-Aware Spectral Regularizer for Long-Tailed Recognition
[9:25] Learning Latent Concepts for Detecting Out-of-Distribution Objects
[9:37] Learning Like Humans: Analogical Concept Learning for Generalized Category Discovery
[9:50] Understanding and Enforcing Weight Disentanglement in Task Arithmetic
[10:02] Understanding Task Transfer in Vision-Language Models
(ends 10:15 AM)
Orals 9:00-10:15
[9:00] BoostSLT: Boosting Sign Language Translation via a Plug-and-Play Diffusion-Based Semantic Enhancer
[9:12] ImmerIris: A Large-Scale Dataset and Benchmark for Off-Axis and Unconstrained Iris Recognition in Immersive Applications
[9:25] OLATverse: A Large-scale Real-world Object Dataset with Precise Lighting Control
[9:37] OpenDance: Multimodal Controllable 3D Dance Generation with Large-scale Internet Data
[9:50] POLAR: A Portrait OLAT Dataset and Generative Framework for Illumination-Aware Face Modeling
[10:02] Relightable Holoported Characters: Capturing and Relighting Dynamic Human Performance from Sparse Views
(ends 10:15 AM)
10:15 a.m.
Break:
(ends 10:30 AM)
10:30 a.m.
11:15 a.m.
Poster Setup:
(ends 11:45 AM)
11:45 a.m.
Demonstration:
(ends 1:45 PM)
Posters 11:45-1:45
(ends 1:45 PM)
2 p.m.
Orals 2:00-3:15
[2:00] Efficient Unrolled Networks for Large-Scale 3D Inverse Problems
[2:15] FedAdamom: Adaptive Momentum for Improved Generalization in Federated Optimization
[2:30] SimScale: Learning to Drive via Real-World Simulation at Scale
[2:45] Texvent: Asynchronous Event Data Simulation via Text Prompt
[3:00] WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World
(ends 3:15 PM)
Orals 2:00-3:15
[2:00] CineBrain: A Large-Scale Multi-Modal Audiovisual Brain Dataset for Brain-Conditioned Video Generation
[2:15] Hearing the Room Through the Shape of the Drum: Modal-Guided Sound Recovery from Multi-Point Surface Vibrations
[2:30] SDTrack: A Baseline for Event-based Tracking via Spiking Neural Networks
[2:45] Thinking with Drafts: Speculative Temporal Reasoning for Efficient Long Video Understanding
[3:00] Wan-Weaver: Interleaved Multi-modal Generation via Decoupled Training
(ends 3:15 PM)
Orals 2:00-3:15
[2:00] CURE: Curriculum-guided Multi-task Training for Reliable Anatomy Grounded Report Generation
[2:12] DK-DDIL: Adaptive Knowledge Retention for Dynamic Domain-Incremental Learning in Medical Imaging
[2:25] Dual-level Adapter Boosting Prompt-free Curvilinear Structure Segmentation
[2:37] LATA: Laplacian-Assisted Transductive Adaptation for Conformal Uncertainty in Medical VLMs
[2:50] Medic-AD: Towards Medical Vision-Language Model's Clinical Intelligence
[3:02] SegMoTE: Token-Level Mixture of Experts for Medical Image Segmentation
(ends 3:15 PM)
Orals 2:00-3:15
[2:00] Differentiable Laplacian Matrix Guided Superpixel Segmentation
[2:15] FILTR: Extracting Topological Features from Pretrained 3D Models
[2:30] Learning Convex Decomposition via Feature Fields
[2:45] Learning Eigenstructures of Unstructured Data Manifolds
[3:00] Mapping Networks
(ends 3:15 PM)
3 p.m.
Poster Setup:
(ends 3:30 PM)
3:15 p.m.
Break:
(ends 3:30 PM)
3:30 p.m.
Posters 3:30-5:30
(ends 5:30 PM)