All You Need To Know About Self-Driving
Autonomous driving has evolved into a complex, full-stack problem that integrates perception, prediction, planning, simulation, and safety within a unified system. This tutorial provides a comprehensive overview of modern self-driving pipelines, covering both traditional modular approaches and emerging end-to-end paradigms. It reviews key components including sensor systems, multi-modal perception, motion forecasting, planning and control, large-scale simulation, and data-centric development. The tutorial further highlights recent advances such as foundation models, generative simulation, and real-world deployment at scale, offering a unified perspective on current challenges and future directions in self-driving systems.
Recent Advances in AI for Medical Imaging: Progress, Challenges, and Future Directions
Artificial intelligence has driven significant advances in medical imaging, improving tasks such as image reconstruction, diagnosis, and clinical decision support across modalities including MRI, CT, X-ray, and pathology. This tutorial provides an up-to-date overview of key paradigms in the field, including physics-informed learning, medical foundation models, and collaborative approaches such as federated and multi-agent systems. It examines major challenges such as generalization, interpretability, data heterogeneity, and privacy constraints, while highlighting emerging solutions and open research directions. The tutorial aims to offer a comprehensive perspective on the development and deployment of reliable, clinically relevant AI systems for medical imaging.
Extending Computer Vision to Hidden Objects: A Tutorial on Millimeter-Wave Imaging and Reconstruction of Occluded Scenes
Millimeter-wave (mmWave) sensing is emerging as a new modality for computer vision, enabling perception of objects and scenes that are occluded or invisible to traditional cameras. This tutorial introduces the fundamentals of mmWave imaging, highlighting how its physical properties enable through-occlusion sensing and all-weather perception. It covers both classical signal-processing approaches and recent learning-based methods for 3D reconstruction, segmentation, and scene understanding. The tutorial further provides practical guidance on datasets, tools, and open challenges, offering a comprehensive entry point for researchers interested in extending vision systems beyond visible light.
Analytic Understanding of Diffusion Models
Diffusion models achieve state-of-the-art performance in generative modeling, yet their theoretical foundations and generalization behavior remain poorly understood. This tutorial focuses on the analytical understanding of diffusion models, addressing an apparent paradox: the closed-form optimal denoiser for a finite training set can only reproduce training samples, yet deep diffusion networks trained on the same objective generate novel samples. It introduces recent theoretical advances that explain how mechanisms such as score smoothing, training dynamics, neural network inductive biases, and data structure contribute to generalization. By combining mathematical insights with hands-on experiments, the tutorial provides a principled framework for understanding the inner workings of diffusion models and for interpreting recent developments in the field.
The Full Stack of Physical AI: Simulation, Foundation Models, and Edge Deployment for Next-Generation Robotics Applications
Physical AI systems, including robotics and autonomous platforms, require tightly integrated pipelines spanning data collection, model training, and real-time deployment. This tutorial presents a full-stack perspective on building such systems, covering simulation-based data generation, foundation models for robot control, and deployment on edge hardware. It introduces practical workflows using modern tools for human-in-the-loop data collection, multimodal robot foundation models, and hardware-aware optimization for low-latency inference. The tutorial further highlights challenges in scaling and deploying physical AI systems, providing attendees with actionable guidance and open-source resources for end-to-end robotics development.
Computer Vision at Scale: Multi-Camera Tracking, Calibration, and Event Detection for Checkout-Free Retail
Large-scale multi-camera systems are central to real-world computer vision applications, yet their design is shaped as much by infrastructure constraints as by algorithmic advances. This tutorial presents a unified perspective on multi-camera vision through the lens of checkout-free retail, focusing on three core components: automatic camera calibration, real-time multi-object tracking, and structured event detection. It examines how challenges such as asynchrony, partial observability, hardware failures, and edge deployment constraints influence system design and performance. The tutorial further highlights generalizable principles for building reliable, scalable vision systems, bridging the gap between academic methods and real-world deployment.
3D Geometry Generation for Scientific Computing (2nd Edition)
Third Workshop for Learning 3D with Multi-View Supervision
6th Workshop on 3D Scene Understanding for Vision, Graphics, and Robotics
Workshop on Any-to-any Multimodal Learning
The 3rd Workshop on New Trends in AI-Generated Media and Security
6th Workshop on CV4Animals: Computer Vision for Animal Behavior Tracking and Modeling
2nd Workshop on Computer Vision for Children
The 5th Workshop on Computer Vision in the Wild: Towards Unified Multimodal Agents For Reasoning in the Wild
The Second Workshop on the Evaluation of the Generative Foundation Models
Geometry-Free Novel View Synthesis and Controllable Video Models
Humans of Generative AI
2nd Workshop on Knowledge-Intensive Multimodal Reasoning
The 1st Workshop on Low-Level Vision Frontiers with Generative AI, Preference Optimization, and Agentic Systems
Multimodal Algorithmic Reasoning Workshop
Exploring the Next Generation of Data
6th Omnidirectional Computer Vision Workshop
Open-World Vision
Open-World Vision (OWV) emphasizes the realistic opportunities and challenges of developing and deploying computer vision systems in the real open world, which is dynamic, vast, and unpredictable, and which offers abundant data that can both benefit training and challenge testing. It contrasts with the traditional "closed-world" paradigm of visual learning and inference, which assumes fixed, known data distributions and categorical labels. Models developed under such closed-world assumptions tend to be brittle when they encounter ever-changing and novel scenarios in the open world. Modern visual learning has shifted towards an open-world paradigm, for example by pretraining foundation models on massive data sourced from the open world (e.g., web-sourced data). While these models show unprecedented performance and strong adaptability to downstream tasks, they inherit biases from their open-world pretraining data and can still fail in truly novel or underrepresented scenarios during deployment. This workshop aims not only to uncover the current limitations, potential risks, emerging opportunities, and unresolved challenges of open-world vision, but also to solicit solutions that advance the field toward more robust, fair, and adaptable visual systems.
Personalization in Generative AI Workshop
PhysHuman: Physically Grounded Human Perception and Modeling
From Perception to Persuasion: Challenges and Advances in Misinformation Detection in Society
The Eighth Workshop on Precognition: Seeing through the Future
Safe Artificial Intelligence for All Domains
SPAR-3D: Security, Privacy, and Adversarial Robustness in 3D Generative Vision Models
Trustworthy, Robust, Uncertainty-Aware, and Explainable Visual Intelligence and Beyond
The 8th UG2+ Workshop and Challenge: Bridging the Gap between Computational Photography and Visual Perception
Unified Robotic Vision with Cross-Modal Sensing and Alignment
VizWiz Grand Challenge: Interpreting Images and Videos Taken by Blind People
9th International Workshop on Visual Odometry and Computer Vision Applications Based on Location Clues
4th Workshop on Maritime Computer Vision
The 6th Workshop of Adversarial Machine Learning on Computer Vision: Safety of Vision-Language Agents
11th Workshop on Computer Vision and Multimodal Microscopy Image Analysis
12th IEEE International Workshop on Computer Vision in Sports
The Seventh Annual Embodied Artificial Intelligence Workshop
EarthVision: Large Scale Computer Vision for Remote Sensing Imagery
Embodied Reasoning in Action: Workshop and Challenge on Embodied Reasoning for Robotic Manipulation
4th Workshop on Generative Models for Computer Vision
2nd Workshop on Agents in Interaction, from Humans to Robots
2nd Workshop on Human-Interactive Generation and Editing
How Do Vision Models Work?
Mobile AI workshop and associated challenges, 6th edition
Multi-Agent Embodied Intelligent Systems Meet Agentic-AI era: Opportunities, Challenges and Futures
9th Multimodal Learning and Applications Workshop
11th New Trends in Image Restoration and Enhancement Workshop and Challenges
Video Generative Models: Benchmarks and Evaluation
2nd Workshop on Video Large Language Models
Workshop on Visual Concepts
Sight and Sound
1st Workshop on Generative 3D Reconstruction
Medical Reasoning with Vision Language Foundation Models
4D Digital Twins: Real-to-Sim-to-Real for Physical AI
2nd Workshop on 4D Vision: Modeling the Dynamic World
The Third Workshop on Anomaly Detection with Foundation Models
Artificial Intelligence for Space
2nd Workshop on GenAI for Storytelling
Appearance Understanding and Generation
Big Model Adaptation In Computer Vision
CVPR 2026 Biometrics Workshop
Bridging AI and Medical Reality: Computer Vision for Real-world Clinical Translation
Computer Vision × Education: Building a Cross-Community Agenda for Multimodal Vision in Classrooms
CV4Science: Using Computer Vision for the Sciences
Domain Generalization: Evolution, Breakthroughs, and Future Horizons (2nd Edition)
The 2nd CVPR Workshop on Foundation Models Meet Embodied Agents
The 7th International Workshop on Eye and Gaze in Computer Vision
8th International Workshop on Large Scale Holistic Video Understanding
Eighth Workshop on Image Matching: Local Features and Beyond
1st Workshop on Journey to the Awards: Generative AI for Movie-Grade Video Production (J2A), CVPR 2026
6th International Workshop on Long-form Video Understanding, Generation and Action
The 2nd Workshop on Multi-Modal Reasoning for Agentic Intelligence
Pixel-level Video Understanding in the Wild Challenge
4D World Models: Bridging Generation and Reconstruction
Third Workshop on Simulation for Autonomous Driving
Second Workshop on Skilled Activity Understanding, Assessment & Feedback Generation
Imagine a world where computer vision-based systems can analyze a video of an athlete, a surgeon, a patient, or a factory worker and instantly provide expert-level, actionable feedback: correcting technique, identifying inefficiencies, and helping people refine their skills in real time. Thanks to rapid progress in video understanding, this vision is becoming reality. AI-powered systems can now analyze complex human activities, assess performance, and generate intelligent feedback, unlocking new possibilities in sports, healthcare, manufacturing, education, rehabilitation, and beyond. Through expert keynotes and invited contributions, this CVPR 2026 workshop will explore the cutting edge of skilled activity understanding, assessment, and feedback generation, bridging research and real-world applications.
As AI systems become more capable of understanding human expertise, the implications are profound: individuals gain personalized coaching, democratized skill development, and scalable training solutions. We invite researchers, industry leaders, and practitioners to join us in shaping the future of AI-powered skill understanding. Whether you work on foundational research, applied solutions, or real-world deployment, this workshop is a forum for learning about and pushing the boundaries of how AI perceives, evaluates, and enhances human ability.
ScaleBot: The First Workshop on Scalable Robot Learning Systems
The 3rd Workshop on Synthetic Data for Computer Vision
Visual Anomaly and Novelty Detection - 4th Edition
From Perception to Action: Building Efficient and Deployable Robot Intelligence Pipelines with Open-Source Edge AI Toolkits
Robotic manipulation has become a key application of embodied AI, but many research pipelines remain difficult to reproduce and deploy in real-world systems. This tutorial presents an end-to-end, open-source workflow for building efficient robot intelligence pipelines, covering data collection, visuomotor policy training, simulation, and deployment on edge hardware. It introduces practical techniques such as teleoperated data acquisition, diffusion- and transformer-based policies, and neural object cloning for simulation-ready assets. The tutorial further emphasizes model optimization and real-time deployment, culminating in a live demonstration of a complete perception-to-action pipeline on an affordable robotic platform.
Foundations and Frontiers of Watermarking: Algorithms, Multimodal Extensions, Benchmarks, and Authenticity Frameworks
Watermarking has re-emerged as a critical component of trustworthy AI, driven by the rapid growth of generative models and the need for content attribution and authenticity. This tutorial provides a unified overview of watermarking, spanning classical signal-processing foundations and modern deep-learning–based approaches across images, video, audio, and multimodal data. It examines key challenges such as robustness, capacity, and adversarial resilience, along with recent benchmarking efforts and evaluation frameworks. The tutorial further connects these methods to real-world deployment through applications in content provenance, media forensics, and emerging standards such as C2PA, offering a comprehensive perspective on building reliable and transparent media systems.
The Road to Convergence: Evolution of Unified Multimodal Models
Unified multimodal models are emerging as a new paradigm that integrates understanding and generation across modalities within a single foundation model. This tutorial provides a comprehensive overview of these models, addressing the currently fragmented landscape of architectures, representations, and training strategies. It introduces a unified perspective on key design choices, including modeling paradigms, multimodal tokenization, and alignment methods, while reviewing benchmarks and real-world applications. The tutorial further highlights open challenges such as scalable representation learning and unified world modeling, offering a structured roadmap for future research in multimodal AI.