Workshop: Vision-Centric Autonomous Driving (VCAD) Mon 19 Jun 08:00 a.m.
4th Agriculture-Vision Workshop: Challenges & Opportunities for Computer Vision in Agriculture Mon 19 Jun 08:00 a.m.
Workshop: Multi-Agent Behavior: Properties, Computation and Emergence Mon 19 Jun 08:00 a.m.
Workshop: 3DMV: Learning 3D with Multi-View Supervision Mon 19 Jun 08:00 a.m.
7th AI City Challenge Workshop Mon 19 Jun 08:00 a.m.
Workshop: Image Matching: Local Features and Beyond Mon 19 Jun 08:00 a.m.
4th International Workshop on Event-based Vision Mon 19 Jun 08:00 a.m.
This workshop is dedicated to event-based cameras, smart cameras, and algorithms processing data from these sensors. Event-based cameras are bio-inspired sensors with the key advantages of microsecond temporal resolution, low latency, very high dynamic range, and low power consumption. Because of these advantages, event-based cameras open frontiers that are unthinkable with standard frame-based cameras (which have been the main sensing technology for the past 60 years). These revolutionary sensors enable the design of a new class of algorithms to track a baseball in the moonlight, build a flying robot with the agility of a bee, and perform structure from motion in challenging lighting conditions and at remarkable speeds. These sensors became commercially available in 2008 and are slowly being adopted in computer vision and robotics. In recent years they have received attention from large companies, e.g., the event-sensor company Prophesee collaborated with Intel and Bosch on a high spatial resolution sensor, Samsung announced mass production of a sensor to be used on hand-held devices, and they have been used in various applications on neuromorphic chips such as IBM’s TrueNorth and Intel’s Loihi. The workshop also considers novel vision sensors, such as pixel processor arrays (PPAs), which perform massively parallel processing near the image plane. Because early vision computations are carried out on-sensor, the resulting systems have high speed and low-power consumption, enabling new embedded vision applications in areas such as robotics, AR/VR, automotive, gaming, surveillance, etc. This workshop will cover the sensing hardware, as well as the processing and learning methods needed to take advantage of the above-mentioned novel cameras.
3rd Mobile AI Workshop and Challenges Mon 19 Jun 08:00 a.m.
Over the past several years, mobile AI-based applications have become increasingly ubiquitous. Various deep learning models can now be found on any mobile device, from smartphones running portrait segmentation, image enhancement, face recognition, and natural language processing models, to smart-TV boards equipped with sophisticated image super-resolution algorithms. The performance of mobile NPUs and DSPs is also increasing dramatically, making it possible to run complex deep learning models and to achieve fast runtimes for the majority of tasks.
While many research works targeting efficient deep learning models have been proposed recently, the evaluation of the resulting solutions usually happens on desktop CPUs and GPUs, making it nearly impossible to estimate the actual inference time and memory consumption on real mobile hardware. To address this problem, we introduced the Mobile AI Workshop series, where all deep learning solutions are developed for and evaluated on mobile devices.
Thanks to the performance of the latest generation of mobile AI hardware, the topics considered in this workshop go beyond simple classification tasks and include challenging problems such as image denoising, HDR photography, accurate depth estimation, learned image ISP pipelines, and real-time image and video super-resolution. All information about the challenges, papers, invited talks and workshop industry partners is provided at: https://ai-benchmark.com/workshops/mai/2023/
5th Workshop and Competition on Affective Behavior Analysis in-the-wild Mon 19 Jun 08:00 a.m.
A unique aspect of the Workshop is its cross-pollination of different disciplines, bringing together experts (from academia & industry) and researchers in computer vision and pattern recognition, AI, machine learning, HCI, multimedia, robotics, and psychology. The diversity of human behavior, the richness of multi-modal data that arises from its analysis, and the multitude of applications that demand rapid progress in this area ensure that our event provides a timely and relevant discussion and dissemination platform.
The workshop includes keynote talks from Prof. Gunes and Prof. Lapedriza, as well as presentations from experts and researchers within academia and industry on topics related to affective computing and behavior analysis.
The detailed agenda of the workshop can be found on the workshop's website.
The Sixth International Workshop on Computer Vision for Physiological Measurement (CVPM) Mon 19 Jun 08:00 a.m.
8th Workshop on Computer Vision for Microscopy Image Analysis Mon 19 Jun 08:00 a.m.
High-throughput microscopy enables researchers to acquire thousands of images automatically over a matter of hours. This makes it possible to conduct large-scale, image-based experiments for biological discovery. The main challenge and bottleneck in such experiments is the conversion of “big visual data” into interpretable information and hence discoveries. Visual analysis of large-scale image data is a daunting task. Cells need to be located and their phenotype (e.g., shape) described. The behaviors of cell components, cells, or groups of cells need to be analyzed. The cell lineage needs to be traced. Not only do computers have more “stamina” than human annotators for such tasks, they also perform analysis that is more reproducible and less subjective. The post-acquisition component of high-throughput microscopy experiments calls for effective and efficient computer vision techniques.
This workshop will bring together computer vision experts from academia, industry, and government who have made progress in developing computer vision tools for microscopy image analysis. It will provide a comprehensive forum on this topic and foster in-depth discussion of technical and application issues as well as cross-disciplinary collaboration. It will also serve as an introduction to researchers and students curious about this important and fertile field.
Workshop: VizWiz Grand Challenge: Describing Images and Videos Taken by Blind People Mon 19 Jun 08:15 a.m.
Workshop: First Rhobin Challenge - Reconstruction of human-object interaction Mon 19 Jun 08:20 a.m.
Tutorial: Torsten Sattler · Yannis Avrithis · Eric Brachmann · Zuzana Kukelova · Marc Pollefeys · Sudipta Sinha · Giorgos Tolias
Large-Scale Visual Localization
The tutorial covers the task of visual localization, i.e., the problem of estimating the position and orientation from which a given image was taken. The tutorial’s scope includes different spatial/geographical extents, from small indoor/outdoor scenes to city- and world-level coverage, as well as localization under changing conditions. In the coarse localization regime, the task is typically handled via retrieval approaches, which are covered in the first part of the tutorial. A typical use case is the following: given a database of geo-tagged images, the goal is to determine the place depicted in a new query image. Traditionally, this problem is solved by transferring the geo-tag of the most similar database image to the query. The major focus of this part is on the visual representation models used for retrieval, covering both classical feature-based and recent deep learning-based approaches. The second and third parts of the tutorial cover methods for precise localization with feature-based and deep learning approaches, respectively. A typical use case for these algorithms is to estimate the full 6 Degree-of-Freedom (6DOF) pose of a query image, i.e., the position and orientation from which the image was taken, for applications such as robotics, autonomous vehicles (self-driving cars), Augmented / Mixed / Virtual Reality, loop closure detection in SLAM, and Structure-from-Motion. The final part will cover existing datasets, including their limitations. We provide links to publicly available source code for the discussed approaches.
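To make the retrieval-based coarse localization step concrete, below is a minimal sketch assuming L2-normalized global descriptors have already been extracted for the query and a geo-tagged database; the arrays here are random placeholders, not data from the tutorial.

```python
import numpy as np

# Placeholder global descriptors (e.g., from a retrieval network), L2-normalized.
rng = np.random.default_rng(0)
query_desc = rng.standard_normal(256)
query_desc /= np.linalg.norm(query_desc)
db_descs = rng.standard_normal((10_000, 256))
db_descs /= np.linalg.norm(db_descs, axis=1, keepdims=True)
db_geotags = rng.uniform(-90, 90, size=(10_000, 2))  # (lat, lon) per database image

# Coarse localization: rank database images by cosine similarity and
# transfer the geo-tag of the best match to the query image.
similarities = db_descs @ query_desc
best = int(np.argmax(similarities))
estimated_location = db_geotags[best]
print("Estimated (lat, lon):", estimated_location)
```

Precise 6DOF localization would then refine such a coarse estimate, e.g., via 2D-3D matching and pose estimation within the retrieved place.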
Tutorial: Linjie Li · Zhe Gan · Chunyuan Li · Jianwei Yang
Recent Advances in Vision Foundation Models
Visual understanding at different levels of granularity has been a longstanding problem in the computer vision community. The tasks span from image-level tasks (e.g., image classification, image-text retrieval, image captioning, and visual question answering), region-level localization tasks (e.g., object detection and phrase grounding), to pixel-level grouping tasks (e.g., image instance/semantic/panoptic segmentation). Until recently, most of these tasks have been separately tackled with specialized model designs, preventing the synergy of tasks across different granularities from being exploited.
In light of the versatility of transformers and inspired by large-scale vision-language pre-training, the computer vision community is now witnessing a growing interest in building general-purpose vision systems, also called vision foundation models, that can learn from and be applied to various downstream tasks, ranging from image-level, region-level, to pixel-level vision tasks.
In this tutorial, we will cover the most recent approaches and principles at the frontier of learning and applying vision foundation models, including (1) learning vision foundation models from natural language supervision, with applications to open-vocabulary image classification and retrieval, object detection, segmentation, and multimodal understanding; (2) learning vision foundation models via masked image modeling, with its extensions to multimodal pre-training; and (3) vision foundation model architecture design with transformers and beyond.
Joint 3rd Ego4D and 11th EPIC Workshop on Egocentric Vision Mon 19 Jun 08:30 a.m.
This joint full-day workshop is the longstanding event that brings together the rapidly growing egocentric computer vision community, offering the 3rd Ego4D edition and the 11th Egocentric Perception, Interaction and Computing (EPIC) edition. This year, 17 Ego4D and 9 EPIC benchmark winners and findings will be presented throughout the day, covering topics ranging from social interactions, episodic memory, hand-object interactions, long-term tracking, and video object segmentation to audio-based interaction recognition. In addition to the recurring Ego4D and EPIC challenges, new challenges are associated with the recently released benchmarks EgoTracks, PACO, EPIC-KITCHENS VISOR, and EPIC-Sounds.
Additionally, the day will include accepted abstracts, invited CVPR papers, and keynotes by Andrea Vedaldi (Oxford and Meta), Hyun Soo Park (UMinnesota), David Fouhey (UMich) and Suraj Nair (Stanford). Check the program for details.
O-DRUM: Workshop on Open-Domain Reasoning Under Multi-Modal Settings Mon 19 Jun 08:30 a.m.
EMBEDDED VISION WORKSHOP 2023 Mon 19 Jun 08:30 a.m.
Embedded vision is an active field of research, bringing together efficient learning models with fast computer vision and pattern recognition algorithms. The field touches many areas of robotics and intelligent systems and is enjoying impressive growth today.
2nd Workshop on Federated Learning for Computer Vision Mon 19 Jun 08:30 a.m.
Federated Learning (FL) has become an important privacy-preserving paradigm in various machine learning tasks. However, the potential of FL in computer vision applications, such as face recognition, person re-identification, and action recognition, is far from being fully exploited. Moreover, FL has rarely been demonstrated effectively in advanced computer vision tasks such as object detection and image segmentation, compared to the traditional centralized training paradigm. This workshop aims at bringing together researchers and practitioners with common interests in FL for computer vision and studying the different synergistic relations in this interdisciplinary area. The day-long event will facilitate interaction among students, scholars, and industry professionals from around the world to discuss future research challenges and opportunities.
Workshop on Vision-based InduStrial InspectiON (VISION) Mon 19 Jun 08:30 a.m.
The VISION workshop aims to provide a platform for the exchange of scholarly innovations and emerging practical challenges in Vision-based Industrial Inspection. Through a series of keynote talks, technical presentations, and a challenge competition, this workshop is intended to (i) bring together researchers from the interdisciplinary research communities related to computer vision-based inspection; and (ii) connect researchers and industry practitioners to synergize recent research progress and current needs in industrial practice.
Tutorial: Oriane Simeoni · Weidi Xie · Thomas Kipf · Patrick Pérez
Object localization for free: Going beyond self-supervised learning
Object localization in images is a key problem in a wide range of application domains embedded in critical settings such as self-driving vehicles or healthcare. However, most efficient solutions able to perform an object localization task follow the standard object detection and semantic segmentation frameworks, meaning that they require large amounts of annotated data for training. Various heuristics and tools can now assist and enhance human annotators; however, manual annotation remains a laborious and expensive process. Moreover, perception models trained on annotations enter a cycle of dependence on additional annotations for every new object class to detect or new external condition to cover, e.g., indoor/outdoor scenes, different times of day, or weather conditions. Such models struggle to deal with our open, complex world that is evolving continuously. Recent works have shown exciting prospects of avoiding annotations altogether by (1) leveraging self-supervised features, (2) building self-supervised object-centric objectives, and (3) combining different modalities. In this context, we propose a half-day tutorial in which we will provide in-depth coverage of different angles on performing, and building upon, object localization with no human supervision.
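As a rough illustration of direction (1), the sketch below follows the spirit of seed-based methods such as LOST: it assumes self-supervised patch features (e.g., from a DINO-style ViT) are already available and uses only their pairwise similarities to localize a salient object; the feature array and grid size are placeholders.

```python
import numpy as np

# Placeholder patch features from a self-supervised ViT: one row per image patch.
h, w, d = 14, 14, 384
rng = np.random.default_rng(0)
feats = rng.standard_normal((h * w, d))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)

# Patch-to-patch similarity; foreground patches tend to correlate positively
# with fewer other patches than background ones.
sim = feats @ feats.T
pos_degree = (sim > 0).sum(axis=1)
seed = int(np.argmin(pos_degree))            # least-connected patch as the seed

# Expand the seed into a coarse object mask: patches positively correlated with it.
mask = (sim[seed] > 0).reshape(h, w)
print("Seed patch (row, col):", divmod(seed, w), "| mask area:", int(mask.sum()))
```

This is only a simplified sketch of the idea; the methods covered in the tutorial add careful seed expansion, box extraction, and training of class-agnostic detectors on top of such pseudo-labels.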
The 6th Workshop and Prize Challenge Bridging the Gap between Computational Photography and Visual Recognition (UG2+) in conjunction with IEEE CVPR 2023 Mon 19 Jun 08:30 a.m.
The rapid development of computer vision algorithms increasingly allows automatic visual recognition to be incorporated into a suite of emerging applications. Some of these applications operate in less-than-ideal circumstances, such as low-visibility environments, causing captured images to suffer from degradations. In other, more extreme applications, such as imagers for flexible wearables, smart clothing sensors, ultra-thin headset cameras, implantable in vivo imaging, and others, standard camera systems cannot even be deployed, requiring new types of imaging devices. Computational photography addresses the concerns above by designing new computational techniques and incorporating them into the image capture and formation pipeline. This raises a set of new questions. For example, what is the current state of the art in image restoration for images captured in non-ideal circumstances? How can inference be performed on novel kinds of computational photography devices?
Continuing the success of the 1st (CVPR'18), 2nd (CVPR'19), 3rd (CVPR'20), 4th (CVPR'21), and 5th (CVPR'22) UG2 Prize Challenge workshops, we provide its 6th version for CVPR 2023. It will inherit the successful benchmark dataset, platform and evaluation tools used by the previous UG2 workshops, but will also look at brand new aspects of the overall problem, significantly augmenting its existing scope.
Women in Computer Vision Workshop Mon 19 Jun 08:30 a.m.
The half-day Women in Computer Vision (WiCV) workshop is a gathering for researchers of all genders and career stages. All are welcome and encouraged to attend the workshop. Topics span a wide range of areas, including object recognition, image understanding, video analysis, and 3D reconstruction.
Virtual Poster Session from 12:15 - 1:00 pm at https://topia.io/wicvcvpr2023
Secure and Safe Autonomous Driving Workshop and Challenge (SSAD) Mon 19 Jun 08:45 a.m.
Workshop: The 6th Efficient Deep Learning for Computer Vision Mon 19 Jun 08:50 a.m.
Tutorial: Grigorios Chrysos · Fanghui Liu · Volkan Cevher
Deep Learning Theory for Computer Vision
What is the interplay of width and depth, and how does the initialization affect robustness to adversarial attacks? What is a principled heuristic for selecting good architectures in Neural Architecture Search (NAS)? What is the role of Fourier features in implicit neural representations (INRs)? In this tutorial, we aim to build a bridge between the empirical performance of neural networks and deep learning theory. In particular, we want to make recent deep learning (DL) theory developments accessible to vision researchers, and motivate vision researchers to design new architectures and algorithms for practical tasks. In the first part of the tutorial, we will discuss popular notions in DL theory, such as lazy training and the Neural Tangent Kernel (NTK), or bilevel optimization for adversarial attacks and NAS. Then, we will exhibit how such tools can be critical in understanding the inductive bias of networks.
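For reference, the NTK mentioned above is commonly defined as the Gram matrix of network gradients with respect to the parameters; a standard form (generic notation, not taken from the tutorial itself) is:

```latex
\Theta(x, x') \;=\; \nabla_\theta f(x;\theta)^{\top}\,\nabla_\theta f(x';\theta),
```

where $f(\cdot;\theta)$ is the network output and $\theta$ its parameters. In the infinite-width ("lazy") regime this kernel stays approximately constant during training, so gradient descent behaves like kernel regression with $\Theta$, which is one of the links between width, initialization, and inductive bias that the tutorial discusses.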
Workshop: Computer Vision in the Wild Mon 19 Jun 09:00 a.m.
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concepts.
Recent works show that learning from large-scale image-text data is a promising approach to building transferable visual models that can effortlessly adapt to a wide range of downstream computer vision (CV) and multimodal (MM) tasks. Examples include CLIP, ALIGN, and Florence for image classification; ViLD, RegionCLIP, GLIP, and OWL-ViT for object detection; GroupViT, OpenSeg, MaskCLIP, X-Decoder, Segment Anything (SAM), and SEEM for segmentation; and Multimodal GPT-4, LLaVA, and MiniGPT4 for language-and-image instruction-following chat assistants. These vision models with a language or interactive interface are naturally open-vocabulary recognition models, showing superior zero-shot and few-shot adaptation performance in various real-world scenarios.
We host this "Computer Vision in the Wild (CVinW)" workshop, aiming to gather academic and industry communities to work on CV and MM problems in real-world scenarios, focusing on the challenge of open-set/domain visual recognition at different granularities and efficient task-level transfer. To measure the progress of CVinW, we develop new benchmarks for image classification, object detection and segmentation to measure the task-level transfer ablity of various models/methods over diverse real-world datasets, in terms of both prediction accuracy and adaption efficiency. This workshop is a continuation of our ECCV 2022 CVinW Workshop. For those who are new to this topic, please check out the CVinW Reading List.
This year, our workshop will host two new challenges:
- Segmentation in the Wild (SGinW): Open-set instance/semantic/panoptic segmentation on dozens of segmentation datasets in realistic scenarios.
- Roboflow 100 for Object Detection in the Wild: An augmented version of our ODinW benchmark, increasing the number of datasets to one hundred to cover more diverse application domains.
Workshop: Safe Artificial Intelligence for All Domains Mon 19 Jun 09:00 a.m.
The workshop focuses on bringing together researchers, engineers, and practitioners from academia, industry, and government to exchange ideas, share their latest research, and discuss the latest trends and challenges in this field. The workshop also aims to foster collaboration between different stakeholders, including computer vision researchers, machine learning experts, robotics engineers and safety experts, to create a comprehensive framework for developing safe AI systems for all domains.
Overall, the SAIAD workshop aims to advance the state-of-the-art in safe AI, address the most pressing challenges, and provide a platform for networking and knowledge sharing among the experts in this field.
Workshop: Vision for All Seasons: Adverse Weather and Lighting Conditions Mon 19 Jun 09:00 a.m.
The 2nd Explainable AI for Computer Vision (XAI4CV) Workshop Mon 19 Jun 09:00 a.m.
4th Embodied AI Workshop Mon 19 Jun 09:00 a.m.
Visual Copy Detection Workshop Mon 19 Jun 09:00 a.m.
The Visual Copy Detection Workshop (VCDW) explores the task of identifying copied images and videos, robust to common transformations. This task is central to social problems facing online services where users share media, such as combating misinformation and exploitative imagery, as well as enforcing copyright. Recently, copy detection methods have been used to identify and promote original content, and to reduce memorization in both predictive and generative models.
The workshop will explore technical advances in copy detection as well as the applications that motivate this research. The workshop will feature the Video Similarity Challenge, a copy detection challenge in the video domain, including presentations by challenge participants.
The Fifth Workshop on Deep Learning for Geometric Computing Mon 19 Jun 09:00 a.m.
L3D-IVU: 2nd Workshop on Learning with Limited Labelled Data for Image and Video Understanding Mon 19 Jun 09:00 a.m.
Workshop: Visual Pre-training for Robotics Mon 19 Jun 09:00 a.m.
Workshop: AI for Content Creation Mon 19 Jun 09:00 a.m.
Workshop: Sight and Sound Mon 19 Jun 09:00 a.m.
Tutorial: Yuchao Dai · Yinqiang Zheng · Bin Fan · Zhihang Zhong · Zhixiang Wang
Rolling Shutter Camera: Modeling, Optimization, Learning, and Hardware
This half-day tutorial will cover the latest advances in this area from three aspects, i.e., motion modeling and optimization-based solutions, deep learning-based solutions, and joint hardware and deep learning-based solutions. Specifically, we will first systematically present geometric motion models (like discrete, continuous, and special motions) and optimization-based approaches. Then, we will introduce deep learning-based RS image processing methods, such as RS image correction and RS temporal super-resolution, with new results and benchmarks that have recently appeared. Finally, we will elaborate on the combination of hardware features of RS cameras (e.g., dual RS cameras and global reset feature) and deep learning to boost the correction of RS geometric distortions.
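For orientation, the geometric motion models mentioned above build on the standard rolling shutter imaging model, in which each image row has its own exposure timestamp and therefore its own camera pose (generic notation, not taken from the tutorial):

```latex
t(v) = t_0 + v\,\tau, \qquad
\lambda\,\tilde{\mathbf{x}} = K \left[\, R\big(t(v)\big) \;\middle|\; \mathbf{t}\big(t(v)\big) \,\right] \tilde{\mathbf{X}},
```

where $v$ is the row index, $\tau$ the per-row readout time, and $R(t), \mathbf{t}(t)$ the time-varying camera rotation and translation; setting $\tau = 0$ recovers the usual global shutter projection.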
9th IEEE International Workshop on Computer Vision in Sports (CVsports) Mon 19 Jun 09:00 a.m.
Tutorial: Jinwei Ye · Seung-Hwan Baek · Achuta Kadambi · Huaijin Chen
Polarization-based Computer Vision
Polarization is a fundamental property of light that describes the direction in which the electric field oscillates. As an intrinsic property of light, polarization provides an extra dimension of information for probing the physical world. Although it is often overlooked, it allows for efficient geometry and material analysis beyond conventional color images. With snapshot quad-Bayer polarization cameras now commercially available, there has been growing interest in using polarization cues to solve a wide range of computer vision problems, and recent advances have demonstrated the advantages of polarization imaging for geometry and material understanding.
In this tutorial, we will cover comprehensive topics in polarization imaging, from the fundamental physical principles to applications in various computer vision problems. We will specifically focus on recent advances in using polarization imaging to solve the problems of reflectance modeling, 3D reconstruction, and transparent object segmentation. Finally, we will showcase applications of polarization imaging in industry settings.
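As a concrete example of the polarization cues discussed above, the sketch below applies the standard Stokes-parameter formulas to the four polarizer-oriented intensity images a quad-Bayer polarization camera delivers after demosaicing; the images here are random placeholders, not tutorial data.

```python
import numpy as np

# Placeholder intensity images behind the 0/45/90/135-degree polarizers.
rng = np.random.default_rng(0)
I0, I45, I90, I135 = (rng.random((480, 640)) for _ in range(4))

# Linear Stokes parameters.
S0 = 0.5 * (I0 + I45 + I90 + I135)     # total intensity
S1 = I0 - I90
S2 = I45 - I135

# Degree and angle of linear polarization, commonly used as geometry/material cues.
dolp = np.sqrt(S1**2 + S2**2) / (S0 + 1e-8)
aolp = 0.5 * np.arctan2(S2, S1)
print(dolp.mean(), aolp.mean())
```

The angle of linear polarization, for instance, relates to the azimuth of the surface normal, which is what makes such cues useful for shape estimation.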
Tutorial: Sijia Liu · Xiaoming Liu · Xue Lin
Reverse Engineering of Deception (RED): Foundations and Applications
This tutorial will deliver a well-rounded understanding of the emerging field of reverse engineering of deception (RED) techniques, a cutting-edge topic in adversarial machine learning (ML) for reliable computer vision (CV). Past studies have extensively explored the generation, detection, and defense of machine-centric deception (e.g., adversarial attacks that deceive ML models) and human-centric deception (e.g., GAN-created images that mislead human decision-making) in CV. However, RED introduces a new adversarial learning paradigm that automatically uncovers and catalogs attack "fingerprints" found in both machine and human-centric attacks. The RED problem addressed in the tutorial is: Can we reverse-engineer the adversary's knowledge and attack toolchains beyond conventional adversarial detection/defense techniques? To this end, this tutorial will cover the following key aspects: (1) Review RED's definition and formulation, addressing basics and preliminaries. (2) Discuss the challenges and significance of RED, highlighting its connections and differences with conventional adversarial detection/defense techniques in ML. (3) Explore RED for machine-centric adversaries, reviewing recent RED developments on top of a variety of adversarial attacks. (4) Examine RED for human-centric adversaries, reviewing RED methods for the detection and model parsing of GAN-generated fake images. (5) Demonstrate and showcase RED applications in CV.
Tutorial: Vishnu Naresh Boddeti · Zhichao Lu · Qingfu Zhang · Kalyanmoy Deb
Multi-Objective Optimization for Deep Learning
Real-world applications of deep learning often have to contend with objectives beyond predictive performance, i.e., more than one equally important and competing objective or criterion. Examples include cost functions pertaining to invariance (e.g., to photometric or geometric variations), semantic independence (e.g., to age or race for face recognition systems), privacy (e.g., mitigating leakage of sensitive information), algorithmic fairness (e.g., demographic parity), generalization across multiple domains, computational complexity (FLOPs, compactness), to name a few. In such applications, achieving a single solution that simultaneously optimizes all objectives is no longer feasible; instead, finding a set of solutions that are representative in describing the trade-off among objectives becomes the goal. Multiple approaches have been developed for such problems, including simple scalarization and population-based methods. This tutorial aims to provide a comprehensive introduction to fundamentals, recent advances, and applications of multi-objective optimization (MOO), followed by hands-on coding examples. Some emerging applications of MOO include (1) hardware-aware neural architecture search; (2) multi-task learning as multi-objective optimization; (3) representation learning for privacy and fairness. We will also summarize potential research directions intersecting MOO and ML/CV research.
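To make the scalarization approach mentioned above concrete, here is a minimal sketch with two toy objectives; sweeping the weight of a weighted-sum scalarization yields a set of representative trade-off solutions (all names and values are illustrative).

```python
import numpy as np

# Two toy competing objectives over a scalar decision variable x in [0, 1]:
# f1 is minimized at x = 0, f2 at x = 1.
f1 = lambda x: x ** 2
f2 = lambda x: (x - 1) ** 2

xs = np.linspace(0.0, 1.0, 1001)
trade_offs = []
for w in np.linspace(0.0, 1.0, 11):            # scalarization weights
    scores = w * f1(xs) + (1 - w) * f2(xs)     # weighted-sum scalarization
    x_star = xs[np.argmin(scores)]
    trade_offs.append((round(float(f1(x_star)), 3), round(float(f2(x_star)), 3)))
print(trade_offs)   # representative points along the f1-f2 trade-off curve
```

Weighted sums only recover the convex portion of a Pareto front, which is one motivation for the population-based methods the tutorial also covers.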
Tutorial: Pascal Mettes · Max van Spengler · Yunhui Guo · Stella X. Yu
Hyperbolic Deep Learning in Computer Vision
Learning in computer vision is all about deep networks, and such networks operate on Euclidean manifolds by default. While Euclidean space is an intuitive and practical choice, foundational work on non-visual data has shown that when information is hierarchical in nature, hyperbolic space is superior, as it allows for an embedding without distortion. A core reason is that Euclidean distances scale linearly as a function of their norm, while hyperbolic distances grow exponentially, just as hierarchies grow exponentially with depth. This initial finding has resulted in rapid developments in hyperbolic geometry for deep learning.
Hyperbolic deep learning is booming in computer vision, with new theoretical and empirical advances with every new conference. But what is hyperbolic geometry exactly? What is its potential for computer vision? And how can we perform hyperbolic deep learning in practice? This tutorial will cover all such questions. We will dive into the geometry itself, how to design networks in hyperbolic space, and we show how current literature profits from learning in this space. The aim is to provide technical depth while addressing a broad audience of computer vision researchers and enthusiasts.
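For concreteness, the distance most commonly used in this line of work is the geodesic distance on the Poincaré ball of curvature -1 (a standard formula, included here only as background):

```latex
d_{\mathbb{B}}(\mathbf{x}, \mathbf{y}) \;=\;
\operatorname{arcosh}\!\left(
  1 + 2\,\frac{\lVert \mathbf{x} - \mathbf{y} \rVert^{2}}
              {\bigl(1 - \lVert \mathbf{x} \rVert^{2}\bigr)\bigl(1 - \lVert \mathbf{y} \rVert^{2}\bigr)}
\right),
```

which blows up as points approach the boundary of the unit ball; this is the exponential growth that lets tree-like hierarchies embed with low distortion.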
Tutorial: Yanwei Fu · Da Li · Yu-Xiong Wang · Timothy Hospedales
Few-shot Learning from Meta-Learning, Statistical Understanding to Applications
There is a growing trend of research in few-shot learning (FSL), which involves adapting learned knowledge to learn new concepts from a limited number of training examples. This tutorial comprises several talks, including an overview of few-shot learning by Dr. Da Li and a discussion of seminal and state-of-the-art meta-learning methods for FSL by Prof. Timothy Hospedales, covering both gradient-based and amortised meta-learners as well as some theory for meta-learning. Dr. Yanwei Fu will introduce recent FSL techniques that use statistical methods, such as exploiting the support of unlabeled instances for few-shot visual recognition and causal inference for few-shot learning. Dr. Yu-Xiong Wang will also discuss various applications of FSL in fields beyond computer vision, such as natural language processing, reinforcement learning, and robotics.
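As one concrete instance of the metric-based meta-learners this area builds on, below is a minimal prototypical-networks-style sketch; it assumes embeddings for the support and query sets have already been computed by a trained encoder, and the arrays are random placeholders.

```python
import numpy as np

# Placeholder 5-way 5-shot episode: embeddings from a (hypothetical) trained encoder.
n_way, k_shot, dim = 5, 5, 64
rng = np.random.default_rng(0)
support = rng.standard_normal((n_way, k_shot, dim))   # labeled few-shot examples
query = rng.standard_normal((20, dim))                # unlabeled query examples

# One prototype per class: the mean embedding of its support examples.
prototypes = support.mean(axis=1)                     # shape (n_way, dim)

# Classify each query by its nearest prototype (squared Euclidean distance).
dists = ((query[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
predictions = dists.argmin(axis=1)
print(predictions)
```

Meta-learning then amounts to training the encoder so that this simple nearest-prototype rule works well across many sampled episodes.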
Tutorial: Manling Li · Xudong Lin · Jie Lei
Knowledge-Driven Vision-Language Encoding
Does knowledge still have value in the current era of large-scale pretraining? In this tutorial, we will comprehensively review existing paradigms for multimedia knowledge discovery and encoding, and focus on their contributions to vision-language pretraining. We categorize the knowledge into internal self-knowledge and external knowledge. Internal knowledge is extracted from the text and vision modalities, such as structured entities, relations, events, and event procedures. We will focus on the structural aspects of the knowledge and address two key challenges regarding the acquisition of knowledge and the encoding of structure across multiple modalities. External knowledge can be obtained from knowledge bases or language models, and we will exemplify its use to assist commonsense understanding of vision modalities, with a focus on the temporal and cognitive aspects. The objective of this tutorial is to introduce participants to recent trends and emerging challenges in knowledge-driven vision-language research, as well as learning resources and tools for obtaining ready-to-use models, prompting thorough discussions regarding the impact of structured knowledge on text and vision learning.
Tutorial: Kaiyang Zhou · Ziwei Liu · Phillip Isola · Hyojin Bahng · Ludwig Schmidt · Sarah Pratt · Denny Zhou
Originating from natural language processing, the new paradigm of prompting has recently swept through the computer vision community, bringing disruptive changes to various computer vision applications, such as image recognition and image generation. In comparison to the traditional fixed-once-learned architecture, like a linear classifier trained to recognize a specific set of categories, prompting offers greater flexibility and more opportunities for novel applications. It allows the model to perform new tasks, such as recognizing new categories, by tuning textual instructions or modifying a small number of parameters in the model's input space while keeping the majority of the pre-trained parameters untouched. This paradigm significantly pushes conversational human-AI interaction to unprecedented levels. Within a short period of time, the effectiveness of prompting has been demonstrated in a wide range of problem domains, including image classification, object detection, image generation and editing, video analytics, and robot control. In this tutorial, our aim is to provide a comprehensive background on prompting by building connections between research in computer vision and natural language processing. We will also review the latest advances in using prompting to tackle computer vision problems.
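To illustrate the flexibility described above, here is a minimal sketch of prompt-based zero-shot classification in the CLIP style; the encoder functions are hypothetical placeholders for any pretrained, aligned image and text encoders, and the prompt template is just one common choice.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_image(image_path):      # placeholder for a pretrained image encoder
    return rng.standard_normal(512)

def encode_text(text):             # placeholder for the matching text encoder
    return rng.standard_normal(512)

# Changing the task is as simple as editing the textual prompts:
class_names = ["cat", "dog", "bicycle"]
prompts = [f"a photo of a {name}" for name in class_names]

text_feats = np.stack([encode_text(p) for p in prompts])
text_feats /= np.linalg.norm(text_feats, axis=1, keepdims=True)

image_feat = encode_image("query.jpg")
image_feat /= np.linalg.norm(image_feat)

# Zero-shot prediction: the class whose prompt embedding is most similar to the image.
pred = class_names[int(np.argmax(text_feats @ image_feat))]
print(pred)
```

Prompt tuning, as discussed in the tutorial, replaces the hand-written template with a small number of learnable tokens in the model's input space while the pre-trained weights stay frozen.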
Tutorial: Giovanni Pintore · Marco Agus · Enrico Gobbetti
Automatic 3D modeling of indoor structures from panoramic imagery
Creating high-level structured 3D models of real-world indoor scenes from captured data and exploiting them are fundamental tasks with important applications in many fields. In this context, 360 capture and processing is very appealing, since panoramic imaging provides the quickest and most complete per-image coverage and is supported by a wide variety of professional and consumer capture devices. Research on inferring 3D indoor models from 360 images has been thriving in recent years, and has led to a variety of very effective solutions. Given the complexity and variability of interior environments, and the need to cope with noisy and incomplete captured data, many open research problems still remain. In this tutorial, we provide an up-to-date integrative view of the field. After introducing a characterization of input sources, we define the structure of output models, the priors exploited to bridge the gap between imperfect input and desired output, and the main characteristics of geometry reasoning and data-driven approaches. We then identify and discuss the main subproblems in structured reconstruction, and review and analyze state-of-the-art solutions for floor plan segmentation, bounding surfaces reconstruction, object detection and reconstruction, integrated model computation, and visual representation generation. We finally point out relevant research issues and analyze research trends.
Tutorial: Yinqiang Zheng · Yunhao Zou · Haiyang Jiang · Ying Fu
Optics for Better AI: Capturing and Synthesizing Realistic Data for Low-light Enhancement
This half-day tutorial will cover the latest advances in the broad theme of Optics for Better AI, with a specific focus on how to capture and synthesize realistic data for training low-light enhancement deep models. In this tutorial, we will first present the overall pipeline and effects of using realistic data, including (i) Low-light Image Enhancement using Synthesized Data; (ii) Low-light Video Enhancement using Captured Data. Then, we show detailed instructions on noise calibration and construction of optical imaging systems, including (iii) How to Calibrate the Noise Model of a Specific Camera; (iv) How to Construct a Co-axial Imaging System.
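As a rough illustration of the data-synthesis theme, the sketch below uses the widely adopted Poisson-Gaussian (shot plus read) noise model to turn a clean image into a realistic low-light training sample; the gain and read-noise values are placeholders that would normally come from the camera-specific calibration covered in part (iii).

```python
import numpy as np

def synthesize_low_light(clean, ratio=10.0, gain=0.01, read_std=0.002):
    """Darken a clean linear image in [0, 1] and add shot + read noise.

    ratio    : exposure reduction factor (how much darker the capture is)
    gain     : image units per photo-electron (placeholder; from calibration)
    read_std : std of Gaussian read noise in image units (placeholder)
    """
    dark = clean / ratio
    shot = np.random.poisson(dark / gain) * gain            # signal-dependent noise
    noisy = shot + np.random.normal(0.0, read_std, clean.shape)
    return np.clip(noisy, 0.0, 1.0)

noisy = synthesize_low_light(np.random.rand(256, 256))
```

The captured-data track takes the complementary route: instead of simulating noise, paired bright/dark footage is recorded with a co-axial imaging system, as in part (iv).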
Tutorial: Raquel Urtasun · Sergio Casas · Abbas Sadat · Sivabalan Manivasagam · Paul Spriesterbach · Ioan Barsan
All you need to know about self-driving
A full-day tutorial covering all aspects of autonomous driving. This tutorial will provide the necessary background for understanding the different tasks and associated challenges, the different sensors and data sources one can use and how to exploit them, as well as how to formulate the relevant algorithmic problems such that efficient learning and inference are possible. We will first introduce the self-driving problem setting and a broad range of existing solutions, both top-down from a high-level perspective and bottom-up from technological and algorithmic points of view. We will then extrapolate from the state of the art and discuss where the challenges and open problems are, and where we need to head to provide a scalable, safe, and affordable self-driving solution for the future.
Workshop on Foundation Models: 1st Foundation Model Challenge Mon 19 Jun 09:00 a.m.
Workshop: Scholars and Big Models — How Can Academics Adapt? Mon 19 Jun 12:45 p.m.
Workshop: 2nd Challenge on Machine Visual Common Sense: Perception, Prediction, Planning Mon 19 Jun 01:00 p.m.
The 4th Workshop on Omnidirectional Computer Vision Mon 19 Jun 01:00 p.m.
Our objective is to provide a venue for novel research in omnidirectional computer vision with an eye toward actualizing these ideas for commercial or societal benefit. As omnidirectional cameras become more widespread, we want to bridge the gap between the research and application of omnidirectional vision technologies. Omnidirectional cameras are already widespread in a number of application areas such as automotive, surveillance, photography, simulation, and other use cases that benefit from a large field of view. More recently, they have garnered interest for use in virtual and augmented reality. We want to encourage the development of new models that natively operate on omnidirectional imagery, as well as to close the performance gap between perspective-image and omnidirectional algorithms.
Workshop: RetailVision - Revolutionizing the World of Retail Mon 19 Jun 01:00 p.m.
Workshop: Photogrammetric Computer Vision Mon 19 Jun 01:00 p.m.
PCV2023 is a half-day workshop at CVPR2023 which provides a forum for original research in computer vision and photogrammetry. PCV2023 invites submissions of high-quality research papers concerning the generation, processing, and analysis of images, 3D point clouds and surface models, with the goal of enhancing accuracy and completeness. Topics of interest include, but are not limited to:
- Feature extraction, matching, sensor orientation, and sensor fusion
- Structure from motion and SLAM
- Stereo (multi-view) and surface reconstruction
- 3D point cloud processing, segmentation, and classification
- Multi-temporal analysis, dynamic scene understanding
- 3D scene analysis and semantic segmentation
2nd Workshop on Multimodal Learning for Earth and Environment (MultiEarth) Mon 19 Jun 01:00 p.m.
Tutorial: Ioannis Gkioulekas · Adithya Pediredla
Physics-based rendering and its applications in computational photography and imaging
DynaVis: The 4th International Workshop on Dynamic Scene Reconstruction Mon 19 Jun 01:30 p.m.
Reconstruction of general dynamic scenes is motivated by potential applications in film and broadcast production together with the ultimate goal of automatic understanding of real-world scenes from distributed camera networks. With recent advances in hardware and the advent of virtual and augmented reality, dynamic scene reconstruction is being applied to more complex scenes with applications in Entertainment, Games, Film, Creative Industries and AR/VR/MR. We welcome contributions to this workshop in the form of oral presentations and posters.
Tutorial: Maying Shen · Hongxu Yin · Jason Clemons · Pavlo Molchanov · Jose M. Alvarez · Jan Kautz
Full-Stack, GPU-based Acceleration of Deep Learning
This tutorial describes techniques that allow deep learning practitioners to accelerate the training and inference of large deep networks while also reducing memory requirements, across a spectrum of off-the-shelf hardware, for important applications such as autonomous driving and large language models. Topics include, but are not limited to:
1) Deep learning specialized hardware overview. We review the architecture of the most widely used deep learning acceleration hardware, including the main computational processors and memory modules.
2) How deep learning is performed on this hardware. We cover algorithmic intensity and an overview of theoretical aspects of computing. Attendees will learn how to estimate processing time and latency by looking only at hardware specs and the network architecture.
3) Best practices for acceleration. We provide an overview of best practices for designing efficient neural networks, including channel number selection, compute-heavy operations, and reduction operations, among others.
4) Existing tools for model acceleration. In this part we focus on existing tools to accelerate a trained neural network on GPU devices, in particular operation folding, TensorRT, ONNX graph optimization, and sparsity.
5) Research overview of recent techniques. In the last part, we focus on recent advanced techniques for post-training model optimization, including pruning, quantization, model distillation, and NAS, among others.
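As a small worked example of point 2), one can estimate whether a layer is compute- or memory-bound by comparing its arithmetic intensity against the hardware's FLOPs-to-bandwidth ratio; the hardware numbers below are illustrative placeholders, not figures from the tutorial.

```python
# Roofline-style back-of-the-envelope estimate for a single GEMM layer.
peak_flops = 300e12        # placeholder: peak throughput in FLOP/s
peak_bw = 2e12             # placeholder: memory bandwidth in bytes/s

m, n, k = 4096, 4096, 4096
flops = 2 * m * n * k                         # multiply-accumulate operations
bytes_moved = 2 * (m * k + k * n + m * n)     # fp16 operands plus output

intensity = flops / bytes_moved               # FLOP per byte
ridge = peak_flops / peak_bw                  # hardware balance point
bound = "compute" if intensity > ridge else "memory"
latency_s = max(flops / peak_flops, bytes_moved / peak_bw)
print(f"{bound}-bound, roughly {latency_s * 1e3:.2f} ms")
```

The same reasoning, applied layer by layer, gives the kind of latency estimate from specs and architecture alone that the tutorial describes.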
The 4th Face Anti-spoofing Workshop and Challenge Mon 19 Jun 01:30 p.m.
Tutorial: Yusuke Matsui · Martin Aumuller · Han Xiao
Neural search, a technique for efficiently searching for similar items in deep embedding space, is the most fundamental technique for handling large multimodal collections. With the advent of powerful technologies such as foundation models and prompt engineering, efficient neural search is becoming increasingly important. For example, multimodal encoders such as CLIP allow us to convert various problems into simple embedding-and-search. Another example is the way to feed information into LLMs; currently, vector search engines are a promising direction. Despite the above attention, it is not obvious how to design a search algorithm for given data. In this tutorial, we will focus on "million-scale search", "billion-scale search", and "query language" to show how to tackle real-world search problems.
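As a minimal illustration of the embedding-and-search pattern described above, here is a sketch using the FAISS library for exact inner-product search; the embeddings are random placeholders, and a real pipeline would plug in CLIP-style features and switch to approximate indexes (e.g., IVF or HNSW) at billion scale.

```python
import numpy as np
import faiss

d = 512                                        # embedding dimensionality
rng = np.random.default_rng(0)
db = rng.random((100_000, d)).astype("float32")
faiss.normalize_L2(db)                         # cosine similarity via inner product

index = faiss.IndexFlatIP(d)                   # exact search; fine at million scale
index.add(db)

queries = rng.random((5, d)).astype("float32")
faiss.normalize_L2(queries)
scores, ids = index.search(queries, 10)        # top-10 neighbors per query
print(ids[0])
```

How to choose and tune the index as collections grow from millions to billions of items is exactly the design question the tutorial addresses.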
Tutorial: Edward Miller · Pierre Moulon · Prince Gupta · Rawal Khirodkar · Richard Newcombe · Sach Lakhavani · Zhaoyang Lv
Hands-on Egocentric Research with Project Aria from Meta
Project Aria is a research device from Meta, worn like a regular pair of glasses, that enables researchers to study the future of always-on egocentric perception. In this tutorial, we will introduce two exciting new datasets from Project Aria: Aria Digital Twin, a real-world dataset with a hyper-accurate digital counterpart; and Aria Synthetic Environments, a procedurally generated synthetic Aria dataset for large-scale ML research. Each dataset will be presented with corresponding challenges, which we believe will be powerful catalysts for research. In addition to introducing new datasets and research challenges, we will also provide a hands-on demonstration of newly open-sourced tools for working with Project Aria, and demonstrate how the Project Aria ecosystem can be used to accelerate open research into egocentric perception tasks such as visual and non-visual localization and mapping, static and dynamic object detection and spatialization, human pose and eye-gaze estimation, and building geometry estimation.
Workshop: 5th ScanNet Indoor Scene Understanding Challenge Mon 19 Jun 01:30 p.m.
Tutorial: Nathan Kundtz · Matt Robinson · Dan Hedges
Exploring Synthetic data as an Enterprise Capability for Training and Validating CV Systems
With the rise of edge computing, the increase in remote sensing information, and the ubiquitous adoption of computer vision systems throughout retail and manufacturing markets, organizations increasingly rely on the accuracy and reliability of trained Artificial Intelligence and Machine Learning systems to analyze and extract information from data captured using physical sensors and sensor platforms. Real datasets often fail to capture rare events or assets, are inaccurately labeled, and the collection of real sensor data can raise cost, privacy, security, and safety issues.
Synthetic data offers the opportunity to design and label datasets for specific algorithmic training needs. Synthetic imagery designed to emulate ground-based video systems or remotely sensed satellite imagery, for example, can be generated to show real world locations populated with objects that are hard to find or that don’t yet exist. Accurately labeled, simulated datasets can be created to fit a wide range of potential real-world scenarios in which AI/ML systems will be deployed, thereby enabling teams to train and test these systems before being deployed in production environments.
This tutorial will include an introduction to creating, using, and iterating on synthetic data using the open Rendered.ai synthetic data platform. We will also feature a demonstration using NVIDIA Omniverse Replicator in the AWS cloud. The tutorial will define physics-based synthetic data, discuss differences with Generative AI, and introduce concepts for designing synthetic data.