Workshops
The 2nd International Workshop on Transformers for Vision
The Second Workshop on Structural and Compositional Learning on 3D Data
Synthetic Data for Autonomous Systems (SDAS)
Second Workshop of Mobile Intelligent Photography and Imaging
8th New Trends in Image Restoration and Enhancement Workshop and Challenges
Third Workshop on Ethical Considerations in Creative Applications of Computer Vision - EC3V
Computer vision technologies like generative image models are rapidly being integrated into creative domains to, for example, aid in artistic content retrieval and curation, generate synthetic media, or enable new forms of artistic methods and creations. However, creative AI technologies bring with them a host of ethical concerns, ranging from representational harms associated with culturally sensitive matter to impacts on artistic practices and copyright and ownership concerns. In particular, it is unclear what kinds of performance failures and biases these models bring when deployed in cross-cultural and non-Western settings.
We encourage retrospective discussions and position papers examining the cross-cultural and social impacts of creative applications of computer vision, as well as ethical considerations in this domain, including but not limited to artwork attribution, inequity in cultural performance, cultural appropriation, environmental impacts of generative art, biases embedded in generative art, dynamics of art marketplaces and platforms, and policy perspectives on creative AI.
Our aim is to create a platform for interdisciplinary discussions on these issues among computer vision researchers, socio-technical researchers, policy makers, social scientists, artists, and other cultural stakeholders. This year our Generative Art Demo will invite artists to use computer vision technologies to create art pieces that center questions and topics of cultural significance and create space for collective reflection on the role of AI art, especially within non-Western communities.
New Frontiers for Zero-Shot Image Captioning Evaluation
The purpose of this workshop is to challenge the computer vision community to develop robust image captioning models that advance the state-of-the-art both in terms of accuracy and fairness (i.e. mitigating societal biases). Both of these issues must be addressed fully before image captioning technology can be reliably deployed in a large-scale setting.
The workshop will focus on testing the true limits of image captioning models under the zero-shot image captioning setting. It aims to challenge the models by providing a large-scale evaluation dataset that includes a larger variety of visual concepts from many domains (including new concepts such as COVID-19) as well as various image types (photographs, illustrations, graphics). To accomplish this task, the models need to broadly understand language-vision relations and also learn how to combine language components to describe a new image concept. Before the workshop, a challenge on zero-shot image captioning will be held, and the results will be shared at the workshop. Because results are provided only on the limited evaluation dataset, the submitted models will be challenged to understand new concepts and unseen environments.
Throughout the workshop and challenge, we will cover a broad range of topics on understanding language and images together, so that machines can communicate with humans in natural language about what they see. We therefore plan to invite researchers to give talks on various topics spanning the combination of language and vision.
CVPR 2023 - 10th Workshop on Medical Computer Vision (MCV)
The CVPR MCV workshop provides a unique forum for researchers and developers in academia, industry and healthcare to present, discuss and learn about cutting-edge advances in machine learning and computer vision for medical image analysis and computer assisted interventions. The workshop offers a venue for potential new collaborative efforts, encouraging more dataset and information exchanges for important clinical applications.
The ultimate goal of the MCV workshop is to bring together stakeholders interested in leveraging medical imaging data, machine learning and computer vision algorithms to build the next generation of tools and products to advance image-based healthcare. It is time to deliver!
The program features invited talks from leading researchers in academia and industry as well as from clinicians. There will be no paper submissions at this year's workshop.
The Fourth Workshop on Fair, Data-efficient, and Trusted Computer Vision
The Workshop on Fair, Data-Efficient, and Trusted Computer Vision will address critical issues in enhancing user trust in AI and computer vision systems, namely (i) fairness; (ii) data-efficient learning; and critical aspects of trust, including (iii) explainability, (iv) robust mitigation of adversarial attacks, and (v) improved privacy and security in model building, with the right level of credit assignment to data sources along with transparency in lineage.
The 3rd Workshop of Adversarial Machine Learning on Computer Vision: Art of Robustness
OmniLabel: Infinite label spaces for semantic understanding via natural language
The goal of this workshop is to foster research on the next generation of visual perception systems that reason over label spaces that go beyond a list of simple category names. Modern applications of computer vision require systems that understand a full spectrum of labels, from plain category names (“person” or “cat”), over modifying descriptions using attributes, actions, functions or relations (“women with yellow handbag”, “parked cars”, or “edible item”), to specific referring descriptions (“the man in the white hat walking next to the fire hydrant”). Natural language is a promising direction not only to enable such complex label spaces, but also to train such models from multiple datasets with different, and potentially conflicting, label spaces. Besides an excellent list of invited speakers from both academia and industry, the workshop will present the results of the OmniLabel challenge, which we held with our newly collected benchmark dataset that subsumes generic object detection, open-vocabulary detection, and referring expression comprehension into one unified and challenging task.
The Second Workshop on 3D Vision and Robotics
1st Workshop on Multimodal Content Moderation
Content moderation (CM) is a rapidly growing need in today’s industry, with a high societal impact, where automated CM systems can discover discrimination, violent acts, hate/toxicity, and much more, on a variety of signals (visual, text/OCR, speech, audio, language, generated content, etc.). Leaving or providing unsafe content on social platforms and devices can cause a variety of harmful consequences, including brand damage to institutions and public figures, erosion of trust in science and government, marginalization of minorities, geo-political conflicts, suicidal thoughts and more. Besides user-generated content, content generated by powerful AI models such as DALL-E and GPT presents additional challenges to CM systems.
With the prevalence of multimedia social networking and online gaming, the problem of sensitive content detection and moderation is by nature multimodal. The Hateful Memes dataset [1] highlights the multimodal nature of content moderation: for example, an image of a skunk and the sentence “you smell good” are benign or neutral separately, but can be hateful when interpreted together. Another aspect is the complementary nature of multimodal analysis, where there may be ambiguity in interpreting individual modalities separately. Moreover, content moderation is contextual and culturally multifaceted; for example, different cultures have different conventions about gestures. This requires CM approaches to be not only multimodal, but also context aware and culturally sensitive.
Despite the urgency and complexity of the content moderation problem, it has not been an area of focus in the research community. By having a workshop at CVPR, we hope to bring attention to this important research and application area, build and grow the community of interested researchers, and generate new discussion and momentum for positive social impact. Through invited talks, panels, and paper submissions, this workshop will build a forum to discuss ongoing efforts in industry and academia, share best practices, and engage the community in working towards socially responsible solutions for these problems.
With organizers across industry and academia, and speakers who are experts across relevant disciplines investigating technical and policy challenges, we are confident that the Workshop on Multimodal Content Moderation (MMCM) will complement the main conference by strengthening and nurturing the community for interdisciplinary, cross-organization knowledge sharing, pushing the envelope of what is possible and improving the quality and safety of multimodal sensitive content detection and moderation solutions that will benefit society at large.
DL-UIA: Deep Learning in Ultrasound Image Analysis
GAZE 2023: The 5th International Workshop on Gaze Estimation and Prediction in the Wild
The 5th International Workshop on Gaze Estimation and Prediction in the Wild (GAZE 2023) at CVPR 2023 aims to encourage and highlight novel strategies for eye gaze estimation and prediction with a focus on robustness and accuracy in extended parameter spaces, both spatially and temporally. This is expected to be achieved by applying novel neural network architectures, incorporating anatomical insights and constraints, introducing new and challenging datasets, and exploiting multi-modal training. Specifically, the workshop topics include (but are not limited to):
- Reformulating eye detection, gaze estimation, and gaze prediction pipelines with deep networks.
- Incorporating geometric and anatomical constraints into the training of (sparse or dense) deep networks.
- Leveraging additional cues such as context from the face region and head pose information.
- Developing adversarial methods to deal with conditions where current methods fail (illumination, appearance, etc.).
- Exploring attention mechanisms to predict the point of regard.
- Designing new accurate measures to account for rapid eye gaze movement.
- Novel methods for temporal gaze estimation and prediction including Bayesian methods.
- Integrating differentiable components into 3D gaze estimation frameworks.
- Robust estimation from different data modalities such as RGB, depth, head pose, and eye region landmarks.
- Generic gaze estimation methods for handling extreme head poses and gaze directions.
- Temporal information usage for eye tracking to provide consistent gaze estimation on the screen.
- Personalization of gaze estimators with few-shot learning.
- Semi-/weakly-/un-/self-supervised learning methods, domain adaptation methods, and other novel methods towards improved representation learning from eye/face region images or gaze target region images.
Generative Models for Computer Vision
LatinX in Computer Vision Research Workshop
VAND: Visual Anomaly and Novelty Detection
Catch UAVs that Want to Watch You: Detection and Tracking of Unmanned Aerial Vehicle (UAV) in the Wild and the 3rd Anti-UAV Workshop & Challenge
4th International Workshop on Large Scale Holistic Video Understanding
XRNeRF: Advances in NeRF for the Metaverse
19th CVPR Workshop on Perception Beyond the Visible Spectrum (PBVS 2023)
4th Workshop on Continual Learning in Computer Vision (CLVision)
Incorporating new knowledge in existing models to adapt to novel problems is a fundamental challenge of computer vision. Humans and animals continuously assimilate new experiences to survive in new environments and to improve in situations already encountered in the past. Moreover, while current computer vision models have to be trained with independent and identically distributed random variables, biological systems incrementally learn from non-stationary data distributions. This ability to learn from continuous streams of data, without interfering with previously acquired knowledge and while exhibiting positive transfer, is called Continual Learning. The CVPR Workshop on “Continual Learning in Computer Vision” (CLVision) aims to gather researchers and engineers from academia and industry to discuss the latest advances in Continual Learning. The workshop features regular paper presentations, invited speakers, and a technical benchmark challenge to present the current state of the art, as well as the limitations and future directions for Continual Learning, arguably one of the most challenging milestones of AI.
2nd Monocular Depth Estimation Challenge
Monocular depth estimation (MDE) is an important low-level vision task, with application in fields such as augmented reality, robotics and autonomous vehicles. Recently, there has been an increased interest in self-supervised systems capable of predicting the 3D scene structure without requiring ground-truth LiDAR training data. Automotive data has accelerated the development of these systems, thanks to the vast quantities of data, the ubiquity of stereo camera rigs and the mostly-static world. However, the evaluation process has also remained focused on only the automotive domain and has been largely unchanged since its inception, relying on simple metrics and sparse LiDAR data.
This workshop seeks to answer the following questions:
1. How well do networks generalize beyond their training distribution relative to humans?
2. What metrics provide the most insight into the model’s performance? What is the relative weight of simple cues, e.g. height in the image, in networks and humans?
3. How do the predictions made by the models differ from how humans perceive depth? Are the failure modes the same?
The workshop will therefore consist of two parts: invited keynote talks discussing current developments in MDE and a challenge organized around a novel benchmarking procedure using the SYNS dataset.
7th Workshop on Media Forensics
FGVC10: 10th Workshop on Fine-grained Visual Categorization
Fine-grained categorization, the precise differentiation between similar plant or animal species, diseases of the retina, architectural styles, etc., is an extremely challenging problem, pushing the limits of both human and machine ability. In these domains expert knowledge is typically required, and the question that must be addressed is how we can develop systems that efficiently discriminate between large numbers of highly similar visual concepts. The 10th Workshop on Fine-Grained Visual Categorization (FGVC10) explores topics related to supervised learning, self-supervised learning, semi-supervised learning, matching, localization, domain adaptation, transfer learning, few-shot learning, machine teaching, multimodal learning (e.g., audio and video), 3D vision, crowd-sourcing, image captioning and generation, out-of-distribution detection, open-set recognition, human-in-the-loop learning, etc., all through the lens of fine-grained understanding. Topics relevant for FGVC10 are neither restricted to vision nor categorization. FGVC10 consists of invited talks from world-renowned computer vision experts and domain experts (e.g., art), poster sessions, challenges, and peer-reviewed extended abstracts. To mark FGVC’s 10th anniversary, we have confirmed five panellists for a discussion of the history and future of FGVC. We aim to stimulate debate and to expose the wider computer vision community to new challenging problems which have the potential for large societal impact but do not traditionally receive a significant amount of exposure at other CVPR workshops.
Topological, Algebraic, and Geometric Pattern Recognition with Applications Workshop Proposal
3rd International Workshop and Challenge on Long-form Video Understanding and Generation
12th IEEE International Workshop on Computational Cameras and Displays (CCD)
Fourth Workshop on Neural Architecture Search, Third lightweight NAS challenge
EarthVision: Large Scale Computer Vision for Remote Sensing Imagery
Computer Vision for Mixed Reality
Workshop on End-to-end Autonomous Driving
End-to-end autonomous driving, a relatively new paradigm (compared to the modular design) with great potential, has already attracted attention from both academia and industry. This workshop offers a brand-new perspective for discussing broad areas of end-to-end framework design for autonomous driving at the system level. Central to the program is a series of invited talks and four new challenges in the self-driving domain. Each challenge combines new perspectives on multiple components in perception and planning compared to conventional pipelines.
Visual Perception via Learning in an Open World
3rd Workshop and Challenge on Computer Vision in the Built Environment for the Design, Construction, and Operation of Buildings
2nd Workshop on Tracking and Its Many Guises: Tracking Any Object in Open-World
The 4th CVPR Workshop on 3D Scene Understanding for Vision, Graphics, and Robotics
New Frontiers in Visual Language Reasoning: Compositionality, Prompts and Causality
Recent years have seen the stunning power of Visual Language Pre-training (VLP) models. Although VLPs have revolutionized some fundamental principles of visual language reasoning (VLR), several remaining problems prevent them from “thinking” like a human being: how to reason about the world by breaking it into parts (compositionality), how to generalize to novel concepts given a glimpse of demonstrations in context (prompts), and how to debias visual language reasoning by imagining what would have happened in counterfactual scenarios (causality).
The workshop provides an opportunity to gather researchers from different fields to review the technology trends along these three lines and to better endow VLPs with these reasoning abilities. Our workshop also includes two multi-modal reasoning challenges set in the context of cross-modal math word problems involving calculation and proving. The challenges are practical and closely tied to these issues, and should therefore shed more light on the new frontiers of visual language reasoning.
Workshop on Autonomous Driving (WAD)
The CVPR 2023 Workshop on Autonomous Driving (WAD) aims to gather researchers and engineers from academia and industry to discuss the latest advances in perception for autonomous driving. In this full-day workshop, we will host speakers as well as technical benchmark challenges to present the current state of the art, limitations and future directions in the field, arguably one of the most promising applications of computer vision and artificial intelligence. The previous editions of this workshop attracted hundreds of researchers. This year, multiple industry sponsors are also joining our organizing efforts to push it to a new level.
6th Multi-modal Learning and Applications Workshop (MULA)
The exploitation of big data over the last few years has led to a big step forward in many applications of computer vision. However, most of the tasks tackled so far involve the visual modality only, mainly due to the unbalanced number of labelled samples available among modalities (e.g., there are many huge labelled datasets for images but not as many for audio or IMU-based classification), resulting in a huge gap in performance when algorithms are trained separately.
Recently, a few works have started to exploit the synchronization of multimodal streams (e.g., audio/video, RGB/depth, RGB/Lidar, visual/text, text/audio) to transfer semantic information from one modality to another, reaching surprising results. Interesting applications have also been proposed in a self-supervised fashion, where multiple modalities learn correspondences without the need for manual labelling, resulting in a more powerful set of features compared to those learned by processing the two modalities separately. Other works have also shown that particular training paradigms allow neural networks to perform well when one of the modalities is missing due to sensor failure or unfavorable environmental conditions. These topics have been gaining considerable interest in the computer vision community in recent years.
Information fusion from multiple sensors is also a topic of major interest in industry; the exponential growth of companies working on automotive, drone vision, surveillance, or robotics is just one example. Many companies are trying to automate processes by using a large variety of control signals from different sources. The aim of this workshop is to generate momentum around this topic of growing interest and to encourage interdisciplinary interaction and collaboration between the computer vision, multimedia, remote sensing, and robotics communities. The workshop will serve as a forum for research groups from academia and industry.
We expect contributions involving, but not limited to, image, video, audio, depth, IR, IMU, laser, text, drawings, synthetic data, etc. Position papers with feasibility studies and cross-modality issues with a strong applied focus are also encouraged. Multimodal data analysis is a very important bridge among vision, multimedia, remote sensing, and robotics; we therefore expect a positive response from these communities.
CV4Animals: Computer Vision for Animal Behavior Tracking and Modeling (Workshop)
Many biological organisms have evolved to exhibit diverse quintessential behaviors via physical and social interactions with their surroundings, and understanding these behaviors is a fundamental goal of multiple disciplines including neuroscience, biology, medicine, behavioral science, and sociology. For example, ethogramming characterizes behavioral states and their transitions, which further provides a scientific basis for understanding innate human behaviors, e.g., decision-making, attention, and group behaviors. These analyses require objective, repeatable, and scalable measurements of animal behaviors that are not possible with existing methodologies, which rely on manual encoding by animal experts and specialists. Recently, computer vision has been making a groundbreaking impact by providing a new tool that enables computational measurement of these behaviors.
The workshop offers invited talks, orals, and poster sessions by leading scientists in the field, coming from computer vision, neuroscience, and biology. Our webpage lists the full schedule, accepted papers, and posters.
End-to-End Autonomous Driving: Perception, Prediction, Planning and Simulation
1st Workshop on Compositional 3D Vision & 3DCoMPaT Challenge
6th International Workshop on Visual Odometry and Computer Vision Applications Based on Location Clues
4D Hand Object Interaction: Geometric Understanding and Applications in Dexterous Manipulation
Computer Vision for Fashion, Art, and Design
Creative domains make up a large part of modern society, having a strong influence on the economy and cultural life. Much effort within creative domains, such as fashion, art and design, centers on the creation, consumption, manipulation and analytics of visual content. In recent years, there has been an explosion of research in applying machine learning and computer vision algorithms to various aspects of the creative domains. For four years in a row, the CVFAD workshop series has been capturing important trends and new ideas in this area. At CVPR 2023, we will continue to bring together artists, designers, and computer vision researchers and engineers. We will keep growing the workshop itself to be a space for conversations and idea exchanges at the intersection of computer vision and creative applications.
1st Workshop on Capturing, Interpreting & Visualizing Indoor Living Spaces
With recent advances in AR/VR, a wider range of applications has been emerging, such as virtual touring and Building Information Modeling (BIM), e.g., floorplan generation and 3D holistic understanding. Such applications have attracted a lot of interest from both academia and industry and have motivated substantial investment in the form of dataset collection, research, publications and products. A few recent examples of such datasets are the Zillow Indoor Dataset (ZInD), Apple’s ARKit Scenes dataset and Facebook’s Habitat-Matterport dataset. The size and unique type of annotations provided by each of these datasets offer a huge opportunity for CV/ML researchers to focus on different aspects of scene and environment understanding beyond what was possible before.
Motivated by the recent release of datasets such as the Zillow Indoor Dataset (ZInD), Apple's ARKit Scenes dataset and Facebook's Habitat-Matterport dataset, in this workshop we would like to bring industry and academia together and encourage both to focus on specific underexplored aspects of environment understanding. We encourage researchers to go beyond "scene understanding" and explore "environment understanding", with a focus on understanding structure through tasks such as 2D/3D room layout estimation, understanding the relations between "rooms" for floorplan generation, localization of media within rooms and floorplans, and localization of objects within rooms and floorplans. Image, geometric, and semantic information can also be used to reimagine the appearance of home interiors in a photorealistic manner.
The Fifth Workshop on Precognition: Seeing through the Future
Vision-based detection and recognition studies have recently achieved highly accurate performance and have bridged the gap between research and real-world applications. Beyond these well-explored detection and recognition capabilities of modern algorithms, vision-based forecasting will likely be one of the next big research topics in the field of computer vision. Vision-based prediction is one of the critical capabilities of humans, and the potential success of automatic vision-based forecasting will empower and unlock human-like capabilities in machines and robots.
One important application is in autonomous driving technologies, where a vision-based understanding of a traffic scene and prediction of the movement of traffic actors is a critical piece of the autonomous puzzle. Various sensors such as cameras and lidar are used as the "eyes" of a vehicle, and advanced vision-based algorithms are required to allow safe and effective driving. Another area where vision-based prediction is used is the medical domain, allowing deep understanding and prediction of future medical conditions of patients. However, despite its potential and relevance for real-world applications, visual forecasting or precognition has not been the focus of new theoretical studies and practical applications as much as detection and recognition problems.
Through the organization of this workshop, we aim to facilitate further discussion and interest within the research community regarding this nascent topic. This workshop will discuss recent approaches and research trends not only in anticipating human behavior from videos but also in precognition in multiple other visual applications, such as medical imaging, healthcare, human face aging prediction, early event prediction, autonomous driving forecasting, etc.
AVA: Accessibility, Vision, and Autonomy Meet
The goal of this workshop is to gather researchers, students, and advocates who work at the intersection of accessibility, computer vision, and autonomous and intelligent systems. In particular, we plan to use the workshop to identify challenges and pursue solutions for the current lack of shared and principled development tools for vision-based accessibility systems. For instance, there is a general lack of vision-based benchmarks and methods relevant to accessibility (e.g., people using mobility aids are currently mostly absent from large-scale datasets in pedestrian detection). Towards building a community of accessibility-oriented research in computer vision conferences, we also introduce a large-scale fine-grained computer vision challenge. The challenge involves visual recognition tasks relevant to individuals with disabilities. We aim to use the challenge to uncover research opportunities and spark the interest of computer vision and AI researchers working on more robust and broadly usable visual reasoning models in the future. An interdisciplinary panel of speakers will further provide an opportunity for fostering a mutual discussion between accessibility, computer vision, and robotics researchers and practitioners.
High-fidelity Neural Actors
QCVML: Quantum Computer Vision and Machine Learning Workshop
The 3rd Workshop on Light Fields for Computer Vision LFNAT: New Applications and Trends in Light Fields
4D light fields can capture both the intensity and the directions of light rays, and record 3D geometry in a convenient and efficient manner. In the past few years, various areas of research have tried to use light fields to obtain superior performance by exploiting their internal structure information. Light fields have been widely used with remarkable results in some applications, such as depth estimation and super-resolution. In contrast, attempts in other applications such as object detection and semantic segmentation are still at a preliminary stage, owing to the lack of corresponding datasets and the incompatibility between redundant context information and limited memory. Meanwhile, as more and more novel and powerful technologies such as Neural Radiance Fields and Multiplane Images are introduced into computer vision, there will be plenty of opportunities and challenges in combining them with light fields. To this end, this workshop focuses on two brand-new topics. The first is to introduce light fields into more application areas, break through the bottleneck between rich structural information and limited memory, and achieve stable performance. The second is to explore how to introduce emerging technologies from other research fields into light fields to create new technological effects and drive competition. In addition, this workshop hosts competitions on light field semantic segmentation and depth estimation to invite more researchers to the field.
The Fourth Workshop on Face and Gesture Analysis for Health Informatics (FGAHI)
2nd Workshop and Challenge on Vision Datasets Understanding
Pixel-level Video Understanding in the Wild Challenge
Pixel-level Scene Understanding is one of the fundamental problems in computer vision, which aims at recognizing the object classes, masks and semantics of each pixel in a given image. Since the real world is dynamic and video-based rather than static, learning to perform video semantic/panoptic segmentation is more reasonable and practical for realistic applications. To advance the semantic/panoptic segmentation task from images to videos, we present two large-scale datasets (VSPW and VIPSeg) and a competition in this workshop, aiming at performing the challenging yet practical Pixel-level Video Understanding in the Wild (PVUW).
7th AI City Challenge Workshop
5th Workshop and Competition on Affective Behavior Analysis in-the-wild
The Workshop has a unique aspect of fostering cross-pollination of different disciplines, bringing together experts (from academia & industry) and researchers of computer vision and pattern recognition, AI, machine learning, HCI, multimedia, robotics and psychology. The diversity of human behavior, the richness of multi-modal data that arises from its analysis, and the multitude of applications that demand rapid progress in this area ensure that our event provides a timely and relevant discussion and dissemination platform.
The workshop includes keynote talks from Prof. Gunes and Prof. Lapedriza, as well as presentations from experts and researchers within academia and industry on topics related to affective computing and behavior analysis.
The detailed agenda of the workshop can be found on the workshop's website.
3rd Mobile AI Workshop and Challenges
Over the past years, mobile AI-based applications have become more and more ubiquitous. Various deep learning models can now be found on any mobile device, from smartphones running portrait segmentation, image enhancement, face recognition and natural language processing models, to smart-TV boards that come with sophisticated image super-resolution algorithms. The performance of mobile NPUs and DSPs is also increasing dramatically, making it possible to run complex deep learning models and to achieve fast runtime in the majority of tasks.
While many research works on efficient deep learning models have been proposed recently, the resulting solutions are usually evaluated on desktop CPUs and GPUs, making it nearly impossible to estimate the actual inference time and memory consumption on real mobile hardware. To address this problem, we introduce the first Mobile AI Workshop, where all deep learning solutions are developed for and evaluated on mobile devices.
Thanks to the performance of the latest generation of mobile AI hardware, the topics considered in this workshop go beyond simple classification tasks and include challenging problems such as image denoising, HDR photography, accurate depth estimation, learned image ISP pipelines, and real-time image and video super-resolution. All information about the challenges, papers, invited talks and workshop industry partners is provided at: https://ai-benchmark.com/workshops/mai/2023/
Vision-Centric Autonomous Driving (VCAD)
4th International Workshop on Event-based Vision
This workshop is dedicated to event-based cameras, smart cameras, and algorithms processing data from these sensors. Event-based cameras are bio-inspired sensors with the key advantages of microsecond temporal resolution, low latency, very high dynamic range, and low power consumption. Because of these advantages, event-based cameras open frontiers that are unthinkable with standard frame-based cameras (which have been the main sensing technology for the past 60 years). These revolutionary sensors enable the design of a new class of algorithms to track a baseball in the moonlight, build a flying robot with the agility of a bee, and perform structure from motion in challenging lighting conditions and at remarkable speeds. These sensors became commercially available in 2008 and are slowly being adopted in computer vision and robotics. In recent years they have received attention from large companies, e.g., the event-sensor company Prophesee collaborated with Intel and Bosch on a high spatial resolution sensor, Samsung announced mass production of a sensor to be used on hand-held devices, and they have been used in various applications on neuromorphic chips such as IBM’s TrueNorth and Intel’s Loihi. The workshop also considers novel vision sensors, such as pixel processor arrays (PPAs), which perform massively parallel processing near the image plane. Because early vision computations are carried out on-sensor, the resulting systems have high speed and low power consumption, enabling new embedded vision applications in areas such as robotics, AR/VR, automotive, gaming, surveillance, etc. This workshop will cover the sensing hardware, as well as the processing and learning methods needed to take advantage of the above-mentioned novel cameras.
4th Agriculture-Vision Workshop: Challenges & Opportunities for Computer Vision in Agriculture
Image Matching: Local Features and Beyond
Multi-Agent Behavior: Properties, Computation and Emergence
The Sixth International Workshop on Computer Vision for Physiological Measurement (CVPM)
8th Workshop on Computer Vision for Microscopy Image Analysis
High-throughput microscopy enables researchers to acquire thousands of images automatically over a matter of hours. This makes it possible to conduct large-scale, image-based experiments for biological discovery. The main challenge and bottleneck in such experiments is the conversion of “big visual data” into interpretable information and hence discoveries. Visual analysis of large-scale image data is a daunting task. Cells need to be located and their phenotype (e.g., shape) described. The behaviors of cell components, cells, or groups of cells need to be analyzed. The cell lineage needs to be traced. Not only do computers have more “stamina” than human annotators for such tasks, they also perform analysis that is more reproducible and less subjective. The post-acquisition component of high-throughput microscopy experiments calls for effective and efficient computer vision techniques.
This workshop will bring together computer vision experts from academia, industry, and government who have made progress in developing computer vision tools for microscopy image analysis. It will provide a comprehensive forum on this topic and foster in-depth discussion of technical and application issues as well as cross-disciplinary collaboration. It will also serve as an introduction to researchers and students curious about this important and fertile field.
VizWiz Grand Challenge: Describing Images and Videos Taken by Blind People
First Rhobin Challenge - Reconstruction of human-object interaction
Women in Computer Vision Workshop
The half-day Women in Computer Vision (WiCV) workshop is a gathering for researchers of all genders and career stages. All are welcome and encouraged to attend the workshop. Topics span a wide range of areas, including object recognition, image understanding, video analysis, and 3D reconstruction.
Virtual Poster Session from 12:15 - 1:00 pm at https://topia.io/wicvcvpr2023
The 6th Workshop and Prize Challenge Bridging the Gap between Computational Photography and Visual Recognition (UG2+) in conjunction with IEEE CVPR 2023
The rapid development of computer vision algorithms increasingly allows automatic visual recognition to be incorporated into a suite of emerging applications. Some of these applications have less-than-ideal circumstances such as low-visibility environments, causing image captures to have degradations. In other more extreme applications, such as imagers for flexible wearables, smart clothing sensors, ultra-thin headset cameras, implantable in vivo imaging, and others, standard camera systems cannot even be deployed, requiring new types of imaging devices. Computational photography addresses the concerns above by designing new computational techniques and incorporating them into the image capture and formation pipeline. This raises a set of new questions. For example, what is the current state-of-the-art for image restoration for images captured in non-ideal circumstances? How can inference be performed on novel kinds of computational photography devices?
Continuing the success of the 1st (CVPR'18), 2nd (CVPR'19), 3rd (CVPR'20), 4th (CVPR'21), and 5th (CVPR'22) UG2 Prize Challenge workshops, we present the 6th edition at CVPR 2023. It will inherit the successful benchmark dataset, platform and evaluation tools used by the previous UG2 workshops, but will also look at brand new aspects of the overall problem, significantly augmenting its existing scope.
O-DRUM: Workshop on Open-Domain Reasoning Under Multi-Modal Settings
Joint 3rd Ego4D and 11th EPIC Workshop on Egocentric Vision
This joint full-day workshop is a longstanding event that brings together the strongly growing egocentric computer vision community, offering the 3rd Ego4D edition and the 11th Egocentric Perception, Interaction and Computing (EPIC) edition. This year, 17 Ego4D benchmark and 9 EPIC benchmark winners and findings will be presented throughout the day, covering social interactions, episodic memory, hand-object interactions, long-term tracking, video object segmentation and audio-based interaction recognition. In addition to the recurring Ego4D and EPIC challenges, new challenges are associated with the recently released benchmarks EgoTracks, PACO, EPIC-KITCHENS VISOR and EPIC-Sounds.
Additionally, the day will include accepted abstracts, invited CVPR papers and 5 keynotes by Andrea Vedaldi (Oxford and Meta), Hyun Soo Park (UMinnesota), David Fouhey (UMich) and Suraj Nair (Stanford). Check the program for details.
2nd Workshop on Federated Learning for Computer Vision
Federated Learning (FL) has become an important privacy-preserving paradigm in various machine learning tasks. However, the potential of FL in computer vision applications, such as face recognition, person re-identification, and action recognition, is far from being fully exploited. Moreover, FL has rarely been demonstrated effectively in advanced computer vision tasks such as object detection and image segmentation, compared to the traditional centralized training paradigm. This workshop aims at bringing together researchers and practitioners with common interests in FL for computer vision and studying the different synergistic relations in this interdisciplinary area. The day-long event will facilitate interaction among students, scholars, and industry professionals from around the world to discuss future research challenges and opportunities.
EMBEDDED VISION WORKSHOP 2023
Embedded vision is an active field of research, bringing together efficient learning models with fast computer vision and pattern recognition algorithms. The field tackles many areas of robotics and intelligent systems and is enjoying impressive growth today.
Workshop on Vision-based InduStrial InspectiON (VISION)
The VISION workshop aims to provide a platform for the exchange of scholarly innovations and emerging practical challenges in Vision-based Industrial Inspection. Through a series of keynote talks, technical presentations, and a challenge competition, this workshop is intended to (i) bring together researchers from the interdisciplinary research communities related to computer vision-based inspection; and (ii) connect researchers and industry practitioners to synergize recent research progress and current needs in industrial practice.
Secure and Safe Autonomous Driving Workshop and Challenge (SSAD)
The 6th Efficient Deep Learning for Computer Vision
9th IEEE International Workshop on Computer Vision in Sports (CVsports)
4th Embodied AI Workshop
Vision for All Seasons: Adverse Weather and Lighting Conditions
Safe Artificial Intelligence for All Domains
The workshop focuses on bringing together researchers, engineers, and practitioners from academia, industry, and government to exchange ideas, share their latest research, and discuss the latest trends and challenges in this field. The workshop also aims to foster collaboration between different stakeholders, including computer vision researchers, machine learning experts, robotics engineers and safety experts, to create a comprehensive framework for developing safe AI systems for all domains.
Overall, the SAIAD workshop aims to advance the state-of-the-art in safe AI, address the most pressing challenges, and provide a platform for networking and knowledge sharing among the experts in this field.
Visual Pre-training for Robotics
Computer Vision in the Wild
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concepts.
Recent works show that learning from large-scale image-text data is a promising approach to building transferable visual models that can effortlessly adapt to a wide range of downstream computer vision (CV) and multimodal (MM) tasks. Examples include CLIP, ALIGN and Florence for image classification; ViLD, RegionCLIP, GLIP and OWL-ViT for object detection; GroupViT, OpenSeg, MaskCLIP, X-Decoder, Segment Anything (SAM) and SEEM for segmentation; and Multimodal GPT-4, LLaVA and MiniGPT4 for language-and-image instruction-following chat assistants. These vision models with language or interactive interfaces are naturally open-vocabulary recognition models, showing superior zero-shot and few-shot adaptation performance in various real-world scenarios.
We host this "Computer Vision in the Wild (CVinW)" workshop, aiming to gather academic and industry communities to work on CV and MM problems in real-world scenarios, focusing on the challenge of open-set/domain visual recognition at different granularities and efficient task-level transfer. To track the progress of CVinW, we develop new benchmarks for image classification, object detection and segmentation that measure the task-level transfer ability of various models/methods over diverse real-world datasets, in terms of both prediction accuracy and adaptation efficiency. This workshop is a continuation of our ECCV 2022 CVinW Workshop. For those who are new to this topic, please check out the CVinW Reading List.
This year, our workshop will host two new challenges:
- Segmentation in the Wild (SGinW): Open-set instance/semantic/panoptic segmentation on dozens of segmentation datasets in realistic scenarios.
- Roboflow 100 for Object Detection in the Wild: An augmented version of our ODinW that increases the number of datasets to hundreds to cover more diverse application domains.
Workshop on Foundation Models: 1st Foundation Model Challenge
Visual Copy Detection Workshop
The Visual Copy Detection Workshop (VCDW) explores the task of identifying copied images and videos, robust to common transformations. This task is central to social problems facing online services where users share media, such as combating misinformation and exploitative imagery, as well as enforcing copyright. Recently, copy detection methods have been used to identify and promote original content, and to reduce memorization in both predictive and generative models.
The workshop will explore technical advances in copy detection as well as the applications that motivate this research. The workshop will feature the Video Similarity Challenge, a copy detection challenge in the video domain, including presentations by challenge participants.
The Fifth Workshop on Deep Learning for Geometric Computing
AI for Content Creation
L3D-IVU: 2nd Workshop on Learning with Limited Labelled Data for Image and Video Understanding
Sight and Sound
The 2nd Explainable AI for Computer Vision (XAI4CV) Workshop
Scholars and Big Models — How Can Academics Adapt?
2nd Challenge on Machine Visual Common Sense: Perception, Prediction, Planning
2nd Workshop on Multimodal Learning for Earth and Environment (MultiEarth)
The 4th Workshop on Omnidirectional Computer Vision
Our objective is to provide a venue for novel research in omnidirectional computer vision with an eye toward actualizing these ideas for commercial or societal benefit. As omnidirectional cameras become more widespread, we want to bridge the gap between the research and application of omnidirectional vision technologies. Omnidirectional cameras are already widespread in a number of application areas such as automotive, surveillance, photography, simulation and other use cases that benefit from a large field of view. More recently, they have garnered interest for use in virtual and augmented reality. We want to encourage the development of new models that natively operate on omnidirectional imagery as well as close the performance gap between perspective-image and omnidirectional algorithms.
Photogrammetric Computer Vision
PCV2023 is a half-day workshop at CVPR2023 which provides a forum for original research in computer vision and photogrammetry. PCV2023 invites submissions of high-quality research papers concerning the generation, processing, and analysis of images, 3D point clouds and surface models, with the goal of enhancing accuracy and completeness. Topics of interest include, but are not limited to:
- Feature extraction, matching, sensor orientation, and sensor fusion
- Structure from motion and SLAM
- Stereo (multi-view) and surface reconstruction
- 3D point cloud processing, segmentation, and classification
- Multi-temporal analysis, dynamic scene understanding
- 3D scene analysis and semantic segmentation
RetailVision - Revolutionizing the World of Retail
The 4th Face Anti-spoofing Workshop and Challenge
DynaVis: The 4th International Workshop on Dynamic Scene Reconstruction
Reconstruction of general dynamic scenes is motivated by potential applications in film and broadcast production together with the ultimate goal of automatic understanding of real-world scenes from distributed camera networks. With recent advances in hardware and the advent of virtual and augmented reality, dynamic scene reconstruction is being applied to more complex scenes with applications in Entertainment, Games, Film, Creative Industries and AR/VR/MR. We welcome contributions to this workshop in the form of oral presentations and posters.