PixFoundation: Workshop on Pixel-level Vision Foundation Models
Mennatullah Siam · Stella X. Yu · Sangwoo Mo · Leonid Sigal · Raoul de Charette · Tanzila Rahman · He Zhao · Aoran Xiao
Thu 12 Jun 6:30 a.m. PDT — 3:30 p.m. PDT
In recent years, foundation models have gained significant traction and success. These models adapt effectively across diverse downstream tasks and show strong generalization capabilities, especially in zero-shot and few-shot scenarios. Interest and progress in vision foundation models (VFMs) have grown in particular. Recent examples include models trained with self-supervision, such as the DINO series, and models trained on image/text pairs, such as CLIP, Flamingo, LLaVA, and Cambrian. Various pixel-level vision foundation models have also emerged for image/video referring segmentation and depth estimation. Our workshop aims to bring together researchers dedicated to developing and adapting vision foundation models for pixel-level understanding tasks, including image/video segmentation, referring image/video segmentation and reasoning, tracking, depth estimation, and motion estimation. We will explore major directions in pixel-level understanding with vision foundation models and discuss the opportunities they present, particularly in low-resource settings where they could have a positive societal impact. We will also discuss the risks associated with these models and explore methods to mitigate them. The workshop features seven invited talks, mixing emerging and established researchers, along with posters and selected spotlight presentations.
Schedule
- Prompt-Guided Attention Head Selection for Focus-Oriented Image Retrieval (Paper) | Yuji Nozawa · Yu-Chieh Lin · Kazumoto Nakamura · Youyang Ng
- ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements (Paper) | M. Arda Aydın · Efe Çırpar · Elvin Abdinli · Gozde Unal · Yusuf H. Sahin
- Show or Tell? A Benchmark To Evaluate Visual and Textual Prompts in Semantic Segmentation (Paper) | Gabriele Rosi · Fabio Cermelli
- Hierarchical Semantic Segmentation with Autoregressive Language Modeling (Paper) | Josh Myers-Dean · Brian Price · Yifei Fan · Danna Gurari