- Event Ellipsometer: Event-based Mueller-Matrix Video Imaging, Ryota Maeda, Yunseong Moon, Seung-Hwan Baek
- Mobile Diffusion for Video Editing, Amirhossein Habibian
- 3D-Pose-Based Evaluation of the Risk of Sarcopenia, Yu-Hsuan Chiu, Gee-Sern Jison Hsu, Jiunn-Horng Kang, Jie-Syuan Wu
- Automated Video Clustering and Annotation Software (AVCAS), Chukwuemeka Duru, George Awad
- GenECA: A Generalizable Framework for Real-Time Multimodal Embodied Conversational Agents with Emotion-Sensitive Interaction, Santosh Patapati, Trisanth Srinivasan
- AR2D2: Training a Robot Without A Robot, Abhimanyu Saighal, Jiafei Duan, Ranjay Krishna, Dieter Fox
- HiRISE: High-Resolution Image Scaling for Edge ML via In-Sensor Compression and Selective ROI, Brendan Reidy, Peyton Chandarana, Ramtin Zand
- Event-Driven ASL Recognition: Building a DVS Dataset for Neuromorphic Systems, Arshia Eslami, James (Blake) Seekings, Peyton Chandarana, Ramtin Zand
- Real-time Facial Expression Recognition For Intuitive Robot Coaches, Peyton Chandarana, Mohammadreza Mohammadi, Hasti Zanganeh, Ramtin Zand
- TaoAvatar, Jianchuan Chen, Jingchuan Hu, Gaige Wang, Zhonghua Jiang, Tiansong Zhou, Zhiwen Chen, Chengfei Lv
- Toward Provably Private Image Obfuscation with Diffusion Models, Joseph Roberson, Tianbao Ma, Liyue Fan
- VP Lab: a PEFT-Enabled Visual Prompting Laboratory for Semantic Segmentation, Thomas Frick, Niccolo Avogaro, Yagmur G. Cinar, Daniel Caraballo, Cezary Skura, Filip M. Janicki, Piotr Kluska, Brown Ebouky, Nicola Farronato, Florian Scheidegger, Cristiano Malossi, Konrad Schindler, Andrea Bartezzaghi, Roy Assaf, Mattia Rigotti
- Edge AI in Action: Deploying Multi-Modal Models in Edge AI Devices, Fabricio Batista Narcizo, Elizabete Munzlinger, Anuj Dutt, Shan Ahmed Shaffi, Sai Narsi Reddy Donthi Reddy
- Morfis, Dimitrios Mallis, Mohamed Adel Mohamed Ali, Ahmet Serdar Karadeniz, Anis Kacem, Djamila Aouada
Art Gallery Tour with Curator, Luba Elliott
This will be a guided tour of the gallery by the curator Luba Elliott and some of the exhibiting artists.
During this special session, the mentors will rotate among small groups of students several times. This is your chance to ask questions and get insights and advice from professionals in the field. Lunch will be served in the room.
Space is limited, so be sure to arrive at 11:00am to ensure you have a seat when the session kicks off at 11:15am.
This panel will be moderated by Luba Elliott, the CVPR AI Art Gallery curator, and will feature presentations and discussion with the following gallery artists:
Masaru Mizuochi (Sony Corporation), Mingyong Cheng (UC San Diego), Yamin Xu (Bowling Green State University)
The Llama Herd of Models: System 1, 2, 3 Go!
Modern artificial intelligence (AI) systems are powered by foundation models. These models enable AI systems to understand and produce language, to perceive and generate visual content, to recognize and produce speech, and to perform actions in digital environments. While foundation models initially resembled System 1 (thinking fast), they are starting to implement a type of System 2 (thinking slow) that enables them to reason through complex problems before producing an answer. In this talk, I will describe the development of the Llama family of foundation models. I will also argue that the next frontier in the development of foundation models is to equip them with a “System 3” that enables models to think together.
- Robust Zero-Shot Depth Perception through Mono-Stereo Fusion, Luca Bartolomei, Matteo Poggi, Fabio Tosi, Stefano Mattoccia
- Hybrid Rendering for Multimodal Autonomous Driving: Merging Neural and Physics-Based Simulation, Máté Tóth, Péter Kovács, Zoltán Bendefy, Zoltán Hortsin, Tamás Matuszka
- AstroLoc: Robust Space to Ground Image Localizer, Gabriele Berton, Alex Stoken, Carlo Masone
- PromptVFX: Text-Driven Fields for Open-World 3D Gaussian Animation, Mert Kiray, Paul Uhlenbruck, Benjamin Busam
- Towards real-time multimodal world models for interactive experiences, Tabish Rashid, Dave Bignell, Raluca Georgescu, Mariya Hendriksen, Abdelhak Lemkhenter, Shanzheng Tan, Linda Wen, Katja Hofmann, Sarah Parisot
- Live freeway traffic state super-resolution, Junyi Ji, Alex Richardson, Derek Gloudemans, Gergely Zachár, Matthew Nice, William Barbour, Jonathan Sprinkle, Benedetto Piccoli, Daniel B. Work
- Efficient Segmentation for Edge Devices, Xin Li, Shuai Zhang
- GenEx: Generating an Explorable World, TaiMing Lu, Jieneng Chen
- VIZ: Virtual and Physical Navigation System for the Visually Impaired, Trisanth Srinivasan, Santosh Patapati
- A Snapshot Low-Light Depth from Defocus System, Wei Xu, Charles James Wagner, Junjie Luo, Qi Guo
- City-wide interactive image geo-localization with MegaLoc, Gabriele Berton, Carlo Masone
- FruitNinja: 3D Object Interior Texture Generation with Gaussian Splatting, Yuhao Chen, Shahan Nedadahandeh, Fangyu Wu
- Seeing Around Corners in Real-Time using Mobile LiDAR, Aaron Young, Siddharth Somasundaram, Nick Tsao, Nikhil Behari, Akshat Dave, Adithya Pediredla, Ramesh Raskar
- Grounding Pixels in Facts: Distilled Knowledge Retrieval for Factual Text-to-Video Generation, Daniel Lee, Arjun Chandra, Yang Zhou, Yunyao Li, Simone Conia
- Focal Split: Untethered Snapshot Depth from Differential Defocus, Junjie Luo, John Mamish, Alan Fu, Thomas Concannon, Josiah Hester, Emma Alexander, Qi Guo
Conference-wide reception
Food will be available in Halls CD
Bars available and World Cup Games streamed in Hall A2
Bars and LIVE MUSIC will be featured in the Karl Dean Grand Ballroom