Paper in Workshop: 2nd MetaFood Workshop

FoodVideoQA: A Novel Baseline Framework for Dietary Monitoring

Siddharth Viswanath ⋅ Krish Shah ⋅ Pengcheng Xi ⋅ Alexander Wong ⋅ Yuhao Chen

Abstract

Food intake monitoring is a crucial area of research in food computing because of its complexity and its significant potential for improving health outcomes. While traditional 2D image-based dietary assessment provides basic information, video offers a more detailed understanding of both how much food is consumed and how it is eaten. However, current video-based dietary analysis remains limited to coarse metrics, such as bite counting. In this paper, we introduce FoodVideoQA, a novel approach that leverages Vision-Language Models (VLMs) to analyze food intake videos comprehensively. We discuss the inherent limitations of a VLM-based approach to this problem, demonstrating the need for further novel approaches in this field. This work paves the way for future research on more advanced multimodal food intake measurement and eating behavior analysis.
