Workshop: 8th Workshop and Competition on Affective & Behavior Analysis in-the-wild

Robust Stage-Wise LVLM Adaptation: Multi-Phase Prompt Lora Fine-tuning for Compound Expression Recognition

Jun Yu ⋅ Xilong Lu ⋅ Yunxiang Zhang ⋅ Lingsi Zhu ⋅ Yang Zheng ⋅ Yongqi Wang ⋅ Qiang Ling

Abstract

Compound Expression Recognition (CER) is crucial for understanding human emotions and improving human-computer interaction. However, CER is challenging because compound facial expressions are complex and the emotional cues that distinguish them are subtle. To address these obstacles, we present a novel approach that leverages Large Vision-Language Models (LVLMs). Our method combines a two-stage fine-tuning process with purpose-designed prompts. In the first stage, a pre-trained LVLM is fine-tuned on basic facial expressions to learn fundamental expression patterns. In the second stage, the model is further optimized on a compound-expression dataset to refine its modeling of the interactions between compound expressions. Our approach achieves strong accuracy on the RAF-DB dataset and demonstrates robust zero-shot generalization on the C-EXPR-DB dataset. In the 8th ABAW Compound Expression Recognition Challenge, our method secured first place with an F1 score of 0.5723, highlighting its potential for real-world applications in emotion analysis and human-computer interaction.
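The challenge result above is reported as an F1 score. As a rough illustration of what such a score measures, the sketch below computes a macro-averaged F1 (the unweighted mean of per-class F1) over compound-expression labels. It assumes the challenge averages F1 over seven compound classes; the class list, the function name `macro_f1`, and the convention of scoring an absent class as 0 are illustrative choices, not taken from the paper or the official evaluation script.

```python
from collections import Counter

# Assumed label set: seven compound classes (verify against the official challenge protocol).
CLASSES = [
    "Fearfully Surprised", "Happily Surprised", "Sadly Surprised",
    "Disgustedly Surprised", "Angrily Surprised", "Sadly Fearful", "Sadly Angry",
]


def macro_f1(y_true, y_pred, classes=CLASSES):
    """Hypothetical helper: unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1  # correct prediction counts as a true positive for class t
        else:
            fp[p] += 1  # wrongly predicted class p
            fn[t] += 1  # missed the true class t
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # Convention (an assumption): a class with no true or predicted samples scores 0.
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    y_true = ["Happily Surprised", "Sadly Angry", "Sadly Angry"]
    y_pred = ["Happily Surprised", "Sadly Angry", "Sadly Fearful"]
    print(round(macro_f1(y_true, y_pred), 4))
```

Because every class carries equal weight, a macro F1 penalizes a model that ignores rare compound classes, even if its overall accuracy is high.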
