

Poster Sun, Jun 7, 2026 • 2:30 PM – 4:30 PM PDT ExHall A

Hear What You See: Video-to-Audio Generation with Diffusion Transformer and Semantic-Temporal Alignment-Ranked Direct Preference Optimization

Kai Wang ⋅ Tao Zhou ⋅ Jiayi Lei ⋅ Jing Wang ⋅ Jinman Zhao ⋅ Weiguo Pian ⋅ Yuan Cheng ⋅ Yapeng Tian ⋅ Peng Gao ⋅ Bin Fu ⋅ Yihao Liu ⋅ Dimitrios Hatzinakos ⋅ Yuewen Cao

Abstract
