NewsNet: A Novel Dataset for Hierarchical Temporal Segmentation
Haoqian Wu · Keyu Chen · Haozhe Liu · Mingchen Zhuge · Bing Li · Ruizhi Qiao · Xiujun Shu · Bei Gan · Liangsheng Xu · Bo Ren · Mengmeng Xu · Wentian Zhang · Raghavendra Ramachandra · Chia-Wen Lin · Bernard Ghanem
West Building Exhibit Halls ABC 233
Temporal video segmentation is a go-to step in automatic video analysis: it decomposes a long-form video into smaller components for follow-up understanding tasks. Recent works have studied several levels of granularity for segmenting a video, such as shot, event, and scene. These segmentations help compare semantics at the corresponding scales, but they lack a wider view of larger temporal spans, especially when the video is complex and structured. Therefore, we present two more abstract levels of temporal segmentation and study their hierarchical relationship with the existing fine-grained levels. Accordingly, we collect NewsNet, the largest news video dataset, consisting of 1,000 videos totaling over 900 hours and associated with several tasks for hierarchical temporal video segmentation. Each news video is a collection of stories on different topics, represented as aligned audio, visual, and textual data, along with extensive frame-wise annotations at four granularities. We assert that studying NewsNet can advance the understanding of complex structured videos and benefit other areas such as short-video creation, personalized advertising, digital instruction, and education. Our dataset and code are publicly available at: https://github.com/NewsNet-Benchmark/NewsNet.