A Simple Baseline for Video Restoration With Grouped Spatial-Temporal Shift

Dasong Li · Xiaoyu Shi · Yi Zhang · Ka Chun Cheung · Simon See · Xiaogang Wang · Hongwei Qin · Hongsheng Li

West Building Exhibit Halls ABC 152
Wed 21 Jun 10:30 a.m. PDT — noon PDT


Video restoration, which aims to restore clear frames from degraded videos, has numerous important applications. Effective video restoration depends on exploiting inter-frame information. However, existing deep learning methods often rely on complicated network components, such as optical flow estimation, deformable convolution, and cross-frame self-attention layers, resulting in high computational costs. In this study, we propose a simple yet effective framework for video restoration. Our approach is based on grouped spatial-temporal shift, a lightweight and straightforward technique that implicitly captures inter-frame correspondences for multi-frame aggregation. By introducing grouped spatial shift, we attain expansive effective receptive fields. Combined with basic 2D convolution, this simple framework can effectively aggregate inter-frame information. Extensive experiments demonstrate that our framework outperforms the previous state-of-the-art method on both video deblurring and video denoising, while using less than a quarter of its computational cost. These results indicate that our approach can significantly reduce computational overhead while maintaining high-quality results. Code is available at
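To make the idea of a grouped spatial-temporal shift concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `(dt, dy, dx)` offset format, the even channel split, and the zero-filling at borders are all hypothetical choices made for this example. Each channel group of a frame is taken from a neighbouring frame and displaced spatially; in the framework described above, the shifted features would then be fused by ordinary 2D convolution, which is not shown here.

```python
# Illustrative sketch only; names and conventions are assumptions,
# not taken from the paper or its released code.
# A clip is indexed as clip[t][c][y][x]: frames, channels, rows, columns.


def shift_2d(channel, dy, dx):
    """Shift an H x W grid by (dy, dx), zero-filling vacated positions."""
    h, w = len(channel), len(channel[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx  # source position for output (y, x)
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = channel[sy][sx]
    return out


def grouped_spatial_temporal_shift(clip, offsets):
    """Apply one (dt, dy, dx) shift per channel group.

    offsets: list of (dt, dy, dx) tuples, one per group (hypothetical format).
    Output frame t, channel c reads from input frame t - dt, shifted by
    (dy, dx). Frames outside the clip contribute zeros.
    """
    num_frames = len(clip)
    num_channels = len(clip[0])
    if num_channels % len(offsets) != 0:
        raise ValueError("channels must split evenly into groups")
    group_size = num_channels // len(offsets)
    h, w = len(clip[0][0]), len(clip[0][0][0])

    out = []
    for t in range(num_frames):
        frame = []
        for c in range(num_channels):
            dt, dy, dx = offsets[c // group_size]
            src_t = t - dt
            if 0 <= src_t < num_frames:
                frame.append(shift_2d(clip[src_t][c], dy, dx))
            else:
                frame.append([[0.0] * w for _ in range(h)])
        out.append(frame)
    return out
```

With two channel groups and offsets such as `[(1, 0, 1), (-1, 1, 0)]`, the first group of each frame would carry features from the previous frame moved one pixel to the right, and the second group would carry features from the next frame moved one pixel down. A convolution applied afterwards sees several frames and several spatial displacements at once, which is one way to read the abstract's claim about enlarged effective receptive fields.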
