Rethinking Gradient Projection Continual Learning: Stability / Plasticity Feature Space Decoupling
Zhen Zhao · Zhizhong Zhang · Xin Tan · Jun Liu · Yanyun Qu · Yuan Xie · Lizhuang Ma
West Building Exhibit Halls ABC 354
Continual learning aims to incrementally learn novel classes over time, while not forgetting the learned knowledge. Recent studies have found that learning would not forget if the updated gradient is orthogonal to the feature space. However, previous approaches require the gradient to be fully orthogonal to the whole feature space, leading to poor plasticity, as the feasible gradient direction becomes narrow when the tasks continually come, i.e., feature space is unlimitedly expanded. In this paper, we propose a space decoupling (SD) algorithm to decouple the feature space into a pair of complementary subspaces, i.e., the stability space I, and the plasticity space R. I is established by conducting space intersection between the historic and current feature space, and thus I contains more task-shared bases. R is constructed by seeking the orthogonal complementary subspace of I, and thus R mainly contains more task-specific bases. By putting the distinguishing constraints on R and I, our method achieves a better balance between stability and plasticity. Extensive experiments are conducted by applying SD to gradient projection baselines, and show SD is model-agnostic and achieves SOTA results on publicly available datasets.