Preventing Zero-Shot Transfer Degradation in Continual Learning of Vision-Language Models
Zangwei Zheng, Mingyuan Ma, Kai Wang, and 3 more authors
Accepted by ICCV 2023
Continual learning (CL) can help pre-trained vision-language models efficiently adapt to new or under-trained data distributions without re-training. Nevertheless, during the continual training of the Contrastive Language-Image Pre-training (CLIP) model, we observe that the model's zero-shot transfer ability significantly degrades due to catastrophic forgetting. Existing CL methods can mitigate forgetting by replaying previous data. However, since the CLIP dataset is private, replay methods cannot access the pre-training dataset. In addition, replaying data of previously learned downstream tasks can enhance their performance but comes at the cost of sacrificing zero-shot performance. To address this challenge, we propose a novel method, ZSCL, to prevent zero-shot transfer degradation in the continual learning of vision-language models in both feature and parameter space. In the feature space, a reference dataset is introduced for distillation between the current and initial models. The reference dataset should have semantic diversity but does not need to be labeled, seen in pre-training, or composed of matched image-text pairs. In the parameter space, we prevent a large parameter shift by averaging weights during training. We also propose a more challenging Multi-domain Task Incremental Learning (MTIL) benchmark to evaluate different methods, where tasks come from various domains instead of class-separated splits of a single dataset. Our method outperforms other methods in the traditional class-incremental learning setting and on MTIL by a 9.7% average score. Our code is available at https://github.com/Thunderbeee/ZSCL.
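The two ingredients described above (feature-space distillation against the frozen initial model on a reference set, and parameter-space weight averaging) can be illustrated with a short sketch. The snippet below is a minimal illustration based only on the abstract, not the authors' released implementation: it assumes a CLIP-like model exposing an `encode_image` method, and the helper names (`feature_distillation_loss`, `update_weight_average`) as well as the cosine-similarity form of the distillation loss are illustrative assumptions.

```python
import copy
import torch
import torch.nn.functional as F

def feature_distillation_loss(current_model, initial_model, ref_images):
    """Feature-space regularization (sketch): keep the current model's image
    features close to those of the frozen initial pre-trained model on an
    unlabeled reference batch. The cosine-similarity loss is an assumption."""
    with torch.no_grad():
        ref_feats = F.normalize(initial_model.encode_image(ref_images), dim=-1)
    cur_feats = F.normalize(current_model.encode_image(ref_images), dim=-1)
    # 0 when current and initial features are identical, larger as they drift.
    return 1.0 - (cur_feats * ref_feats).sum(dim=-1).mean()

@torch.no_grad()
def update_weight_average(avg_params, model, num_averaged):
    """Parameter-space regularization (sketch): running average of weights
    collected during training, limiting the overall parameter shift."""
    for name, param in model.named_parameters():
        avg_params[name].mul_(num_averaged / (num_averaged + 1.0))
        avg_params[name].add_(param.detach(), alpha=1.0 / (num_averaged + 1.0))
    return avg_params

# Hypothetical usage inside a continual-training loop:
#   initial_model = copy.deepcopy(model).eval()          # frozen teacher
#   avg_params = {n: p.detach().clone() for n, p in model.named_parameters()}
#   loss = task_loss + lambda_distill * feature_distillation_loss(
#       model, initial_model, ref_images)
#   ... backward and optimizer step ...
#   avg_params = update_weight_average(avg_params, model, num_averaged)
```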