SemiCVT: Semi-Supervised Convolutional Vision Transformer for Semantic Segmentation
Huimin Huang · Shiao Xie · Lanfen Lin · Ruofeng Tong · Yen-Wei Chen · Yuexiang Li · Hong Wang · Yawen Huang · Yefeng Zheng
West Building Exhibit Halls ABC 296
Semi-supervised learning improves data efficiency of deep models by leveraging unlabeled samples to alleviate the reliance on a large set of labeled samples. These successes concentrate on the pixel-wise consistency by using convolutional neural networks (CNNs) but fail to address both global learning capability and class-level features for unlabeled data. Recent works raise a new trend that Trans- former achieves superior performance on the entire feature map in various tasks. In this paper, we unify the current dominant Mean-Teacher approaches by reconciling intra- model and inter-model properties for semi-supervised segmentation to produce a novel algorithm, SemiCVT, that absorbs the quintessence of CNNs and Transformer in a comprehensive way. Specifically, we first design a parallel CNN-Transformer architecture (CVT) with introducing an intra-model local-global interaction schema (LGI) in Fourier domain for full integration. The inter-model class- wise consistency is further presented to complement the class-level statistics of CNNs and Transformer in a cross- teaching manner. Extensive empirical evidence shows that SemiCVT yields consistent improvements over the state-of- the-art methods in two public benchmarks.