Poster

Pre-training Vision Models with Mandelbulb Variations

Benjamin N. Chiche · Yuto Horikawa · Ryo Fujita

Arch 4A-E Poster #242
Poster
Fri 21 Jun 10:30 a.m. PDT — noon PDT

Abstract:

The use of models that have been pre-trained on natural image datasets such as ImageNet may face several limitations. First, such use may be restricted by copyright and licensing terms on the training images, as well as by privacy laws. Second, these datasets and models may incorporate societal and ethical biases. Formula-driven supervised learning (FDSL) enables model pre-training that circumvents these issues: a synthetic image dataset is generated from mathematical formulae and the model is pre-trained on it. In this work, we propose novel FDSL datasets based on Mandelbulb Variations. These datasets contain RGB images that are projections of colored objects derived from the 3D Mandelbulb fractal. Pre-training ResNet-50 on one of our proposed datasets, MandelbulbVAR-1k, yields an average top-1 accuracy over target classification datasets that is at least 1% higher than pre-training on existing FDSL datasets. With regard to anomaly detection on MVTec AD, pre-training the WideResNet-50 backbone on MandelbulbVAR-1k enables PatchCore to achieve 97.2% average image-level AUROC. This is only 1.9% lower than pre-training on ImageNet-1k (99.1%) and 4.5% higher than pre-training on the second-best-performing FDSL dataset, i.e., VisualAtom-1k (92.7%). Regarding Vision Transformer (ViT) pre-training, another dataset that we propose and coin MandelbulbVAR-Hybrid-21k enables ViT-Base to achieve 82.2% top-1 accuracy on ImageNet-1k, which is 0.4% higher than pre-training on ImageNet-21k (81.8%) and only 0.1% lower than pre-training on VisualAtom-1k (82.3%).
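For illustration of the kind of formula the datasets build on, below is a minimal Python sketch of the standard (White-Nylander) Mandelbulb membership test, which iterates v <- v^n + c in spherical coordinates. The function name, the power-8 setting, and the iteration/bailout parameters are assumptions for this sketch; they are not the paper's exact Mandelbulb Variations, rendering, or coloring procedure.

    import math

    def mandelbulb_escapes(cx, cy, cz, power=8, max_iter=10, bailout=2.0):
        """Classic Mandelbulb iteration v <- v^power + c.
        Returns True if the point (cx, cy, cz) escapes (lies outside the
        fractal), False if it stays bounded for max_iter steps."""
        x, y, z = cx, cy, cz
        for _ in range(max_iter):
            r = math.sqrt(x * x + y * y + z * z)
            if r > bailout:
                return True
            if r == 0.0:
                # Origin maps to itself under the power; just re-add c.
                x, y, z = cx, cy, cz
                continue
            theta = math.acos(z / r)   # polar angle
            phi = math.atan2(y, x)     # azimuthal angle
            rp = r ** power
            # v^power in spherical coordinates, then add the constant c.
            x = rp * math.sin(power * theta) * math.cos(power * phi) + cx
            y = rp * math.sin(power * theta) * math.sin(power * phi) + cy
            z = rp * math.cos(power * theta) + cz
        return False

Evaluating this test over a 3D grid of points yields the bounded set whose colored projections, under the paper's variations, form the synthetic pre-training images.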
