2. Transfer Learning Protocol
Given the relatively small size of Caltech-256 (~30k images) and the large parameter counts of both architectures, training from scratch is not viable. We therefore adopted a transfer learning approach:
- Pre-trained Weights: Both models were initialized with ImageNet-1K pre-trained weights to leverage robust, generalized visual feature extractors.
- Classifier Replacement: The final fully connected (FC) layer of ResNet50 and the linear classification head of ViT were replaced with a newly initialized linear layer matching our target label space (out_features = 257).
- Fine-tuning: All network layers were kept unfrozen and updated during training, allowing both the backbone and the new classifier to adapt to the specific nuances of Caltech-256.