Assignment 1 / Image / Methodology

Methodology

Define pipeline design, optimization strategy, and reproducibility controls.

1. Optimization Strategy

A standardized optimization setup was employed to ensure a fair and consistent evaluation between the baseline (ResNet50) and advanced (ViT) architectures.

  • Loss Function: Cross-Entropy Loss — the standard objective for multi-class classification (257 classes).
  • Optimizer: AdamW — combines adaptive learning rates with decoupled weight decay; highly effective for training Vision Transformers while remaining stable for ResNet.
  • Learning Rate: 1e-4 — kept relatively small to preserve pre-trained weights during fine-tuning.
  • Batch Size: 64 — balances GPU memory constraints on Colab against stable gradient estimates.
  • Epochs: 10 — sufficient for model convergence, monitored via early stopping to prevent overfitting.
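This setup can be sketched in PyTorch as follows; the model here is a placeholder linear layer standing in for the fine-tuned ResNet50 or ViT backbone, and the weight-decay value is an illustrative assumption (the table above does not specify one):

```python
import torch
import torch.nn as nn

# Placeholder for the actual backbone (ResNet50 or ViT).
model = nn.Linear(768, 257)

# Standard multi-class objective over 257 Caltech-256 classes.
criterion = nn.CrossEntropyLoss()

# AdamW: adaptive learning rates with decoupled weight decay.
# lr matches the table; weight_decay=0.01 is an illustrative default.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
```

In a training loop, each batch would then follow the usual `optimizer.zero_grad()` → `loss.backward()` → `optimizer.step()` pattern.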

2. Transfer Learning Protocol

Given the limited size of Caltech-256 (only ~30k images) and the large parameter counts of both architectures, training from scratch is not viable. We therefore adopted a transfer learning approach:

  • Pre-trained Weights: Both models were initialized with ImageNet-1K pre-trained weights to leverage robust, generalized visual feature extractors.
  • Classifier Replacement: The final Fully Connected (FC) layer of ResNet50 and the linear classification head of ViT were replaced with a newly (randomly) initialized linear layer sized to the target label set (out_features = 257).
  • Fine-tuning: All network layers were kept unfrozen and updated during training, allowing both the backbone and the new classifier to adapt to the specific nuances of Caltech-256.

3. Experiment Tracking & Reproducibility

To ensure scientific rigor and transparent model evaluation, the following controls were implemented:

  • Seed Initialization: Random states were fixed across PyTorch, NumPy, and the DataLoader (e.g., torch.manual_seed(42) together with numpy.random.seed(42)) to guarantee that dataset splits and parameter initializations are exactly reproducible.
  • Hardware Environment: Experiments were accelerated using Google Colab GPUs to handle the computational intensity of Vision Transformers.
  • Telemetry via Weights & Biases (W&B): Real-time logging of training/validation loss, accuracy, and system metrics was centralized on the W&B cloud platform. This ensures no data is lost if the Colab session terminates and allows granular comparative analysis across runs.
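The seeding control above can be captured in a small helper; the function name `set_seed` is our own, and the seed 42 matches the example in the text:

```python
import random
import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    """Fix all relevant random states for reproducible runs."""
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy (splits, augmentations)
    torch.manual_seed(seed)           # PyTorch CPU RNG
    torch.cuda.manual_seed_all(seed)  # all GPUs; no-op without CUDA

set_seed(42)

# A seeded generator can also be handed to the DataLoader so that
# shuffling order is reproducible across runs:
g = torch.Generator().manual_seed(42)
# loader = DataLoader(dataset, batch_size=64, shuffle=True, generator=g)
```

Calling `set_seed` once at the top of each experiment script makes repeated runs produce identical splits and initializations.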