Assignment 1

Three classification tracks, one workspace.

Assignment 1 compares baseline and advanced methods across image, text, and multimodal settings, with emphasis on dataset understanding, model design, and evaluation quality.

Image: ResNet50 vs ViT on Caltech-256

Text: LSTM vs Transformer encoder

Multimodal: zero-shot CLIP vs few-shot adaptation

Track Pages

Open a track to review dataset EDA, model backbone, methodology, and evaluation pages.

Track 01

Image Classification

Caltech-256 benchmark comparing ResNet50 and Vision Transformer behavior.


Track 02

Text Classification

Sequence modeling study contrasting LSTM baselines with transformer encoders.


Track 03

Multimodal Classification

CLIP-based zero-shot and few-shot experiments, with error analysis documented in the report.

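The track pages hold the actual experiments; the sketch below only illustrates the two decision rules being compared, under stated assumptions. Zero-shot CLIP classifies an image by comparing its embedding with text-prompt embeddings; for few-shot adaptation it assumes one common option, averaging a few labeled image embeddings per class into prototypes (a nearest-class-mean classifier). The assignment may use a different adaptation method, such as a linear probe. All function names and vectors are hypothetical, and the toy vectors stand in for CLIP encoder outputs.

```python
# Minimal sketch: zero-shot vs few-shot (prototype) classification over
# CLIP-style embeddings. The vectors are hand-written toy stand-ins for
# outputs of CLIP's image and text encoders, not real model features.
import math


def normalize(v):
    """Scale a vector to unit length (CLIP compares L2-normalized embeddings)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def nearest_class(query, class_embs):
    """Return the most similar label and the similarity score for every label."""
    scores = {label: cosine(query, emb) for label, emb in class_embs.items()}
    return max(scores, key=scores.get), scores


def build_prototypes(support):
    """Few-shot (assumed method): mean of normalized support embeddings per class."""
    prototypes = {}
    for label, embs in support.items():
        normed = [normalize(e) for e in embs]
        prototypes[label] = [sum(col) / len(normed) for col in zip(*normed)]
    return prototypes


# Zero-shot: class embeddings come from text prompts such as "a photo of a cat".
prompt_embs = {"cat": [1.0, 0.2, 0.0], "dog": [0.1, 1.0, 0.1]}
image_emb = [0.9, 0.3, 0.0]
print(nearest_class(image_emb, prompt_embs)[0])  # cat

# Few-shot: class embeddings come from a handful of labeled example images.
support = {
    "cat": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "dog": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.2]],
}
query_emb = [0.2, 0.9, 0.1]
print(nearest_class(query_emb, build_prototypes(support))[0])  # dog
```

Real CLIP inference also multiplies similarities by a learned temperature and applies a softmax. That changes the probabilities but not the argmax, so the predicted label is the same as in this sketch.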

Method Frame

Baseline vs advanced architecture comparison

Each track compares a conventional baseline against a more expressive model family.

Evaluation Frame

Metric quality and failure visibility

Results are not reduced to a single score: the report also surfaces per-class behavior and confusion-matrix structure.
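The report's own evaluation code is not shown on this page. As a minimal sketch, assuming plain string labels and no external libraries, the class-level view described above can be derived from a confusion matrix. The function names and the toy labels and predictions are hypothetical illustrations, not assignment results.

```python
# Minimal sketch: per-class evaluation from a confusion matrix (pure Python).
# Labels and predictions below are hypothetical toy data, not assignment results.


def confusion_matrix(y_true, y_pred, labels):
    """Return a square matrix: rows are true classes, columns are predictions."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_report(matrix, labels):
    """Per-class precision, recall, and support read off the matrix."""
    report = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum
        actual = sum(matrix[i])  # row sum = support
        report[label] = {
            "precision": tp / predicted if predicted else 0.0,
            "recall": tp / actual if actual else 0.0,
            "support": actual,
        }
    return report


def macro_recall(report):
    """Unweighted mean recall: a minority class counts as much as a majority one."""
    return sum(r["recall"] for r in report.values()) / len(report)


def top_confusions(matrix, labels, k=3):
    """Largest off-diagonal cells as (true_label, predicted_label, count)."""
    cells = [
        (matrix[i][j], labels[i], labels[j])
        for i in range(len(labels))
        for j in range(len(labels))
        if i != j and matrix[i][j] > 0
    ]
    cells.sort(reverse=True)
    return [(t, p, c) for c, t, p in cells[:k]]


labels = ["cat", "dog", "bird"]
y_true = ["cat", "cat", "dog", "dog", "bird", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "bird", "cat"]

cm = confusion_matrix(y_true, y_pred, labels)
report = per_class_report(cm, labels)
print(cm)  # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(round(macro_recall(report), 3))  # 0.667
print(top_confusions(cm, labels))  # [('cat', 'dog', 1), ('bird', 'cat', 1)]
```

Macro-averaged recall is one way to make minority-class failures visible, since a single overall accuracy can hide a class the model rarely gets right.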

Delivery Frame

Pages, artifacts, and application demo

The assignment combines static report pages with an interactive Streamlit application.

Assignment Focus

Track | Baseline | Advanced | Main Review Lens
----- | -------- | -------- | ----------------
Image | ResNet50 | Vision Transformer | Representation quality, class separation, and confusion stability
Text | LSTM | Transformer encoder | Sequence understanding, minority-class behavior, and recall trade-offs
Multimodal | Zero-shot CLIP | Few-shot adaptation | Data efficiency, prompt sensitivity, and practical inference quality

Navigation Guide

1. Start with the report page

Use this page as the entry point to understand the scope and choose a track.

2. Open the track details

Each track page continues into EDA, backbone, methodology, and results sections.

3. Open the app and video links

Use the external links section to access the presentation, the demo, and the deployed application.