Assignment 1

Three classification tracks, one workspace.

Assignment 1 compares baseline and advanced methods across image, text, and multimodal settings, with emphasis on dataset understanding, model design, and evaluation quality.

Image: ResNet50 vs ViT on Caltech-256

Text: LSTM vs Transformer encoder

Multimodal: Zero-shot CLIP vs few-shot adaptation

Presentation and Demo Links

Replace the placeholder URLs below with your final public links.

Video

YouTube Presentation

Full walkthrough of the assignment structure, decisions, and results.

Open Presentation (update this URL before publishing)

Video

YouTube Demo

Recorded demo of the application and experiment outputs in action.

Open Demo (update this URL before publishing)

App

Streamlit Application

Unified Assignment 1 app with image, text, and multimodal experiment views.

Open Streamlit App (update this URL before publishing)

Track Pages

Open a track to review dataset EDA, model backbone, methodology, and evaluation pages.

Track 01

Image Classification

Caltech-256 benchmark comparing ResNet50 and Vision Transformer behavior.

Open Image Pages

Track 02

Text Classification

Sequence modeling study contrasting LSTM baselines with transformer encoders.

Open Text Pages

Track 03

Multimodal Classification

CLIP-based zero-shot and few-shot experiments with report-driven error analysis.

Open Multimodal Pages

Method Frame

Baseline vs advanced architecture comparison

Each track compares a conventional baseline against a more expressive model family.

Evaluation Frame

Metric quality and failure visibility

Results are not reduced to a single score. The report also surfaces per-class behavior and the structure of the confusion matrix.
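As a rough illustration of what looking past one score can mean, the sketch below derives per-class precision, recall, and confusion counts from predicted labels. It assumes plain Python lists of labels; the function names (`confusion_counts`, `per_class_report`) and the toy labels are hypothetical and are not taken from the assignment code.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (true_label, predicted_label) pair occurs."""
    return Counter(zip(y_true, y_pred))


def per_class_report(y_true, y_pred):
    """Return precision, recall, and support for every label that appears."""
    labels = sorted(set(y_true) | set(y_pred))
    counts = confusion_counts(y_true, y_pred)
    report = {}
    for label in labels:
        tp = counts[(label, label)]
        # Other classes predicted as this label.
        fp = sum(counts[(t, label)] for t in labels if t != label)
        # This label predicted as some other class.
        fn = sum(counts[(label, p)] for p in labels if p != label)
        report[label] = {
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            "support": tp + fn,
        }
    return report


# Toy labels, made up for illustration only.
y_true = ["cat", "cat", "cat", "dog", "dog", "bird"]
y_pred = ["cat", "cat", "dog", "dog", "dog", "cat"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
report = per_class_report(y_true, y_pred)

print(f"accuracy: {accuracy:.2f}")
for label, stats in report.items():
    print(label, stats)
```

In this toy case the overall accuracy looks moderate, yet the "bird" class is never recovered. A single aggregate score would hide that kind of minority-class failure.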

Delivery Frame

Pages, artifacts, and application demo

The assignment combines static report pages with an interactive Streamlit experience.

Assignment Focus

Track	Baseline	Advanced	Main Review Lens
Image	ResNet50	Vision Transformer	Representation quality, class separation, and confusion stability.
Text	LSTM	Transformer Encoder	Sequence understanding, minority-class behavior, and recall trade-offs.
Multimodal	Zero-shot CLIP	Few-shot adaptation	Data efficiency, prompt sensitivity, and practical inference quality.
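For the multimodal row, the contrast between zero-shot and few-shot classification can be sketched on toy vectors. This assumes that image and prompt embeddings have already been computed by a CLIP-style encoder. The page does not say which few-shot method the assignment uses, so the class-mean prototype step here is only a stand-in, and all names and numbers are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def predict(query_emb, class_embs):
    """Pick the class whose reference embedding is most similar to the query."""
    scores = {label: cosine(query_emb, emb) for label, emb in class_embs.items()}
    return max(scores, key=scores.get), scores


def class_prototypes(support):
    """Average the labeled support embeddings of each class (assumed few-shot step)."""
    return {
        label: [sum(dim) / len(embs) for dim in zip(*embs)]
        for label, embs in support.items()
    }


# Toy 2-D embeddings, made up for illustration only.
prompt_embs = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}  # text-prompt embeddings
support = {  # a few labeled image embeddings per class
    "cat": [[0.5, 0.9], [0.7, 0.7]],
    "dog": [[0.1, 1.0], [0.0, 0.9]],
}
image_emb = [0.6, 0.8]

zero_label, zero_scores = predict(image_emb, prompt_embs)
few_label, few_scores = predict(image_emb, class_prototypes(support))

print("zero-shot:", zero_label)
print("few-shot: ", few_label)
```

With these made-up numbers the zero-shot prediction follows the text prompts, while the prototypes built from a handful of labeled examples shift the decision. That shift is the kind of data-efficiency effect the review lens refers to.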

Navigation Guide

1. Start with the report page

Use this page as the entry point to understand the scope and choose a track.

2. Open the track details

Each track page continues into EDA, backbone, methodology, and results sections.

3. Open the app and video links

Use the external links section for presentation, demo, and deployed application access.