Assignment 2 Report Design

Robustness and Generalization Study

Evaluate how models behave under realistic distribution shifts and non-ideal data settings.

Primary goal: Minimize the robustness gap between IID and OOD settings.

Decision metric: OOD weighted F1, with expected calibration error (ECE) as a guardrail.
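Here is a minimal sketch of this decision rule. It assumes predictions are plain label lists and that each candidate's OOD ECE has already been measured. The code computes support-weighted F1 from scratch, then picks the best OOD score among candidates that pass the ECE guardrail. The helper names, the dictionary keys, and the 0.05 ECE ceiling are hypothetical placeholders, not values from the assignment.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's true support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        n_pred = sum(1 for p in y_pred if p == cls)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * n_true / total
    return score


def select_model(candidates, max_ece=0.05):
    """Return the candidate with the highest OOD weighted F1 whose OOD ECE is
    at or below max_ece (a hypothetical guardrail), or None if none qualify."""
    eligible = [c for c in candidates if c["ood_ece"] <= max_ece]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c["ood_f1"])


# Hypothetical candidates: "a" scores higher but fails the calibration guardrail.
candidates = [
    {"name": "a", "ood_f1": 0.80, "ood_ece": 0.10},
    {"name": "b", "ood_f1": 0.75, "ood_ece": 0.03},
]
print(select_model(candidates)["name"])  # prints "b"
```

The hand-written loop uses the same support-weighted averaging as scikit-learn's f1_score with average="weighted". In practice, that library call would normally replace it.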

Deliverable: A robustness profile with actionable mitigation steps.

Scope and Targets

Area	Baseline	Advanced	Success Condition
Shift Testing	IID Validation	OOD Robustness Suite	Robustness gap reduced by at least 20%.
Data Strategy	Standard Split	Stress and Stratified Splits	Stable performance across shift buckets.
Model Strategy	Single Model	Regularization / Ensemble	Lower variance across seeds and perturbations.
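The shift-testing success condition can be checked mechanically once a gap definition is fixed. The sketch below makes two assumptions:

- The robustness gap is the IID score minus the OOD score on the same metric (for example, OOD weighted F1).
- "Reduced by at least 20%" means a relative reduction of the baseline gap.

The scores shown are made-up illustrations.

```python
def robustness_gap(iid_score, ood_score):
    """Assumed definition: absolute drop from IID to OOD on the same metric."""
    return iid_score - ood_score


def gap_reduction(baseline_gap, advanced_gap):
    """Fraction of the baseline gap removed by the advanced setup."""
    if baseline_gap <= 0:
        raise ValueError("baseline gap must be positive to measure a reduction")
    return (baseline_gap - advanced_gap) / baseline_gap


def meets_gap_target(baseline, advanced, target=0.20):
    """baseline and advanced are (iid_score, ood_score) pairs."""
    reduction = gap_reduction(robustness_gap(*baseline), robustness_gap(*advanced))
    return reduction >= target, reduction


# Illustrative numbers only: the gap shrinks from 0.20 to 0.15, a 25% reduction.
ok, reduction = meets_gap_target(baseline=(0.90, 0.70), advanced=(0.89, 0.74))
print(ok, round(reduction, 2))  # prints "True 0.25"
```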

Shift Quality

Report both absolute scores and relative degradation percentages.
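Here is a minimal sketch of such a report. It assumes one IID reference score and one score per shift bucket, all on the same metric. Each bucket keeps its absolute score and gets a relative degradation percentage. The bucket with the largest drop is returned as the most fragile. Bucket names and scores are hypothetical.

```python
def degradation_report(iid_score, bucket_scores):
    """Map each shift bucket to (absolute score, relative degradation in %)."""
    if iid_score <= 0:
        raise ValueError("IID score must be positive")
    return {
        bucket: (score, 100.0 * (iid_score - score) / iid_score)
        for bucket, score in bucket_scores.items()
    }


def most_fragile(report):
    """Bucket with the largest relative degradation."""
    return max(report, key=lambda bucket: report[bucket][1])


# Hypothetical buckets and scores for illustration.
report = degradation_report(0.80, {"gaussian_noise": 0.60, "new_domain": 0.72})
for bucket, (score, drop) in report.items():
    print(f"{bucket}: {score:.2f} ({drop:.1f}% below IID)")
print(most_fragile(report))  # prints "gaussian_noise"
```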

Calibration

Identify overconfident predictions under synthetic and real shifts.
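One way to quantify this uses two measures:

- Equal-width binned expected calibration error (ECE).
- A list of wrong predictions made with high confidence.

The sketch assumes each prediction comes with a top-class confidence and a correctness flag. The 10 bins and the 0.9 confidence cut-off are hypothetical defaults. Running the same functions on each synthetic and real shift split gives per-shift calibration numbers.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width binned ECE: sum over bins of (bin size / N) * |accuracy - mean confidence|."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 falls in the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(accuracy - mean_conf)
    return ece


def overconfident_indices(confidences, correct, threshold=0.9):
    """Indices of incorrect predictions whose confidence is at least threshold."""
    return [
        i for i, (conf, ok) in enumerate(zip(confidences, correct))
        if not ok and conf >= threshold
    ]


# Hypothetical shifted split: two confident predictions, only one of them correct.
conf = [0.95, 0.95, 0.55, 0.55]
hits = [True, False, True, False]
print(round(expected_calibration_error(conf, hits), 3))  # prints 0.25
print(overconfident_indices(conf, hits))  # prints [1]
```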

Stability

Track variance across seeds and perturbations so that one-off improvements are not mistaken for reliable gains.
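Here is a minimal sketch of variance tracking. It assumes the same metric is recorded for several random seeds (or perturbation draws) per method. It reports the mean, sample standard deviation, and coefficient of variation. It then applies a hypothetical acceptance rule: a gain counts as reliable only when the difference in means exceeds k times the larger seed standard deviation. The rule and k=1.0 are illustrative choices, not a statistical test.

```python
from statistics import mean, stdev


def seed_stability(scores):
    """Mean, sample standard deviation, and coefficient of variation across seeds."""
    mu = mean(scores)
    sd = stdev(scores) if len(scores) > 1 else 0.0
    cv = sd / mu if mu else float("inf")
    return mu, sd, cv


def is_reliable_gain(baseline_scores, method_scores, k=1.0):
    """Hypothetical heuristic: the mean gain must exceed k times the larger seed std."""
    base_mu, base_sd, _ = seed_stability(baseline_scores)
    meth_mu, meth_sd, _ = seed_stability(method_scores)
    return (meth_mu - base_mu) > k * max(base_sd, meth_sd)


baseline = [0.70, 0.72, 0.74]  # made-up OOD scores over three seeds
print(is_reliable_gain(baseline, [0.78, 0.79, 0.80]))  # prints True
print(is_reliable_gain(baseline, [0.71, 0.73, 0.75]))  # prints False
```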

Robust method reduced OOD degradation by __% versus baseline while preserving IID quality.

Most fragile scenario remains __, with failure mode: __.

Prioritize calibration tuning and targeted augmentation for shift: __.
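For the calibration-tuning step, temperature scaling is one common option. It is named here as a swapped-in technique, since the report does not say which method to use. The sketch assumes raw logits and integer labels from a held-out validation split. It divides the logits by a single temperature T and picks T from a hypothetical grid by minimizing negative log-likelihood. A T greater than 1 softens overconfident predictions, and because T is a positive scalar the predicted class never changes.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by temperature."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mean_nll(logit_rows, labels, temperature):
    """Average negative log-likelihood of the true labels at a given temperature."""
    losses = [-math.log(softmax(row, temperature)[y]) for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)


def fit_temperature(logit_rows, labels, grid=None):
    """Grid-search the temperature that minimizes validation NLL."""
    if grid is None:
        grid = [0.5 + 0.05 * i for i in range(91)]  # 0.5 to 5.0 in steps of 0.05
    return min(grid, key=lambda t: mean_nll(logit_rows, labels, t))


# Made-up validation logits: always confident in class 0, but right only 3 of 4 times.
rows = [[4.0, 0.0]] * 4
labels = [0, 0, 1, 0]
t = fit_temperature(rows, labels)
print(t > 1.0)  # prints True: the fitted temperature softens the predictions
```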