Assignment 2 Report Design
Evaluate how models behave under realistic distribution shifts and non-ideal data settings.
Primary goal: Minimize the robustness gap between IID and OOD settings.
Decision metric: OOD weighted F1, with ECE as a guardrail.
Deliverable: Robustness profile with actionable mitigation steps.
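A minimal sketch of how the decision metric and its guardrail could be applied, assuming predictions are available as plain label lists. The function names `weighted_f1` and `pick_model` and the ECE ceiling `max_ece=0.05` are hypothetical choices for illustration; the report design itself does not fix a threshold.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to true-class support."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n_true - tp
        # The denominator is positive because n_true > 0 implies tp + fn > 0.
        f1 = 2 * tp / (2 * tp + fp + fn)
        score += (n_true / total) * f1
    return score


def pick_model(candidates, max_ece=0.05):
    """Pick the model with the highest OOD weighted F1 among those passing the ECE guardrail.

    candidates maps a model name to (ood_weighted_f1, ood_ece).
    max_ece is an assumed ceiling, not a value taken from the assignment.
    Returns None when no candidate passes the guardrail.
    """
    eligible = {name: f1 for name, (f1, ece) in candidates.items() if ece <= max_ece}
    if not eligible:
        return None
    return max(eligible, key=eligible.get)
```

Treating ECE as a hard filter rather than folding it into a combined score keeps the ranking interpretable: a model with a higher F1 but poor calibration is excluded instead of being traded off silently.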
| Area | Baseline | Advanced | Success Condition |
|---|---|---|---|
| Shift Testing | IID Validation | OOD Robustness Suite | Robustness gap reduced by at least 20%. |
| Data Strategy | Standard Split | Stress and Stratified Splits | Stable performance across shift buckets. |
| Model Strategy | Single Model | Regularization / Ensemble | Lower variance across seeds and perturbations. |
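The Shift Testing success condition can be checked with a few lines of arithmetic. The sketch below assumes that "reduced by at least 20%" means a relative reduction of the IID minus OOD gap compared with the baseline gap; that reading, and the names `robustness_gap`, `gap_reduction`, and `meets_shift_target`, are assumptions rather than part of the assignment text.

```python
def robustness_gap(iid_score, ood_score):
    """Gap between in-distribution and out-of-distribution scores."""
    return iid_score - ood_score


def gap_reduction(baseline_gap, advanced_gap):
    """Relative reduction of the gap, as a fraction of the baseline gap."""
    if baseline_gap <= 0:
        raise ValueError("baseline gap must be positive to measure a reduction")
    return (baseline_gap - advanced_gap) / baseline_gap


def meets_shift_target(baseline, advanced, min_reduction=0.20):
    """baseline and advanced are (iid_score, ood_score) pairs.

    min_reduction=0.20 mirrors the 20% figure in the table, interpreted
    here as a relative reduction (an assumption).
    """
    base_gap = robustness_gap(*baseline)
    adv_gap = robustness_gap(*advanced)
    return gap_reduction(base_gap, adv_gap) >= min_reduction
```

For example, a baseline of `(0.90, 0.70)` has a gap of 0.20 and an advanced setup of `(0.89, 0.75)` has a gap of 0.14, which is a relative reduction of about 30% and would pass under this reading.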
Shift Quality (IID vs OOD F1)
Report both the absolute IID and OOD F1 scores and the relative degradation between them, as a percentage.
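One way to produce both numbers per shift scenario is sketched below. The function name `degradation_report` and the row keys are hypothetical, and the shift names in the usage note are placeholders, not scenarios prescribed by the assignment.

```python
def degradation_report(iid_f1, ood_f1_by_shift):
    """Build one row per shift with absolute scores and relative degradation.

    iid_f1: F1 on the IID validation set.
    ood_f1_by_shift: dict mapping a shift name to the F1 under that shift.
    """
    if iid_f1 <= 0:
        raise ValueError("iid_f1 must be positive to compute a relative drop")
    rows = []
    for shift, ood_f1 in ood_f1_by_shift.items():
        abs_drop = iid_f1 - ood_f1
        rows.append({
            "shift": shift,
            "iid_f1": iid_f1,
            "ood_f1": ood_f1,
            "abs_drop": abs_drop,
            # Relative degradation expressed as a percentage of the IID score.
            "rel_drop_pct": 100.0 * abs_drop / iid_f1,
        })
    return rows
```

Calling `degradation_report(0.80, {"noise": 0.60})` yields a single row with an absolute drop of about 0.20 and a relative degradation of about 25%. Reporting both matters because the same absolute drop is a larger relative loss for a weaker model.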
Calibration
Identify overconfident predictions under synthetic and real shifts.
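A minimal sketch of expected calibration error (ECE) with equal-width confidence bins, plus a helper that lists the bins where mean confidence exceeds accuracy. It assumes confidences lie in [0, 1] and that correctness is given as booleans; the bin count of 10 and all function names are illustrative assumptions.

```python
def calibration_bins(confidences, correct, n_bins=10):
    """Group predictions into equal-width confidence bins.

    Returns (mean_confidence, accuracy, count) for each non-empty bin.
    Confidences are assumed to lie in [0, 1].
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")
    conf_sum = [0.0] * n_bins
    hit_sum = [0] * n_bins
    count = [0] * n_bins
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 falls in the last bin
        conf_sum[idx] += conf
        hit_sum[idx] += 1 if ok else 0
        count[idx] += 1
    return [
        (conf_sum[i] / count[i], hit_sum[i] / count[i], count[i])
        for i in range(n_bins)
        if count[i]
    ]


def expected_calibration_error(bins):
    """Count-weighted mean of |accuracy - mean confidence| over the bins."""
    total = sum(n for _, _, n in bins)
    return sum((n / total) * abs(acc - conf) for conf, acc, n in bins)


def overconfident_bins(bins):
    """Bins where the model's mean confidence exceeds its accuracy."""
    return [(conf, acc, n) for conf, acc, n in bins if conf > acc]
```

Running the same functions separately on the IID set and on each synthetic or real shift makes it visible where overconfidence appears or grows.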
Stability
Track variance across seeds and perturbations so that one-off improvements are not mistaken for reliable gains.
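A small sketch of a stability summary across seeds, using only the standard library. The check in `is_reliable_gain` is a rough, hypothetical heuristic (mean gain larger than the two sample standard deviations combined), not a statistical test and not a rule stated in the assignment.

```python
from statistics import mean, stdev


def seed_stability(scores):
    """Summarize one metric measured over several random seeds."""
    if len(scores) < 2:
        raise ValueError("need at least two seeds to estimate variance")
    return {
        "mean": mean(scores),
        "std": stdev(scores),  # sample standard deviation
        "min": min(scores),
        "max": max(scores),
    }


def is_reliable_gain(baseline_scores, candidate_scores):
    """Heuristic: the mean gain must exceed the combined seed-to-seed spread."""
    base = seed_stability(baseline_scores)
    cand = seed_stability(candidate_scores)
    gain = cand["mean"] - base["mean"]
    return gain > base["std"] + cand["std"]
```

With three seeds each, a candidate at `[0.80, 0.82, 0.84]` against a baseline at `[0.70, 0.72, 0.74]` passes this check, while a candidate at `[0.71, 0.73, 0.75]` does not, since its gain of 0.01 is smaller than the spread between seeds.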
Robust method reduced OOD degradation by __% versus baseline while preserving IID quality.
Most fragile scenario remains __, with failure mode: __.
Prioritize calibration tuning and targeted augmentation for shift: __.