Assignment 1 / Multimodal / Dataset EDA

10-Class Filter Balanced Few-Shot Splits

Food101 becomes a controlled ten-class benchmark.

The multimodal experiment narrows Food101 to ten semantically distinct yet visually overlapping food categories, then constructs balanced few-shot train and validation subsets from the official training split while preserving the full official validation split as the test set. This stabilizes the evaluation and makes method differences easier to attribute to representation quality rather than data skew.

Dataset framing

The filtered subset contains 7,500 training images and 2,500 test images: exactly 1,000 images per selected class overall (750 train + 250 test). Each few-shot configuration draws balanced support sets from the filtered training pool, while the test split stays fixed at 250 images per class.

- **Filtered Total: 10k.** Ten classes, each with exactly 1,000 images before few-shot slicing.
- **Train Pool: 7.5k.** Official Food101 train split after class filtering.
- **Held-Out Test: 2.5k.** Official validation split, preserved in full for stable reporting.
- **Balance Ratio: 0.00.** The working subset is perfectly balanced by design across all active classes.
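The balance ratio reported above is not defined in the report; a common choice is the relative spread of per-class counts, which is 0.00 exactly when every class has the same number of images. A minimal sketch under that assumption (the metric name and formula are mine, not the report's):

```python
from collections import Counter

def balance_ratio(labels):
    """Relative spread of per-class counts: (max - min) / max.
    Returns 0.00 when every class has the same number of examples.
    NOTE: the exact metric behind the report's 0.00 is an assumption."""
    counts = Counter(labels)
    hi, lo = max(counts.values()), min(counts.values())
    return (hi - lo) / hi

# A perfectly balanced toy subset: 3 classes x 4 examples each.
labels = ["pizza"] * 4 + ["hamburger"] * 4 + ["donuts"] * 4
print(f"{balance_ratio(labels):.2f}")  # prints 0.00 for a balanced split
```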

How the few-shot subset is built

| Component | Definition |
| --- | --- |
| Original source | `ethz/food101` with 101 classes and official train / validation splits. |
| Selected classes | apple pie, bibimbap, chicken wings, donuts, eggs benedict, french fries, grilled cheese sandwich, hamburger, ice cream, pizza. |
| Filtered train pool | 7,500 images total, 750 images per class. |
| Filtered test pool | 2,500 images total, 250 images per class. |
| Validation budget | 20 images per class for every few-shot setting. |
| Shot settings | 8, 16, 32, 64, 128 images per class for few-shot training. |
| Random seed | 42 for deterministic support and validation sampling. |

This split recipe makes the experiment conservative: the test set is large enough to expose confusion structure, while the few-shot train and validation subsets stay class-balanced so metric changes reflect adaptation quality rather than label frequency.
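The sampling step in the recipe above can be sketched in pure Python. The actual pipeline presumably works through the Hugging Face `datasets` library; the helper below only illustrates the deterministic, per-class draw of support and validation ids from the filtered pool (function and variable names are illustrative, not the report's code):

```python
import random

SELECTED = ["apple_pie", "bibimbap", "chicken_wings", "donuts",
            "eggs_benedict", "french_fries", "grilled_cheese_sandwich",
            "hamburger", "ice_cream", "pizza"]
VAL_PER_CLASS = 20
SEED = 42

def sample_split(pool_by_class, shots, val_per_class=VAL_PER_CLASS, seed=SEED):
    """Draw a balanced support set and validation set per class.

    pool_by_class maps class name -> list of example ids from the
    filtered train pool (750 ids per class). The draw is deterministic
    given the seed, and support / validation ids never overlap."""
    rng = random.Random(seed)
    train, val = {}, {}
    for cls in sorted(pool_by_class):
        picked = rng.sample(pool_by_class[cls], shots + val_per_class)
        train[cls] = picked[:shots]
        val[cls] = picked[shots:]
    return train, val

# Toy pool: 750 fake ids per class, mirroring the filtered train pool.
pool = {c: list(range(i * 750, (i + 1) * 750)) for i, c in enumerate(SELECTED)}
train, val = sample_split(pool, shots=128)
assert all(len(ids) == 128 for ids in train.values())
assert all(len(ids) == 20 for ids in val.values())
```

Sorting the class keys before sampling keeps the draw reproducible regardless of dictionary insertion order.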

Every active split is deliberately balanced

Grouped bar chart of Food101 class counts across few-shot train, few-shot dev, and test splits.
For the highest-shot run, each class contributes 128 training images, 20 validation images, and 250 test images. The same symmetry holds at every lower support setting.
Split composition plot for the active experiment subset.
The active experiment surface is still test-heavy. That is intentional: most of the reporting weight sits on held-out evaluation rather than support examples.
Heatmap of balanced class counts across the few-shot and test splits.
Because each split is class-balanced, the downstream metrics are less likely to be inflated by class frequency effects.
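The balance claim behind these plots is easy to verify mechanically. A minimal sketch of that sanity check, assuming a flat manifest of `(split, class)` records (the helper and its names are mine, not the report's):

```python
from collections import Counter

def check_balance(manifest, expected):
    """manifest: list of (split, class) records; expected: dict mapping
    split name -> images per class. Raises ValueError if any
    (split, class) cell deviates from the uniform target."""
    counts = Counter(manifest)
    splits = sorted({s for s, _ in manifest})
    classes = sorted({c for _, c in manifest})
    for s in splits:
        for c in classes:
            if counts[(s, c)] != expected[s]:
                raise ValueError(f"{s}/{c}: got {counts[(s, c)]}, want {expected[s]}")

# Highest-shot configuration: 128 train / 20 val / 250 test per class.
expected = {"train": 128, "val": 20, "test": 250}
classes = ["pizza", "donuts"]  # two toy classes stand in for all ten
manifest = [(s, c) for s, n in expected.items() for c in classes for _ in range(n)]
check_balance(manifest, expected)  # passes silently: every cell is uniform
```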

Food101 is close to square, but not uniform

Most Food101 images live near a 512-pixel square frame, but aspect ratio still ranges widely enough to matter for crop policy. Elongated plates, tall plated burgers, and tightly framed desserts all stress a naive resize-plus-center-crop pipeline differently.
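The cost of a naive crop policy can be quantified: resizing the shorter side and taking a square center crop keeps a fraction `min(w, h) / max(w, h)` of the original image. A worked example using the extremes reported below (the pairing of 512 with 287 is illustrative, not a specific dataset image):

```python
def center_crop_retention(width, height):
    """Fraction of image content kept by a shorter-side resize followed
    by a square center crop: min(w, h) / max(w, h). A square image
    keeps everything; an elongated one loses the overflow on the
    long axis. Illustrative sketch, not the report's pipeline code."""
    return min(width, height) / max(width, height)

print(center_crop_retention(512, 512))  # 1.0: nothing discarded
print(round(center_crop_retention(512, 287), 2))  # 0.56: nearly half discarded
```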

Image width, height, aspect ratio, and area distributions.
Width spans roughly 287 to 512 pixels and height spans 239 to 512 pixels, so geometric variation is real even in a curated food dataset.
Scatter plot of image width and height colored by aspect ratio.
The width-height scatter confirms that many images are near-square, but enough examples sit above and below the diagonal to influence downstream framing.
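The statistics behind these panels reduce to simple per-image derivations from `(width, height)` pairs, which would in practice be collected by opening each file (e.g. with PIL). A minimal sketch with illustrative sizes drawn from the reported ranges:

```python
def geometry_stats(sizes):
    """Summarize width, height, aspect ratio (w / h), and pixel area
    from a list of (width, height) pairs. Returns (min, max) per
    quantity. Sketch of the EDA computation, not the report's code."""
    widths = [w for w, _ in sizes]
    heights = [h for _, h in sizes]
    aspects = [w / h for w, h in sizes]
    areas = [w * h for w, h in sizes]
    span = lambda xs: (min(xs), max(xs))
    return {"width": span(widths), "height": span(heights),
            "aspect": span(aspects), "area": span(areas)}

# Illustrative sizes spanning the reported extremes (287-512 x 239-512).
sizes = [(512, 512), (512, 384), (287, 512), (512, 239)]
stats = geometry_stats(sizes)
print(stats["width"], stats["height"])
```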

Visual variation remains high inside each class

Single example from each selected Food101 class.
Even one reference image per class shows broad variation in plating, background clutter, viewpoint, and crop tightness.
Gallery of multiple examples per selected Food101 class.
The full gallery makes the modeling challenge clearer: apple pie and donuts share dessert textures, while hamburgers and grilled cheese often overlap in bread-heavy compositions.