Code: Python 3.7

Main packages: PyTorch, fastai, pandas

Read Time: ~ 15-30 min

Github: food101



Food 101

Fastai and ResNet


We will be exploring a food classification problem. We will use a convolutional neural network backbone and a fully connected head with a single hidden layer as a classifier.

SoTAs

| Model | Augmentations | Crops | Epochs | Additional Notes | Top-1 Accuracy % | Top-5 Accuracy % |
|---|---|---|---|---|---|---|
| Inception V3 | Flip, Rotation, Color, Zoom | 10 crops for validation | 32 | Manually doing transformations and crops during validation | 88.28 | 96.88 |
| WISeR | Flip, Rotation, Color, Zoom | 10 crops for validation | ~32 | Ensemble of Residual and Slice Network | 90.27 | 98.71 |
| ResNet50 + fastai | Optimal transformations | Test time augmentations | 16 | Using a size of 512 only for later epochs | 90.52 | 98.34 |

Dataset Attributes

  • 101 food categories
  • 101,000 images
  • 250 test images and 750 training images per class
  • Training images were not cleaned.
  • Images were rescaled to have a maximum side length of 512 pixels.

%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai import *
from fastai.vision import *
from fastai.callbacks.hooks import *
bs = 64
model = models.resnet50 # Residual neural net
img_size = 224
path = untar_data(URLs.FOOD)
path_img = path/'images'
Train/Test Split
Train: 60,600 images
Val : 15,150 images
Test : 25,250 images
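
These numbers follow from the dataset attributes and the 20% validation fraction passed to split_by_rand_pct. A quick arithmetic check (the variable names are only illustrative):

```python
# Sanity check of the split sizes listed above.
n_classes = 101
train_per_class, test_per_class = 750, 250
valid_pct = 0.2  # fraction passed to split_by_rand_pct

n_trainval = n_classes * train_per_class   # labelled training images before the split
n_valid = int(n_trainval * valid_pct)      # held out for validation
n_train = n_trainval - n_valid             # left for training
n_test = n_classes * test_per_class        # official test images

print(n_train, n_valid, n_test)
```
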
train_path = path/'train.txt'
test_path = path/'test.txt'

def file_df(path):
    '''
    Pandas dataframe from csv.
    Place images into dataframe.
    '''
    df = pd.read_csv(path, delimiter='/', header=None, names=['label', 'name'])
    df['name'] =  df['label'].astype(str) + "/" + df['name'].astype(str) + ".jpg"
    return df

train_df = file_df(train_path)
test_df = file_df(test_path)
data = (ImageList.from_df(df=train_df, path=path/'images', cols=1)
        .split_by_rand_pct(0.2)
        .label_from_df(cols=0)
        .transform(tfms=get_transforms(), size=224)
        .databunch(bs=bs)
        .normalize(imagenet_stats))
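
To see what file_df produces without downloading anything, here is a small sketch that runs the same two pandas steps on an in-memory sample. It assumes each line of train.txt has the form label/name, which is what the delimiter='/' call implies; the second id below is made up.

```python
import io
import pandas as pd

# Two lines in the "label/name" layout that file_df expects; the second id is made up.
sample = io.StringIO("baklava/1006121\napple_pie/1234567\n")

df = pd.read_csv(sample, delimiter='/', header=None, names=['label', 'name'])
# Rebuild the relative image path: <label>/<name>.jpg
df['name'] = df['label'].astype(str) + "/" + df['name'].astype(str) + ".jpg"

print(df['name'].tolist())
```

The label column becomes the class, and the name column becomes the path that ImageList.from_df resolves relative to path/'images'.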

Below is a list of the primary transformations used by fastai. The two main things we can change for each one are the probability that an image receives the transformation and the range from which its random magnitude is drawn.

Note that get_transforms() returns a tuple of two lists of transforms: one for training and one for validation. The validation list is limited to a crop that brings the image to the target size (crop_pad).
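
The two knobs, probability and range, can be illustrated with a small standalone sketch. This is not fastai's implementation; sample_rotation is a hypothetical helper that mimics a rotate transform with p=0.75 and degrees between -10 and 10.

```python
import random

def sample_rotation(rng, p=0.75, degrees=(-10.0, 10.0)):
    """Return a random angle within `degrees` with probability `p`, else None (no transform)."""
    if rng.random() >= p:
        return None                  # this image is left unrotated
    return rng.uniform(*degrees)     # magnitude drawn uniformly from the range

rng = random.Random(0)
draws = [sample_rotation(rng) for _ in range(10_000)]
applied = [d for d in draws if d is not None]
print(f"applied to {len(applied) / len(draws):.0%} of images")
```

Raising p makes the transform fire more often; widening the range makes each application potentially stronger.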

Default Transformations

| type | params | probability |
|---|---|---|
| crop_pad | row_pct(0, 1) : col_pct(0, 1) : padding_mode(reflection) | 1.0 |
| flip_lr | na | 0.5 |
| symmetric_warp | magnitude(-0.2, 0.2) | 0.75 |
| rotate | degrees(-10.0, 10.0) | 0.75 |
| zoom | scale(1.0, 1.1), row_pct(0, 1), col_pct(0, 1) | 0.75 |
| brightness | change(0.4, 0.6) | 0.75 |
| contrast | scale(0.8, 1.25) | 0.75 |
| crop_pad | na | 1.0 |

Secondary Transformations

jitter, skew, squish

data.show_batch(rows=3, figsize=(10, 10)) # visualize the data

Original Image

img = open_image(path/'images'/'baklava'/'1006121.jpg') # Example single image
img.show()

Example Transformations Applied

[img.apply_tfms(get_transforms()[0]).show(ax=ax) for i,ax in enumerate(plt.subplots(1, 5, figsize=(16, 8))[1].flatten())];

Classes & Class Count

print(data.classes); print(data.c)

Metrics

Top-N accuracy: measures how often the true class is among the model’s N highest-scoring predictions.
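
As a minimal sketch of the metric in plain Python (top_n_accuracy is a hypothetical helper written for illustration, and the scores are toy values):

```python
def top_n_accuracy(scores, targets, n=1):
    """Fraction of samples whose true class index is among the n highest scores."""
    hits = 0
    for row, target in zip(scores, targets):
        # Indices of the n largest scores for this sample.
        top_n = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:n]
        hits += target in top_n
    return hits / len(targets)

scores = [
    [0.7, 0.2, 0.1],   # true class 0: ranked first
    [0.5, 0.3, 0.2],   # true class 1: ranked second
    [0.6, 0.3, 0.1],   # true class 2: ranked third
    [0.1, 0.1, 0.8],   # true class 2: ranked first
]
targets = [0, 1, 2, 2]

print(top_n_accuracy(scores, targets, n=1))  # 0.5
print(top_n_accuracy(scores, targets, n=2))  # 0.75
```

With n=1 this reduces to ordinary accuracy, which is the metric passed to the learner here.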

learn = cnn_learner(data, model, metrics=accuracy, callback_fns=ShowGraph) # Top 1 accuracy == traditional accuracy

Optimal Learning Rate

learn.lr_find()
learn.recorder.plot(suggestion=True)

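Rather than keeping one fixed learning rate, we vary it as training progresses so that it better matches how fast the model is learning: fit_one_cycle warms the rate up to the maximum we pass in and then anneals it back down over the run.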
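
To make the shape of such a schedule concrete, here is a rough sketch of a one-cycle curve with a cosine warm-up and a cosine decay. The function name and the values of pct_start, div_factor and final_div are assumptions chosen for illustration, not necessarily what fastai uses internally.

```python
import math

def one_cycle_lr(step, total_steps, lr_max, pct_start=0.3, div_factor=25.0, final_div=1e4):
    """Learning rate at `step` for a cosine warm-up followed by a cosine decay."""
    def cos_interp(start, end, pct):
        # Cosine interpolation from start (pct=0) to end (pct=1).
        return end + (start - end) / 2 * (1 + math.cos(math.pi * pct))

    warmup_steps = int(total_steps * pct_start)
    if step < warmup_steps:
        # Warm-up: from lr_max / div_factor up towards lr_max.
        return cos_interp(lr_max / div_factor, lr_max, step / warmup_steps)
    # Decay: from lr_max down to lr_max / final_div at the last step.
    decay_pct = (step - warmup_steps) / max(1, total_steps - warmup_steps - 1)
    return cos_interp(lr_max, lr_max / final_div, decay_pct)

schedule = [one_cycle_lr(s, 100, lr_max=1e-2) for s in range(100)]
print(f"start={schedule[0]:.1e} peak={max(schedule):.1e} end={schedule[-1]:.1e}")
```

The value read off the lr_find plot plays the role of lr_max here.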

lr = 1e-2
learn.fit_one_cycle(5, slice(lr))
learn.save('food101-test-5')

Unfreeze

learn.unfreeze()
learn.lr_find()
learn.recorder.plot(suggestion=True)
learn.fit_one_cycle(5, max_lr=slice(1e-5, 1e-3))
learn.save('food101-test-10')
learn.fit_one_cycle(5, max_lr=slice(1e-6, 1e-3))
learn.save('food-101-test-e15')
learn.fit_one_cycle(5, max_lr=slice(1e-6, 1e-3))
learn.save('food-101-test-e20')
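
Passing a slice such as slice(1e-5, 1e-3) asks for discriminative learning rates: the earliest layer group trains at the low end and the head at the high end. From what I understand, the groups in between are spaced geometrically. The helper below (spread_lrs, a hypothetical name) sketches that idea, assuming three layer groups.

```python
def spread_lrs(start, stop, n_groups):
    """Geometrically spaced learning rates from `start` (earliest layers) to `stop` (head)."""
    if n_groups == 1:
        return [stop]
    ratio = (stop / start) ** (1 / (n_groups - 1))   # constant multiplier between groups
    return [start * ratio ** i for i in range(n_groups)]

# Roughly 1e-5 for the first group, 1e-4 for the middle group, 1e-3 for the head.
print(spread_lrs(1e-5, 1e-3, 3))
```

The pretrained early layers already detect generic features, so they get the smallest updates.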

Let’s switch to 512 to try increasing the accuracy. From what I understand, fastai resizes images to a square when given a single integer size. With the default resize method (ResizeMethod.CROP), a non-square image is scaled and then cropped to fit; padding is only added around the image if ResizeMethod.PAD is chosen.

I’m not sure why progressive image resizing works. It may be that increasing the image size mirrors the way a neural net learns progressively higher-level features, or it may simply be that the network is getting more information. I also tested a smaller starting size, 64x64, and this was too small for the model to pick up significant features. The logic seems similar to superpixeling.

Note that the size can be rectangular if you pass a (height, width) tuple.

Note that after certain transforms there will be missing pixels. These can be filled with black ('zeros'), with the value of the nearest border pixel ('border'), or with the value of the pixel symmetric about the nearest border ('reflection'). The fastai default is 'reflection'; here we set padding_mode='border'.
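
The three fill strategies are easiest to see on a single row of pixels. This is a one-dimensional illustration only (pad_row is a hypothetical helper, not how fastai fills pixels internally):

```python
def pad_row(row, n, mode):
    """Pad a list of pixel values with n extra values on each side."""
    if mode == 'zeros':
        left = right = [0] * n                       # black
    elif mode == 'border':
        left, right = [row[0]] * n, [row[-1]] * n    # repeat the edge pixel
    elif mode == 'reflection':
        # Mirror about the edge pixel without repeating it (assumes n < len(row)).
        left, right = row[n:0:-1], row[-2:-n - 2:-1]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return left + row + right

row = [1, 2, 3, 4]
for mode in ('zeros', 'border', 'reflection'):
    print(mode, pad_row(row, 2, mode))
# zeros      [0, 0, 1, 2, 3, 4, 0, 0]
# border     [1, 1, 1, 2, 3, 4, 4, 4]
# reflection [3, 2, 1, 2, 3, 4, 3, 2]
```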

Note there are 3 methods of resizing: ResizeMethod.CROP, ResizeMethod.PAD, ResizeMethod.SQUISH
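
As I understand the three methods, they differ in how the image is scaled before it is made square. The sketch below only computes the scaled shape (resized_shape is a hypothetical helper, and the 384x512 input is an example):

```python
def resized_shape(h, w, size, method):
    """(height, width) after scaling, before any crop or pad to size x size."""
    if method == 'squish':
        return size, size              # aspect ratio is not preserved
    if method == 'crop':
        scale = size / min(h, w)       # shorter side fills the target, the excess is cropped
    elif method == 'pad':
        scale = size / max(h, w)       # longer side fills the target, the rest is padded
    else:
        raise ValueError(f"unknown method: {method}")
    return round(h * scale), round(w * scale)

for method in ('crop', 'pad', 'squish'):
    print(method, resized_shape(384, 512, 224, method))
# crop   (224, 299)
# pad    (168, 224)
# squish (224, 224)
```

Crop throws away part of the longer side, pad keeps the whole image but adds filler, and squish keeps everything at the cost of distortion.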

bs=24

data = (ImageList.from_df(df=train_df, path=path/'images', cols=1)
        .split_by_rand_pct(0.2)
        .label_from_df(cols=0)
        .transform(tfms=get_transforms(), size=512, padding_mode='border')
        .databunch(bs=bs)
        .normalize(imagenet_stats))

learn = cnn_learner(data, model, metrics=accuracy, callback_fns=ShowGraph)
learn.load('food-101-test-e20');
# learn.fit_one_cycle(5, max_lr=slice(1e-7, 1e-2))
# learn.save('food101-test-15-512')
learn.lr_find()
learn.recorder.plot(suggestion=True)
learn.fit_one_cycle(2, max_lr=slice(1e-6, 1e-2))
learn.save('food101-test-22-512')
# learn.fit_one_cycle(2, max_lr=slice(1e-9, 1e-4))
# learn.save('food101-test-25-512')
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_top_losses(9, figsize=(15, 11))
interp.most_confused(min_val=5)
interp.plot_confusion_matrix(figsize=(20, 20), dpi=200)

Test Set

bs=24
test_data = (ImageList.from_df(df=test_df, path=path/'images', cols=1)
            .split_none()
            .label_from_df(cols=0)
            .transform(size=512)
            .databunch(bs=bs)
            .normalize(imagenet_stats))

learn = cnn_learner(test_data, model, metrics=accuracy, callback_fns=ShowGraph)
learn.load('food101-test-22-512');  # checkpoint saved after the 512px run above

One additional option to improve our model is TTA.

  • TTA (test-time augmentation):
    • Create multiple augmented copies of each image in the test set. The model makes a prediction for each copy, and the predictions are averaged into a single ensemble prediction per image.
    • Note that learn.validate, used below, scores the test set without augmentation.
learn.validate(test_data.train_dl)
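
The averaging step behind TTA can be sketched in a few lines. This is a toy illustration with stand-in functions (tta_predict, toy_predict and shift are all hypothetical), not fastai's implementation:

```python
def tta_predict(predict, image, augmentations):
    """Average class probabilities over the original image and its augmented copies."""
    views = [image] + [aug(image) for aug in augmentations]
    all_probs = [predict(view) for view in views]
    n_classes = len(all_probs[0])
    mean_probs = [sum(p[c] for p in all_probs) / len(all_probs) for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: mean_probs[c])
    return best, mean_probs

# Toy stand-ins: an "image" is a list of numbers and the "model" scores two classes.
def toy_predict(image):
    left, right = sum(image[:2]), sum(image[2:])
    total = left + right
    return [left / total, right / total]

def shift(image):
    return image[1:] + image[:1]   # a stand-in augmentation

best, mean_probs = tta_predict(toy_predict, [1, 2, 3, 6], [shift])
print(best, mean_probs)
```

Each view on its own can be fooled by a particular crop or lighting; averaging over several views smooths that out.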

After some experimentation, the next step for me would be to slightly increase the variance of some of the available transformations; I believe the model would learn more quickly and become more robust.

Note that our model struggles with fine-grained differences.

The other main area I would focus on is fixing the mislabeled images, and then looking at additional options for handling fine-grained differences.

As we go through our model it’s important to get to know the data; checking on the data intermittently throughout the training process is how we know what is working.

</div>