<h1>Computer Vision by Learning</h1>

<h2>General lab information</h2>
For the five days of the course, you will receive five notebooks. For each notebook, we provide the outline of the topic of the day with basic code, as well as a set of assignments. The assignments are given in red.
<br><br>
<strong>IMPORTANT:</strong>
<br>
To pass the course, you must complete all the assignments. Submit the notebooks for all days to <i>P.S.M.Mettes@uva.nl</i> no later than March 24th, 23:59.
<br><br>
Outline of the labs for the remaining days:
<ol>
<li>Data augmentation</li>
<li>CIFAR challenge</li>
</ol>

<h2>Day 4: Data Augmentation</h2>
Today we focus on data augmentation. It is a standard ingredient in most deep learning pipelines and was one of the key insights of the paper that started the deep learning hype; see Section 4 of https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks
<br>
<br>
Once again, Keras makes it very easy to do data augmentation yourself. You simply use the "ImageDataGenerator", which performs real-time data augmentation on the CPU while your model trains. Read up on its properties here: https://keras.io/preprocessing/image/
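<br><br>
To make "real-time" augmentation concrete, here is a minimal NumPy sketch of the idea. It is not how Keras implements it: the helpers <i>shift_image</i> and <i>augmented_batches</i> are hypothetical names introduced for illustration, and only random shifts are shown. The point is that every batch is perturbed afresh when it is requested, so the network rarely sees exactly the same image twice.

```python
import numpy as np

def shift_image(img, dy, dx):
    """Shift a 2D image by (dy, dx) pixels, filling vacated pixels with zeros."""
    out = np.zeros_like(img)
    h, w = img.shape
    # Matching source and destination slices for a zero-padded shift.
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out

def augmented_batches(x, y, batch_size, max_shift, rng):
    """Yield endless (images, labels) batches, drawing new random shifts each time."""
    n = x.shape[0]
    while True:
        order = rng.permutation(n)  # reshuffle at the start of every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            batch = np.stack([
                shift_image(x[i],
                            rng.randint(-max_shift, max_shift + 1),
                            rng.randint(-max_shift, max_shift + 1))
                for i in idx])
            yield batch, y[idx]

# Random arrays stand in for real images here.
rng = np.random.RandomState(0)
images = rng.rand(10, 28, 28).astype('float32')
labels = np.arange(10)
batches = augmented_batches(images, labels, batch_size=4, max_shift=2, rng=rng)
batch_x, batch_y = next(batches)
```

Each call to <i>next(batches)</i> returns a freshly augmented batch, which is the role the Keras generator plays during training.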

<h3>Back to MNIST</h3>
Throughout today we will focus on MNIST again.

In [None]:
import numpy as np
import keras
from keras.datasets import mnist
from keras.preprocessing.image import ImageDataGenerator

(trainx, trainy), (testx, testy) = mnist.load_data()
# Formatted print calls behave the same under Python 2 and Python 3.
print("Data shape: %s %s %s %s" % (trainx.shape, trainy.shape, testx.shape, testy.shape))
print("Unique labels: %s" % (np.unique(trainy),))

The complete dataset has 60,000 training examples and 10,000 test examples. As before, we work with a subset of the data, so that the effect of data augmentation, both as a regularizer and as an artificial enlargement of the training set, is clearly visible.

In [None]:
# Randomly subsample train and test data given a seed.
seed = 500
np.random.seed(seed)

from keras.utils.np_utils import to_categorical

nr_train, nr_test = 1000, 1000
tridxs = np.random.choice(trainx.shape[0], nr_train, replace=False)
trainx = trainx[tridxs]
trainy = to_categorical(trainy[tridxs])
teidxs = np.random.choice(testx.shape[0], nr_test, replace=False)
testx  = testx[teidxs]
testy  = to_categorical(testy[teidxs])
print("Data shape: %s %s %s %s" % (trainx.shape, trainy.shape, testx.shape, testy.shape))

In [None]:
from keras import backend as K

# Copy data.
ctrainx  = trainx.astype('float32')
ctestx   = testx.astype('float32')

# Change the shape for convnets.
if K.image_dim_ordering() == 'th':
    ctrainx = ctrainx.reshape(ctrainx.shape[0], 1, ctrainx.shape[1], ctrainx.shape[2])
    ctestx  = ctestx.reshape(ctestx.shape[0], 1, ctestx.shape[1], ctestx.shape[2])
else:
    ctrainx = ctrainx.reshape(ctrainx.shape[0], ctrainx.shape[1], ctrainx.shape[2], 1)
    ctestx  = ctestx.reshape(ctestx.shape[0], ctestx.shape[1], ctestx.shape[2], 1)

ctrainx /= 255.
ctestx  /= 255.
    
nr_classes = 10

In [None]:
from keras.layers import Convolution2D, Flatten
from keras.models import Sequential
from keras.layers.core import Dense, Activation

#
# Simple fixed function that returns a new ConvNet.
#
def get_convnet(kernel, nr_filters_l1, nr_filters_l2, input_shape):
    # Initialize a new network.
    model = Sequential()

    # First convolutional layer.
    model.add(Convolution2D(nr_filters_l1, kernel[0], kernel[1],
                            border_mode='valid', input_shape=input_shape))

    # Second convolutional layer.
    model.add(Convolution2D(nr_filters_l2, kernel[0], kernel[1]))
    
    # Move to fully connected layers.
    model.add(Flatten())
    model.add(Dense(nr_classes))

    # Add the softmax layer.
    model.add(Activation('softmax'))

    return model

# Yield an instance of the model.
convnet_model = get_convnet([5,5], 32, 32, ctrainx.shape[1:])
convnet_model.summary()

Now we set up the model and the data augmentation flow. Because we use only a subset of MNIST at this point, augmentation should give a nice improvement in performance.
<br>
However, if we overdo it, we will ruin the performance. Imagine, for instance, what augmenting the data with random rotations of up to 360 degrees does to the discriminability between "6" and "9": a "6" rotated by 180 degrees looks like a "9", so the two become impossible to distinguish.
<br><br>
Data augmentation makes your model invariant to certain transformations, so you have to think carefully about which transformations are harmful and which might help for your type of data.
<br><br>
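NOTE: If you use the Theano backend, remove the tensorflow import and the first line of the train_model function (the "with tf.device(device):" statement), and unindent the remaining body of the function.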
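<br><br>
The "6" versus "9" problem can be illustrated with a toy example. The sketch below uses two hand-drawn 5x3 glyphs, which are an assumption for illustration and not MNIST data, and shows that a 180 degree rotation turns one into the other exactly.

```python
import numpy as np

# Crude 5x3 glyphs (1 = ink); purely illustrative, not taken from MNIST.
six = np.array([[1, 1, 1],
                [1, 0, 0],
                [1, 1, 1],
                [1, 0, 1],
                [1, 1, 1]])

nine = np.array([[1, 1, 1],
                 [1, 0, 1],
                 [1, 1, 1],
                 [0, 0, 1],
                 [1, 1, 1]])

# Rotating by 180 degrees is two successive 90 degree rotations.
rotated_six = np.rot90(six, 2)
# A horizontal flip gives a mirrored "6", which is not a valid digit at all.
mirrored_six = np.fliplr(six)

print("Rotated 6 equals 9: %s" % np.array_equal(rotated_six, nine))
print("Mirrored 6 equals 6: %s" % np.array_equal(mirrored_six, six))
```

The first line prints True and the second prints False: after a 180 degree rotation the label "6" is attached to an image that is pixel-for-pixel a "9", while a flip produces a shape that belongs to neither class.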

In [None]:
from keras.optimizers import SGD
import tensorflow as tf

#
# A function that trains a given model.
#
# model      - Keras model.
# x          - Train features.
# y          - Train labels.
# batch_size - The number of samples for each mini-batch.
# nb_epoch   - The number of training epochs.
# device     - Either '/cpu:0' or '/gpu:0'.
#
# Note: the batches are drawn from the global `datagen` defined further below.
#
def train_model(model, x, y, batch_size=32, nb_epoch=10, device='/cpu:0'):
    with tf.device(device):
        # Compile the model with a specific optimizer.
        model.compile(loss='categorical_crossentropy', optimizer=SGD(), metrics=['accuracy'])

        # Train the model.
        model.fit_generator(datagen.flow(x, y, batch_size=batch_size),
                    samples_per_epoch=len(x), nb_epoch=nb_epoch, verbose=1)
        
        
#
# Evaluate a trained model.
#
# model - The trained model.
# x     - Test features.
# y     - Test labels.
#
def test_model(model, x, y):
    return model.evaluate(x, y, verbose=0)

datagen = ImageDataGenerator(
    featurewise_center=False,
    featurewise_std_normalization=False,
    rotation_range=0.,
    width_shift_range=0.,
    height_shift_range=0.,
    horizontal_flip=False)

# Compute dataset statistics; only required for the featurewise_* options.
datagen.fit(ctrainx)

# Set a number of parameters for the training.
batch_size = 32
nb_epoch   = 10
# Train and test.
train_model(convnet_model, ctrainx, trainy, batch_size, nb_epoch)
score  = test_model(convnet_model, ctestx, testy)
# Show the results.
print("Test score: %.4f, test accuracy: %.4f" % (score[0], score[1]))

<br>
<font color="red">
<h3>Assignment: Find the Right Setting and Explain</h3>
For the assignment, we want you to chart the importance of the various parameters of the provided data generator for data augmentation.
<br>
<br>
The assignment consists of the following:
<ol>
<li>Train the network without data augmentation.</li>
<li>Evaluate the influence of the different forms of data augmentation, both in isolation and in combination.</li>
<li>Plotting what is happening is an important tool for understanding failure modes as well as successes. Plot augmented images from successful and failing settings to support your argument for which data augmentations are most useful in this scenario. When do certain translations (shifts), rotations, or flips fail, and can you explain why?</li>
</ol>
</font>
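
For the plotting part, one possible starting point is to tile a batch of images into a single grid image. The helper below is a hypothetical sketch (the name <i>tile_images</i> is not part of Keras), and random arrays stand in for augmented digits.

```python
import numpy as np

def tile_images(images, cols):
    """Arrange a stack of equally sized 2D images into one grid image."""
    n, h, w = images.shape
    rows = int(np.ceil(n / float(cols)))
    # Unused cells in the last row stay zero (black).
    grid = np.zeros((rows * h, cols * w), dtype=images.dtype)
    for i in range(n):
        r, c = divmod(i, cols)
        grid[r * h:(r + 1) * h, c * w:(c + 1) * w] = images[i]
    return grid

# Random arrays stand in for a batch of augmented 28x28 digits.
fake_batch = np.random.RandomState(0).rand(10, 28, 28)
grid = tile_images(fake_batch, cols=4)
```

The resulting array can be displayed with any image viewer, for example matplotlib's <i>plt.imshow(grid, cmap='gray')</i> if matplotlib is installed. Batches produced by the generator carry an extra channel axis, which would have to be dropped first (for instance with <i>np.squeeze</i>).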

<font color="red">
<i>Your text and plots here.</i>
</font>

<font color="red">
<h3>Extra assignment: Spatial Transformer Networks</h3>
If you have time left and are genuinely curious about how to push the data augmentation and invariance idea even further, have a look at Spatial Transformer Networks (https://arxiv.org/abs/1506.02025). They learn the required amount of data augmentation or invariance during training itself, directly from the data.
<br><br>
Here is a corresponding notebook for Seya, a library that uses Keras to build advanced models: https://github.com/EderSantana/seya/blob/keras1/examples/Spatial%20Transformer%20Networks.ipynb
<br><br>
This assignment consists of the following elements:
<ol>
<li>Copy the appropriate code from the mentioned notebook and download the cluttered MNIST dataset.</li>
<li>Train both the standard ConvNet and the Spatial Transformer Network on this dataset.</li>
<li>Compare the performance and try to explain what the networks are learning using the lessons from day 2.</li>
<li>Discuss how Spatial Transformer Networks compare to the data augmentation of the previous assignment.</li>
</ol>
</font>
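
As a rough intuition for what a Spatial Transformer does to its input, the sketch below resamples an image through a 2x3 affine matrix. It is a simplified illustration under stated assumptions, not the Seya or paper implementation: the function name <i>affine_sample</i> is hypothetical, the matrix <i>theta</i> is fixed by hand (in a Spatial Transformer Network it is predicted per input by a small localisation network), and nearest-neighbour lookup replaces the differentiable bilinear sampling used in the paper.

```python
import numpy as np

def affine_sample(img, theta):
    """Resample a 2D image through a 2x3 affine matrix `theta`.

    Output pixel coordinates are normalised to [-1, 1]; theta maps each
    output coordinate to the input location that is read from.
    """
    h, w = img.shape
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w),
                         indexing='ij')
    # Homogeneous output coordinates, shape (3, h*w): rows are x, y, 1.
    target = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src_x, src_y = theta.dot(target)
    # Convert back to pixel indices and round to the nearest neighbour.
    col = np.rint((src_x + 1) * (w - 1) / 2.0).astype(int)
    row = np.rint((src_y + 1) * (h - 1) / 2.0).astype(int)
    # Locations that fall outside the input are left at zero.
    inside = (col >= 0) & (col < w) & (row >= 0) & (row < h)
    out = np.zeros(h * w, dtype=img.dtype)
    out[inside] = img[row[inside], col[inside]]
    return out.reshape(h, w)

img = np.arange(25.0).reshape(5, 5)
identity = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
rot180 = np.array([[-1.0, 0.0, 0.0],
                   [0.0, -1.0, 0.0]])

same = affine_sample(img, identity)    # leaves the image unchanged
flipped = affine_sample(img, rot180)   # rotates the image by 180 degrees
```

Where data augmentation applies random transformations to the input and hopes the network becomes invariant, a Spatial Transformer Network learns which <i>theta</i> to apply to each input so that later layers see a normalised view.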