<h1>Computer Vision by Learning</h1>

<h2>General lab information</h2>
For the five days of the course, you will receive five notebooks. For each notebook, we provide the outline of the topic of the day with basic code, as well as a set of assignments. The assignments are given in red.
<br><br>
<strong>IMPORTANT:</strong>
<br>
To pass the course, you must complete all the assignments. Submit the notebooks for all days to <i>P.S.M.Mettes@uva.nl</i> no later than March 24th, 23:59.
<br><br>
Outline of the labs for the five days:
<ol>
<li>Installation and training your first networks.</li>
<li>Visualization</li>
<li>Style transfer</li>
<li>Data augmentation</li>
<li>CIFAR challenge</li>
</ol>

<h2>Day 2: Visualizing Features</h2>
Today we will mainly use Keras, which ships with commonly used deep learning models pre-trained on ImageNet.

<h3>Loading a Model</h3>
Throughout the day we will use the VGG model (https://arxiv.org/abs/1409.1556). For those not familiar with it, this is an award-winning deep convolutional network from the Visual Geometry Group at the University of Oxford, published in 2014 and often used for transfer learning and style transfer.

In [None]:
from keras.preprocessing import image as image_utils
from keras.applications.imagenet_utils import decode_predictions
from keras.applications.imagenet_utils import preprocess_input 
import keras.backend as K
%matplotlib inline
import matplotlib
import matplotlib.pylab as plt
import matplotlib.cm as cm
from keras.applications.vgg16 import VGG16
import numpy as np
import argparse
import cv2

We are going to investigate a trained model by feeding it an image and obtaining its intermediate outputs.

In [None]:
# Load 'your_image.jpg', make it an image that is relatively clean and has an object or animal in the center
im_file = './replace_with_your_image.jpg'
image = image_utils.load_img(im_file, target_size=(224, 224))
image = image_utils.img_to_array(image)

Your picture

In [None]:
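# channels-first (3, H, W) -> (H, W, 3); cast to uint8 so imshow reads the 0..255 values as colors
plt.imshow(image.transpose((1,2,0)).astype('uint8'))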

<h2>Loading the Network and Feeding the Image into it</h2>

In [None]:
# expand the first axis to obtain a batch size of 1
image = np.expand_dims(image, axis=0)
# pre-process the image
image = preprocess_input(image)
# Load the model from keras with Imagenet weights
print("[INFO] loading network...")
model = VGG16(weights="imagenet")
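As a rough illustration of the pre-processing step: for the VGG weights, preprocess_input is understood to reorder the channels from RGB to BGR and subtract per-channel ImageNet means. The NumPy sketch below mimics this under the assumption of channels-first arrays, as used elsewhere in this notebook. The names preprocess_sketch and BGR_MEANS are hypothetical and not part of Keras.

```python
import numpy as np

# Per-channel ImageNet means in BGR order (assumed values for Caffe-style VGG weights)
BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def preprocess_sketch(batch):
    """batch: float array of shape (N, 3, H, W), RGB order, values in 0..255."""
    # Reverse the channel axis (RGB -> BGR); astype returns a new array,
    # so the caller's data is not modified
    out = batch[:, ::-1, :, :].astype(np.float32)
    # Subtract the mean of each channel
    out -= BGR_MEANS.reshape(1, 3, 1, 1)
    return out

# Toy input: a 2x2 image that is pure red
rgb = np.zeros((1, 3, 2, 2), dtype=np.float32)
rgb[:, 0] = 255.0
result = preprocess_sketch(rgb)
print(result[0, :, 0, 0])  # values of the B, G, R channels of one pixel
```

This is why the pre-processed image no longer looks natural when plotted directly: the values are centered around zero and the channel order has changed.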

Now we can classify our image.

In [None]:
print("[INFO] classifying image...")
preds = model.predict(image)

In [None]:
decode_predictions(preds)[0][0]

Let's look at which other layers the model has besides the final prediction layer.

In [None]:
model.layers

Now we are going to get the activations from a specific layer of the model.
You can choose any one you like!

Play with this and look at the outputs of the different layers, also for images other than the one you have loaded so far.

In [None]:
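# Fetch the output of one layer (here model.layers[1]; pick any index) in test mode (learning phase 0)
get_out = K.function([model.layers[0].input, K.learning_phase()], [model.layers[1].output])
out_1 = np.array(get_out([image, 0])[0]).squeeze()
plt.imshow(out_1[15,:,:],cmap='gray',interpolation='none')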

<h3>Comparing the First Layer with an Old-School Filter Bank</h3>
Now you are going to extract features from the first layer and visualize them in a grid.
Some helper functions for that are given below. After that, you are going to compare what you have obtained with the output of a standard filter bank.
<br><br>
<strong>The Filter Bank</strong>
<br>
To obtain a manually defined filterbank, use: https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.ndimage.filters.gaussian_filter.html
<br>
Based on the Gaussian filter, write a function that computes rotated first-order Gaussian derivatives at orientations of [0,30,60,90,120,150,180] degrees and displays the outputs when you convolve them with your preprocessed image.
<br>
Do the same for the first and second layers of the VGG network and display their outputs as well.
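One possible starting point for the filter bank is sketched below. It relies on the fact that a first-order Gaussian derivative at angle θ can be written as cos(θ) times the x-derivative plus sin(θ) times the y-derivative, so only two calls to gaussian_filter are needed. This is a sketch under assumptions, not the intended solution: the function name rotated_gaussian_derivatives, the choice sigma=2.0 and the toy ramp image are all hypothetical, and the input is taken to be a single 2D channel.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def rotated_gaussian_derivatives(img, sigma=2.0,
                                 angles_deg=(0, 30, 60, 90, 120, 150, 180)):
    """Return an array (n_angles, H, W) of first-order Gaussian derivative responses."""
    img = np.asarray(img, dtype=np.float64)
    # order=(0, 1): smooth along rows (axis 0), differentiate along columns (axis 1)
    dx = gaussian_filter(img, sigma, order=(0, 1))
    # order=(1, 0): differentiate along rows, smooth along columns
    dy = gaussian_filter(img, sigma, order=(1, 0))
    responses = []
    for angle in angles_deg:
        theta = np.deg2rad(angle)
        # Directional derivative as a linear combination of the two basis responses
        responses.append(np.cos(theta) * dx + np.sin(theta) * dy)
    return np.stack(responses)

# Toy input: intensity increases from left to right, constant from top to bottom
ramp = np.tile(np.arange(64, dtype=np.float64), (64, 1))
bank = rotated_gaussian_derivatives(ramp)
print(bank.shape)
```

For a real image, apply the function to one channel at a time (or to a grayscale version) and show the seven responses with make_mosaic.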

<font color="red">
<h3>Assignment 1: Comparing the Outputs</h3>
Your task is to argue whether there is a similarity between the outputs of the manually created filter bank and those of the filters learned by VGG.
</font>

<font color="red">
<i>Your text and plots here.</i>
</font>

In [None]:
# utility functions for plotting
from mpl_toolkits.axes_grid1 import make_axes_locatable

def nice_imshow(ax, data, vmin=None, vmax=None, cmap=None):
    """Wrapper around plt.imshow that adds a colorbar"""
    if cmap is None:
        cmap = cm.jet
    if vmin is None:
        vmin = data.min()
    if vmax is None:
        vmax = data.max()
    divider = make_axes_locatable(ax)
    cax = divider.append_axes("right", size="5%", pad=0.05)
    im = ax.imshow(data, vmin=vmin, vmax=vmax, interpolation='nearest', cmap=cmap)
    plt.colorbar(im, cax=cax)
    
import numpy.ma as ma

def make_mosaic(imgs, nrows, ncols, border=1):
    """
    Given a set of images with all the same shape, makes a
    mosaic with nrows and ncols
    """
    nimgs = imgs.shape[0]
    imshape = imgs.shape[1:]
    
    mosaic = ma.masked_all((nrows * imshape[0] + (nrows - 1) * border,
                            ncols * imshape[1] + (ncols - 1) * border),
                            dtype=np.float32)
    
    paddedh = imshape[0] + border
    paddedw = imshape[1] + border
    for i in range(nimgs):
        row = i // ncols
        col = i % ncols
        
        mosaic[row * paddedh:row * paddedh + imshape[0],
               col * paddedw:col * paddedw + imshape[1]] = imgs[i]
    return mosaic

<h2>Visualizing the Weights</h2>
So far we have focused on the output feature maps and their comparison. Now we will investigate the filters themselves. Plot some filters of the first, second and fifth convolution blocks.

In [None]:
# Visualize weights
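# Kernel of the first conv layer; assumes Theano ordering (filters, channels, rows, cols)
W = model.layers[1].get_weights()[0]
# keep only the first input channel of every filter -> (64, 3, 3)
W = np.squeeze(W)[:,0,:,:]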
print("W shape : ", W.shape)

plt.figure(figsize=(15, 15))
plt.title('conv1 weights')
nice_imshow(plt.gca(), make_mosaic(W, 8, 8), cmap=cm.binary)

<font color="red">
<h3>Assignment 2: FFT-based Analysis</h3>
Extract the filters of the network and visualize them in the Fourier domain.
Write down what you have to keep in mind: the filters are 3x3, but when taking the Fourier transform of a filter, one should also consider the size of the signal it is convolved with.
<br>
Write what you observe. Can you conclude anything more about the filters when looking at them this way?
</font>
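The remark about signal size can be made concrete with a small NumPy sketch: a 3x3 filter is zero-padded to a larger grid before the 2D FFT, so that its frequency response is sampled on the same grid as the signal it would be convolved with. The helper name padded_spectrum, the grid size of 32 and the two hand-made example filters are assumptions for illustration only; they are not VGG weights.

```python
import numpy as np

def padded_spectrum(kernel, size=32):
    """Zero-pad a small 2D kernel to (size, size) and return the centered FFT magnitude."""
    kh, kw = kernel.shape
    padded = np.zeros((size, size), dtype=np.float64)
    padded[:kh, :kw] = kernel  # position only changes the phase, not the magnitude
    # fftshift moves the zero-frequency (DC) component to index size // 2 on both axes
    return np.abs(np.fft.fftshift(np.fft.fft2(padded)))

# Two hand-made 3x3 filters: an averaging filter and a horizontal derivative filter
box = np.full((3, 3), 1.0 / 9.0)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)

box_spec = padded_spectrum(box)
sobel_spec = padded_spectrum(sobel_x)
center = box_spec.shape[0] // 2
print("DC response of box filter:  ", box_spec[center, center])
print("DC response of Sobel filter:", sobel_spec[center, center])
```

The averaging filter has its largest response at the center of the spectrum (low-pass), while the derivative filter has no response there. Learned filters can be inspected in the same way.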

<font color="red">
<i>Your plots and text here.</i>
</font>

In [None]:
# Functions you need
print("W shape : ", W.shape)
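from scipy.fftpack import fft2
from scipy.fftpack import fftshift as shift

# zero-pad every 3x3 filter to 16x16 so that the spectrum is sampled more finely
weights_pad = np.pad(W,((0,0),(6,7),(6,7)),'constant', constant_values=(0,))
# 2D FFT over the two spatial axes, then move the zero frequency to the center of both axes
w_fft = shift(fft2(weights_pad, axes=(1,2)), axes=(1,2))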
w_abs = np.abs(w_fft)

<font color="red">
<h3>Extra Assignment for the Ambitious: FFT-based Compression</h3>
A very simple approach to model compression is to extract the filters from the model, apply a Fourier transform to them and then threshold their representation in the frequency domain.
<br>
Do this on your own MNIST convolutional network and see how much thresholding you can apply to the first layer before the performance degrades significantly.
<br>
For this you need Keras' model.save_weights() and a bit of searching to find out how to replace elements of HDF5 files such as the ones Keras saves model weights in.
</font>
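The thresholding step itself can be sketched with NumPy alone. The example below zeroes all Fourier coefficients whose magnitude falls below a threshold and transforms back. The random array stands in for real first-layer weights, and the function name threshold_in_fourier and the threshold values are hypothetical. It only measures reconstruction error, not the accuracy of a network.

```python
import numpy as np

def threshold_in_fourier(filters, threshold):
    """filters: array (n, h, w). Returns (reconstructed filters, fraction of coefficients kept)."""
    spectrum = np.fft.fft2(filters, axes=(-2, -1))
    # Keep only coefficients whose magnitude reaches the threshold
    keep = np.abs(spectrum) >= threshold
    compressed = np.where(keep, spectrum, 0.0)
    # Magnitude thresholding preserves conjugate symmetry, so the inverse
    # transform is real up to rounding error
    restored = np.real(np.fft.ifft2(compressed, axes=(-2, -1)))
    return restored, keep.mean()

rng = np.random.RandomState(0)
filters = rng.randn(32, 3, 3)  # placeholder for the first-layer weights of your network

for t in (0.0, 0.5, 1.0, 2.0):
    restored, kept = threshold_in_fourier(filters, t)
    err = np.linalg.norm(restored - filters) / np.linalg.norm(filters)
    print("threshold %.1f: kept %.0f%% of coefficients, relative error %.3f"
          % (t, 100 * kept, err))
```

To measure the effect on accuracy, the reconstructed filters would have to be written back into the model, for example via the saved weight file as described above, before re-evaluating on the test set.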

<font color="red">
<i>Your text and performance estimates here.</i>
</font>