DATASCI 415: Statistical Learning and Data Mining
University of Michigan
including slides by Mu Li and Alex Smola
convolution
pooling
residual connections
handwritten digit recognition with LeNet


animation by Vincent Dumoulin

1D convolutions
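$$ y_i = \sum_{a=1}^h w_a x_{i+a} + b $$

3D convolutions

$$\begin{aligned} y_{i,j,k} &=\sum_{a=1}^h\sum_{b=1}^w\sum_{c=1}^d w_{a,b,c}\,x_{i+a,j+b,k+c} \\ &\qquad+ b \end{aligned}$$

The output of a conv layer is smaller than its input. An $n_h\times n_w$ input with a $k_h\times k_w$ kernel gives an output of size
$$(n_h - k_h + 1)\times (n_w - k_w + 1).$$
The output shape shrinks faster with larger kernels, because each layer removes $k_h - 1$ rows and $k_w - 1$ columns.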

animation by Vincent Dumoulin
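The 1D and 3D formulas above can be sketched in NumPy with a single helper that works for any number of axes. This is a minimal illustration, not the course's reference code. It assumes NumPy ≥ 1.20 for `sliding_window_view`, and the name `conv_valid` is hypothetical. Indices are 0-based, so the sum starts at the first element of each window. As in deep learning libraries, the kernel is not flipped, which makes the operation cross-correlation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_valid(x, w, b=0.0):
    """y[i,...] = sum over a,... of w[a,...] * x[i+a,...] + b (0-based, no padding).

    Hypothetical helper: works for 1D, 2D or 3D x as long as w has the same
    number of axes. The kernel is not flipped (cross-correlation).
    """
    windows = sliding_window_view(x, w.shape)          # shape: out_shape + w.shape
    return np.tensordot(windows, w, axes=w.ndim) + b   # sum over the kernel axes

# 1D: output length is n - h + 1
print(conv_valid(np.arange(5.0), np.array([1.0, 1.0])))  # [1. 3. 5. 7.]

# 3D: every axis shrinks by (kernel size - 1)
print(conv_valid(np.zeros((6, 6, 6)), np.ones((3, 3, 3))).shape)  # (4, 4, 4)

# Stacking 5x5 kernels on a 32x32 input: 28, 24, 20, ..., 4
x = np.random.rand(32, 32)
for layer in range(7):
    x = conv_valid(x, np.ones((5, 5)) / 25)  # 5x5 averaging kernel
    print(layer + 1, x.shape)
```

After seven 5×5 layers, only a 4×4 output remains. This is the shrinkage that padding is designed to counter.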
Padding adds rows and columns of zeros around the input to increase the output size.


animation by Vincent Dumoulin
Output size from padding $p_h$ rows in total (split between top and bottom) and $p_w$ columns in total (split between left and right) is
$$(n_h - k_h + p_h + 1)\times (n_w - k_w + p_w + 1).$$
Choosing $p_h = k_h - 1$ and $p_w = k_w - 1$ makes the output the same size as the input. If $k_h$ and $k_w$ are odd, we pad $p_h/2$ rows on both the top and bottom and $p_w/2$ columns on both sides.
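The odd-kernel padding rule can be checked numerically with the short sketch below. It assumes NumPy ≥ 1.20 for `sliding_window_view`, and `same_pad` is a hypothetical helper name. The sketch pads $p_h/2$ rows on each of the top and bottom and $p_w/2$ columns on each side, then counts the window positions, which is the number of outputs.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def same_pad(x, kh, kw):
    """Zero-pad a 2D array so that a kh x kw kernel yields an output shaped like x.

    Hypothetical helper; only handles odd kernels: total padding p = k - 1,
    split as p/2 on each side.
    """
    assert kh % 2 == 1 and kw % 2 == 1, "this sketch only handles odd kernels"
    ph, pw = kh - 1, kw - 1
    return np.pad(x, ((ph // 2, ph // 2), (pw // 2, pw // 2)))  # constant zeros by default

x = np.random.rand(7, 9)
xp = same_pad(x, 3, 5)
print(xp.shape)  # (9, 13)
out_shape = sliding_window_view(xp, (3, 5)).shape[:2]  # one output per window position
print(out_shape)  # (7, 9) = (7 - 3 + 2 + 1) x (9 - 5 + 4 + 1)
```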
Stride is the number of rows and columns the kernel moves per step.
Ex: stride of 3 rows and 2 columns


animation by Vincent Dumoulin
Output size from stride of $s_h$ rows and $s_w$ columns is
$$\lfloor(n_h - k_h + p_h + s_h)/s_h\rfloor\times\lfloor(n_w - k_w + p_w + s_w)/s_w\rfloor.$$
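The stride formula can be compared with an explicit strided convolution in a minimal NumPy sketch. It assumes NumPy ≥ 1.20, and the helpers `conv_output_size` and `conv2d` are hypothetical. The sketch starts from every stride-1 window position and keeps every $s_h$-th row and $s_w$-th column of positions. When the total padding is odd, the sketch puts the extra row or column at the bottom or right. That split is an assumption; the slides do not specify one.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_output_size(n, k, p, s):
    """floor((n - k + p + s) / s) along one axis; p is the TOTAL padding on that axis."""
    return (n - k + p + s) // s

def conv2d(x, w, b=0.0, pad=(0, 0), stride=(1, 1)):
    """Zero-pad x, then slide w, moving stride[0] rows / stride[1] columns per step."""
    ph, pw = pad
    # Assumption: an odd total puts the extra row/column at the bottom/right.
    xp = np.pad(x, ((ph // 2, ph - ph // 2), (pw // 2, pw - pw // 2)))
    windows = sliding_window_view(xp, w.shape)   # every stride-1 window position
    windows = windows[::stride[0], ::stride[1]]  # keep every s_h-th row, s_w-th column
    return np.tensordot(windows, w, axes=2) + b  # sum each window against the kernel

# Hypothetical example: 3x3 input, one zero on every side (p_h = p_w = 2),
# 2x2 kernel, stride of 3 rows and 2 columns.
x = np.arange(9.0).reshape(3, 3)
y = conv2d(x, np.ones((2, 2)), pad=(2, 2), stride=(3, 2))
print(y.shape)                                                     # (2, 2)
print(conv_output_size(3, 2, 2, 3), conv_output_size(3, 2, 2, 2))  # 2 2
```

Slicing $n + p - k + 1$ stride-1 positions with step $s$ leaves $\lceil (n+p-k+1)/s \rceil = \lfloor (n-k+p+s)/s \rfloor$ positions, which is why the two computations agree.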