MLP to CNN
- receptive field: the region of the original input image that a single neuron in a given layer "sees"
- feature map: The output of the convolution operation
- stride (rows, cols): How many pixels to move to get the next receptive field
- Pooling layer (max, average): Downsample and provide translation invariance
- Same vs. Valid Padding
- "Same" padding: pad the input (usually with zeros) so the output has the same spatial size as the input
- "Valid" padding: no padding; the filter must stay entirely inside the input, so the output is smaller than the input
- To calculate the output size of a convolutional layer, use the formula (per spatial dimension): output = ⌊(n + 2p − f) / s⌋ + 1, where n is the input size, f the filter size, p the padding, and s the stride
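The formula above can be checked with a small helper (the function name `conv_output_size` is my own, not from the notes):

```python
def conv_output_size(n, f, p=0, s=1):
    """Output size along one dimension: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

# "Valid" padding: 32x32 input, 5x5 filter, stride 1 -> 28x28
print(conv_output_size(32, 5, p=0, s=1))  # 28
# "Same" padding with stride 1 needs p = (f - 1) // 2 for odd f
print(conv_output_size(32, 3, p=1, s=1))  # 32
```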

CNN
- Convolution Layer: Filter (Kernel) extracts local features
- Pooling Layer
- Fully Connected Layer (MLP)
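The Conv → Pool part of this pipeline can be sketched in plain Python on a tiny grayscale image (a naive, loop-based sketch; the image and kernel values are made up for illustration):

```python
def conv2d(img, kernel, stride=1):
    """'Valid' cross-correlation of a 2D image with a 2D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(img) - kh) // stride + 1
    out_w = (len(img[0]) - kw) // stride + 1
    return [[sum(img[i * stride + a][j * stride + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)] for i in range(out_h)]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling, downsampling by `size` in each dimension."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [[max(fmap[i * size + a][j * size + b]
                 for a in range(size) for b in range(size))
             for j in range(out_w)] for i in range(out_h)]

img = [[1, 2, 0, 1, 3],
       [0, 1, 2, 3, 1],
       [1, 0, 1, 2, 2],
       [2, 1, 0, 1, 0],
       [0, 2, 1, 0, 1]]
edge = [[1, -1]]              # tiny horizontal-difference filter
fmap = conv2d(img, edge)      # 5x5 input, 1x2 kernel -> 5x4 feature map
pooled = max_pool(fmap)       # 2x2 max pool -> 2x2, ready to flatten for the MLP
```

In a real CNN the pooled output would be flattened and fed to the fully connected layers.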
LeNet
AlexNet
- ReLU, Dropout (in dense layer), Data Augmentation, max pooling
- ReLU is used in modern CNNs because it mitigates vanishing gradients
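The vanishing-gradient point can be seen from the derivatives: sigmoid's gradient peaks at 0.25 and shrinks toward 0 for large |x|, while ReLU passes a gradient of 1 for any positive input (a quick numeric sketch):

```python
import math

def sigmoid_grad(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)          # peaks at 0.25, vanishes for large |x|

def relu_grad(x):
    return 1.0 if x > 0 else 0.0  # constant 1 on the active side

print(sigmoid_grad(0.0))   # 0.25, the maximum possible
print(sigmoid_grad(5.0))   # ~0.0066, nearly vanished
print(relu_grad(5.0))      # 1.0
# Chained through 10 layers: 0.25**10 is ~1e-6, while 1.0**10 stays 1.
```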
VGG
- Always 3×3 filter sizes; stacking small filters and going deeper turns out better than using larger filter sizes
- VGG block: a sequence of conv layers followed by a max pooling layer (which reduces the spatial size)
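Why stacked 3×3 filters win: two stride-1 3×3 conv layers cover the same 5×5 receptive field as one 5×5 layer, but with fewer parameters and an extra nonlinearity in between. A back-of-the-envelope check (the channel count C = 64 is an assumption for illustration):

```python
C = 64  # assumed number of input/output channels

# One 5x5 conv layer: 5*5*C weights per output channel, C output channels
params_5x5 = 5 * 5 * C * C

# Two stacked 3x3 conv layers: each has 3*3*C*C weights
params_3x3_x2 = 2 * (3 * 3 * C * C)

# Receptive field of two stacked stride-1 3x3 convs: 3 + (3 - 1) = 5
receptive_field = 3 + (3 - 1)

print(params_3x3_x2 < params_5x5)  # True: ~28% fewer parameters
```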