[UDL Study Notes] Ch 10 - Convolutional networks

Overview

This series of posts is a set of study notes recording what I learn as I work through the book "Understanding Deep Learning".

This post covers Chapter 10, Convolutional networks.

Understanding Deep Learning

https://udlbook.github.io/udlbook/

1. Convolutional kernel

Unlike a fully connected network (FNN), where every connection between two layers has its own weight, the CNN covered in Chapter 10 computes each layer by applying a convolution operation to the previous layer.

What impressed me is how efficient this calculation is: the same kernel is applied at every position of the layer.

The book makes clear why a CNN is much better suited to image processing: a convolutional layer uses far fewer parameters than a fully connected layer of the same size, and each unit is computed only from nearby units in the previous layer.
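
To make the weight sharing concrete, here is a minimal sketch of a 1D convolution in plain Python. The input values, the kernel weights, and the helper name `conv1d` are my own assumptions for illustration, not taken from the book; zero padding and stride 1 are assumed.

```python
def conv1d(x, kernel, bias=0.0):
    """Apply one shared kernel at every position of x (zero padding, stride 1).

    Assumes an odd kernel size so the output has the same length as x.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        bias + sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(x))
    ]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # hypothetical input layer
kernel = [0.5, 0.0, -0.5]            # hypothetical kernel weights

h = conv1d(x, kernel)

# Parameter counts for a layer that maps 6 units to 6 units:
conv_params = len(kernel) + 1            # 3 shared weights + 1 bias
fc_params = len(x) * len(x) + len(x)     # 36 weights + 6 biases
```

With these numbers the convolutional layer needs 4 parameters while a fully connected layer of the same size needs 42, and the gap grows quickly for image-sized inputs.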

2. Channel

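In a CNN, applying only a single convolution when moving to the next layer loses information: the kernel mixes nearby inputs, and the ReLU activation function clips values below zero. The book explains that each layer therefore computes several convolutions in parallel, and each of them produces a channel (feature map).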
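
The clipping can be seen in a small sketch. The input and the two kernels below are hypothetical values I chose so that the effect is easy to read; the second kernel is simply the negative of the first.

```python
def relu(values):
    """Clip negative values to zero."""
    return [max(0.0, v) for v in values]


def conv1d_valid(x, kernel):
    """Apply the kernel only where it fits entirely inside x (no padding)."""
    k = len(kernel)
    return [
        sum(w * x[i + j] for j, w in enumerate(kernel))
        for i in range(len(x) - k + 1)
    ]


x = [0.0, 1.0, 3.0, 2.0, 0.0]      # hypothetical input
kernels = [
    [-1.0, 0.0, 1.0],              # responds to rising values
    [1.0, 0.0, -1.0],              # responds to falling values
]

# One channel per kernel, each followed by ReLU.
channels = [relu(conv1d_valid(x, k)) for k in kernels]
```

The first channel keeps the rising part of the signal and clips the falling part to zero; the second channel does the opposite. Together the two channels retain information that either one alone would lose.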

The input can have several channels as well: image data is typically represented with three channels for RGB or four for CMYK.

CNN models usually increase the number of channels as the layers get deeper. This can be seen either as compensating for the information lost as the spatial size of the representation shrinks, or as increasing the number of features extracted at greater depth.
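
As a rough illustration of this trade-off, the sketch below tracks the shape of the representation through a few stages. The starting size, the starting channel count, and the rule "halve the size, double the channels" are assumptions for the example, not a specific architecture from the book.

```python
def stage_shapes(size, channels, num_stages):
    """Return (height, width, channels) per stage, halving size and doubling channels."""
    shapes = []
    for _ in range(num_stages):
        shapes.append((size, size, channels))
        size //= 2
        channels *= 2
    return shapes


shapes = stage_shapes(size=32, channels=8, num_stages=4)

# Total number of hidden units at each stage.
activations = [h * w * c for h, w, c in shapes]
```

Under these assumptions the spatial area shrinks by a factor of four per stage, but doubling the channels means the total number of hidden units only halves.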

3. Down/Upsampling

Chapter 10 introduces downsampling and upsampling techniques for scaling the representation down or up so that layer dimensions match in a CNN.

These techniques are introduced for matching layer dimensions, but I thought they could also be used to compress an image to a lower resolution or to enlarge it to a higher one.

Among the downsampling methods, mean pooling compresses by averaging the values of several neighboring pixels, while max pooling keeps only the largest value in each group.
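
Both kinds of pooling can be sketched on a small single-channel image. The 4x4 values and the helper names are hypothetical, and non-overlapping 2x2 windows are assumed.

```python
def pool2x2(image, reduce_fn):
    """Downsample by applying reduce_fn to each non-overlapping 2x2 block."""
    rows, cols = len(image), len(image[0])
    return [
        [
            reduce_fn([image[r][c], image[r][c + 1],
                       image[r + 1][c], image[r + 1][c + 1]])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def mean(values):
    return sum(values) / len(values)


image = [                      # hypothetical 4x4 single-channel image
    [1.0, 3.0, 0.0, 2.0],
    [5.0, 7.0, 4.0, 6.0],
    [8.0, 0.0, 1.0, 1.0],
    [4.0, 4.0, 1.0, 1.0],
]

mean_pooled = pool2x2(image, mean)   # average of each block
max_pooled = pool2x2(image, max)     # largest value of each block
```

Both results are 2x2. Note the bottom-left block: its mean is 4.0, but max pooling returns 8.0, so a single strong value survives max pooling while mean pooling smooths it away.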

For upsampling, bilinear interpolation fills in the new pixels by interpolating between neighboring values, while transposed convolution performs the upsampling with weights learned by the network.
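
The two upsampling ideas are closely related, which a 1D sketch can show. The input, the kernel, and the function name below are assumptions for illustration; a stride of 2 is assumed, and the kernel is fixed by hand, whereas in a real network its weights would be learned.

```python
def transposed_conv1d(x, kernel, stride=2):
    """Each input value adds a scaled copy of the kernel to the output,
    with consecutive copies placed `stride` positions apart."""
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, value in enumerate(x):
        for j, w in enumerate(kernel):
            out[i * stride + j] += value * w
    return out


x = [1.0, 3.0, 2.0]          # hypothetical low-resolution signal
kernel = [0.5, 1.0, 0.5]     # hand-picked; a network would learn these weights

up = transposed_conv1d(x, kernel)
```

With this particular kernel, the odd positions of `up` hold the original values and the even positions between them hold the midpoints of their neighbors, which is linear interpolation (the 1D counterpart of bilinear interpolation). A learned kernel lets the network choose a different way to fill in the new positions.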

Reference

[1] Prince, S. J. D. (2023). Understanding Deep Learning. The MIT Press. Retrieved from http://udlbook.com