- Understanding convolutional layers in a CNN

Hello,

I have been learning about convolutional neural networks (CNNs) recently and wonder if I could get some help with a specific question:

At the very end, the 9 new feature maps from the last convolutional layer are all flattened into a vector (1D array) with as many elements as there are nodes in the input layer of the artificial neural network (ANN): starting with the first feature map, its rows are concatenated one after another in a single line, and this process continues for the other 8 feature maps. What we get is a very long 1D vector that is then fed into the input layer of the ANN...
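To make the flattening step concrete, here is a minimal pure-Python sketch of it. The helper name `flatten_feature_maps` and the tiny 2x2 feature maps are made up for illustration; real feature maps would be much larger.

```python
def flatten_feature_maps(feature_maps):
    """Concatenate the rows of each feature map, one map after another, into a 1D list."""
    vector = []
    for fmap in feature_maps:      # first feature map, then the second, and so on
        for row in fmap:           # rows are appended one by one
            vector.extend(row)
    return vector


# Hypothetical data: 9 feature maps of size 2x2, numbered so the order is easy to follow.
feature_maps = [
    [[m * 10 + 1, m * 10 + 2], [m * 10 + 3, m * 10 + 4]]
    for m in range(9)
]

vector = flatten_feature_maps(feature_maps)
print(len(vector))   # 36, i.e. 9 maps * 2 rows * 2 columns
print(vector[:8])    # [1, 2, 3, 4, 11, 12, 13, 14]
```

Under this scheme, the length of the vector (9 times the number of pixels per feature map) is what fixes the number of nodes in the ANN's input layer.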

Thanks!

- Assume we start with an input grayscale image having size NxN pixels. The image is passed to the 1st convolutional layer which has 3 filters (kernels) of smaller size called K1, K2, K3.
- Three convolutions are performed in this first conv layer: the 3 different kernels are sequentially applied to the input image to create the 3 different feature maps FP1, FP2, FP3 (the outputs of the convolution operations).
- The 3 feature maps FP1, FP2, FP3 are then stacked in a 3D matrix called M1.
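The first-layer steps above can be sketched in plain Python as follows. This is only an illustration under some assumptions: the 4x4 image, the 2x2 kernel values, and the helper `conv2d_valid` are hypothetical, there is no padding, the stride is 1, and the kernel is not flipped (so strictly speaking this is cross-correlation, which is what most deep-learning libraries compute under the name "convolution").

```python
def conv2d_valid(image, kernel):
    """Slide a square kernel over a square image (no padding, stride 1).

    Returns the resulting feature map as a list of rows.
    The kernel is not flipped, so this is cross-correlation.
    """
    n, k = len(image), len(kernel)
    out = n - k + 1  # output size along each dimension
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(k) for b in range(k))
            for j in range(out)
        ]
        for i in range(out)
    ]


# Hypothetical 4x4 grayscale image (N = 4).
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

# Hypothetical 2x2 kernels K1, K2, K3 of the first conv layer.
K1 = [[1, 0], [0, 1]]
K2 = [[0, 1], [1, 0]]
K3 = [[1, 1], [1, 1]]

# One feature map per kernel (FP1, FP2, FP3), stacked into the volume M1.
M1 = [conv2d_valid(image, K) for K in (K1, K2, K3)]

print(len(M1), len(M1[0]), len(M1[0][0]))  # 3 3 3 -> 3 feature maps of size 3x3
```

With an NxN image and a kxk kernel, each feature map in this sketch has size (N - k + 1) x (N - k + 1), so M1 has shape 3 x (N - k + 1) x (N - k + 1).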

**How are the 3 kernels K4, K5, K6 in the 2nd conv layer applied to the 3 feature maps FP1, FP2, FP3 generated in the 1st conv layer?**

Is K4 convolved with each of FP1, FP2, FP3, then K5 with each of FP1, FP2, FP3, and finally K6 with each of FP1, FP2, FP3? If so, we would end up with a volume containing 3 x 3 = 9 new feature maps. Is that correct?

