Generating Images with CNN: Why is the Size Not Increasing in Each Layer?

In summary, the conversation discusses a problem with generating an image from a random vector using a model consisting of a fully connected layer and 5 deconvolution layers. The poster shares their model code and explains that the output size does not increase at each layer. They then derive the correct equation for the transposed-convolution output size and share updated code that solves the problem. The conversation ends with a note to write out acronyms explicitly for clarity in the future.
  • #1
BRN
Hello everybody,
I have this problem:
starting from a vector of 100 random values, I have to generate an image of size 128x128x3 using a model consisting of a fully connected layer followed by 5 deconvolution (Conv2DTranspose) layers.
This is my model:

Python:
import tensorflow as tf

def generator_model(noise_dim):

    n_layers = 5
    k_w, k_h = [8, 8]             # kernel size
    input_dim = (noise_dim,)
    i_w, i_h, i_d = [8, 8, 1024]  # starting feature map: width, height, depth
    strides = (1, 1)
    weight_initializer = None

    model = tf.keras.Sequential()

    # Project the noise vector to an 8 x 8 x 1024 feature map
    model.add(tf.keras.layers.Dense(i_w * i_h * i_d, input_shape=input_dim, kernel_initializer=weight_initializer))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.ReLU())

    model.add(tf.keras.layers.Reshape((i_w, i_h, i_d)))
    for i in range(n_layers - 1):
        print(k_w, k_h)  # debug: current kernel size
        model.add(tf.keras.layers.Conv2DTranspose(i_d, (k_w, k_h), strides, padding='same', use_bias=False))
        model.add(tf.keras.layers.BatchNormalization())
        model.add(tf.keras.layers.ReLU())
        i_d = int(i_d / 2)  # halve the filter count
        k_w = int(k_w * 2)  # double the kernel size
        k_h = int(k_h * 2)

    k_w = i_d
    k_h = i_d
    # Final layer: 3 output channels (RGB)
    model.add(tf.keras.layers.Conv2DTranspose(3, (k_w, k_h), strides, padding='same', use_bias=False))

    return model

Why do I always get an 8x8x3 image, with no increase in size at each layer?
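For reference, the final output shape can be checked like this (a quick sketch, assuming TensorFlow 2.x):

Python:
model = generator_model(100)
print(model.output_shape)  # (None, 8, 8, 3) instead of the expected (None, 128, 128, 3)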

Thanks!
 
  • #2
Ok, I have the solution.

The output size of a convolution layer ("conv") is given by the equation

$$o=\left\lfloor \frac{i - k + 2p}{s} \right\rfloor + 1,$$

where ##i## is the input size, ##k## the kernel size, ##p## the padding and ##s## the stride. But, as in my case, for a transposed convolution ("deconv") the output size is

$$o=s\left ( i - 1 \right ) + k - 2p$$
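In Keras, padding = 'same' on a Conv2DTranspose chooses ##p## so that the output is exactly ##o = s \cdot i##, meaning each stride-2 deconvolution doubles the spatial size. Four such layers take the ##8\times 8## starting map up to the target resolution:

$$8 \to 16 \to 32 \to 64 \to 128$$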

Then, with stride ##s=2##, the correct code is this:

Python:
import tensorflow as tf

def generator_model(noise_dim):

    n_layers = 4
    k_w, k_h = [16, 16]           # starting kernel size
    input_dim = (noise_dim,)
    i_w, i_h, i_d = [8, 8, 1024]  # starting feature map: width, height, depth
    strides = (2, 2)              # stride 2 doubles the spatial size at each deconv
    weight_initializer = None

    model = tf.keras.Sequential()

    # Project the noise vector to an 8 x 8 x 1024 feature map
    model.add(tf.keras.layers.Dense(i_w * i_h * i_d, input_shape=input_dim, kernel_initializer=weight_initializer))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.ReLU())

    model.add(tf.keras.layers.Reshape((i_w, i_h, i_d)))
    i_d = int(i_d / 2)
    for i in range(n_layers - 1):
        # print(i_d, k_w, k_h)  # debug output
        # Upsampling: 8x8 -> 16x16 -> 32x32 -> 64x64
        model.add(tf.keras.layers.Conv2DTranspose(i_d, (k_w, k_h), strides, padding='same', use_bias=False))
        model.add(tf.keras.layers.BatchNormalization())
        model.add(tf.keras.layers.ReLU())
        i_d = int(i_d / 2)  # halve the filter count
        k_w = int(k_w * 2)  # double the kernel size
        k_h = int(k_h * 2)

    # Final deconv: 64x64 -> 128x128 with 3 output channels (RGB)
    model.add(tf.keras.layers.Conv2DTranspose(3, (k_w, k_h), strides, padding='same', use_bias=False))

    return model
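A quick check of the result (sketch, assuming TensorFlow 2.x):

Python:
model = generator_model(100)
print(model.output_shape)  # (None, 128, 128, 3)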

And this solves my problem :smile:
 
  • #3
BRN said:
And this solves my problem :smile:
Great!
In the future, it would be helpful to readers to expand acronyms such as CNN, which might not be generally known. I'm assuming it has something to do with neural networks, but that's only a guess.
 
  • #4
Hello and thanks!
You're right: CNN stands for Convolutional Neural Network.
Next time I will write acronyms out explicitly :smile:
 

1. How does a Convolutional Neural Network (CNN) generate images?

When used for generation, as in this thread, a CNN typically starts with a fully connected layer that projects an input vector into a small spatial feature map, followed by a series of transposed convolution (deconvolution) layers that progressively upsample the feature map and refine its features until the final image size is reached.
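A minimal illustration of one upsampling step (hypothetical shapes, assuming TensorFlow/Keras):

Python:
import tensorflow as tf

x = tf.random.normal((1, 8, 8, 64))  # small feature map
# One transposed convolution with stride 2 doubles the spatial size
up = tf.keras.layers.Conv2DTranspose(32, (4, 4), strides=(2, 2), padding='same')(x)
print(up.shape)  # (1, 16, 16, 32)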

2. What is the difference between a traditional neural network and a CNN in terms of image generation?

A traditional neural network treats the input image as a flat vector of pixels, which can limit its ability to capture spatial relationships in the image. In contrast, a CNN preserves the spatial structure of the image by using convolutional layers, making it more effective for image generation tasks.
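A small sketch of the difference (hypothetical shapes, assuming TensorFlow/Keras):

Python:
import tensorflow as tf

img = tf.random.normal((1, 28, 28, 3))

# Treating the image as a flat vector discards the pixel grid
flat = tf.keras.layers.Flatten()(img)
print(flat.shape)  # (1, 2352)

# A convolution keeps the spatial layout intact
conv = tf.keras.layers.Conv2D(8, (3, 3), padding='same')(img)
print(conv.shape)  # (1, 28, 28, 8)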

3. How does a CNN learn to generate images?

A CNN learns to generate images through a process called backpropagation, where the network adjusts its parameters based on the error between the generated image and the target image. This process is repeated over many iterations until the network can accurately generate images.
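A minimal sketch of one such training step, reusing the generator_model defined above (the pixel-wise mean-squared-error loss is a placeholder assumption; generative models are usually trained with an adversarial or similar loss):

Python:
import tensorflow as tf

generator = generator_model(100)
optimizer = tf.keras.optimizers.Adam(1e-4)

noise = tf.random.normal((4, 100))           # batch of random input vectors
target = tf.random.normal((4, 128, 128, 3))  # placeholder target images

with tf.GradientTape() as tape:
    generated = generator(noise, training=True)
    # Error between the generated images and the targets
    loss = tf.reduce_mean(tf.square(generated - target))

# Backpropagation: adjust the generator's parameters to reduce the error
grads = tape.gradient(loss, generator.trainable_variables)
optimizer.apply_gradients(zip(grads, generator.trainable_variables))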

4. Can a CNN generate images of any type?

Yes, a CNN can generate images of any type as long as it is trained on a dataset that contains images of that type. For example, a CNN trained on images of cats can generate new images of cats, but it would not be able to generate images of dogs without additional training on a dataset of dog images.

5. Are there any limitations or challenges in generating images with CNNs?

One limitation of generating images with CNNs is that they require large amounts of training data to accurately generate new images. Additionally, generating highly complex or diverse images can be challenging for CNNs, as they may struggle to capture all the necessary features and details. This can lead to generated images that are blurry or lack certain characteristics.
