Python Generating Images with CNN: Why is the Size Not Increasing in Each Layer?

  • Thread starter: BRN
  • Tags: Images
SUMMARY

The discussion focuses on generating images with a Convolutional Neural Network (CNN) in TensorFlow. The original model produced an 8x8x3 output because its transpose convolution layers used a stride of 1 with 'same' padding, which leaves the spatial size unchanged. The solution was to use four transpose convolution layers, a starting kernel size of 16x16, and a stride of 2, so that each layer doubles the spatial size and the model generates the desired 128x128x3 image. The correct formula for the output size of a transpose convolution was also clarified.

PREREQUISITES
  • Understanding of Convolutional Neural Networks (CNNs)
  • Familiarity with TensorFlow 2.x and Keras API
  • Knowledge of convolutional layer operations and parameters
  • Basic grasp of image dimensions and tensor shapes
NEXT STEPS
  • Explore TensorFlow Keras layers, specifically Conv2DTranspose and their parameters
  • Learn about the mathematical foundations of convolution and transpose convolution operations
  • Investigate advanced techniques for image generation using GANs (Generative Adversarial Networks)
  • Study the impact of different kernel sizes and strides on output dimensions in CNNs
USEFUL FOR

Machine learning practitioners, AI researchers, and developers working on image generation tasks using Convolutional Neural Networks.

BRN
Hello everybody,
I have this problem:
starting from a vector of 100 random values, I have to generate a 128x128x3 image using a model consisting of a fully connected layer followed by five deconvolution (transpose convolution) layers.
This is my model

Python:
import tensorflow as tf

def generator_model(noise_dim):

    n_layers = 5
    k_w, k_h = [8, 8]                 # starting kernel size
    input_dim = (noise_dim,)
    i_w, i_h, i_d = [8, 8, 1024]      # starting width, height, filters
    strides = (1, 1)
    weight_initializer = None

    model = tf.keras.Sequential()

    # Project the noise vector to 8*8*1024 features, then reshape to a feature map
    model.add(tf.keras.layers.Dense(i_w * i_h * i_d, input_shape=input_dim, kernel_initializer=weight_initializer))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.ReLU())

    model.add(tf.keras.layers.Reshape((i_w, i_h, i_d)))
    for i in range(n_layers - 1):
        print(k_w, k_h)
        model.add(tf.keras.layers.Conv2DTranspose(i_d, (k_w, k_h), strides, padding='same', use_bias=False))
        model.add(tf.keras.layers.BatchNormalization())
        model.add(tf.keras.layers.ReLU())
        i_d = int(i_d / 2)
        k_w = int(k_w * 2)
        k_h = int(k_h * 2)

    k_w = i_d
    k_h = i_d
    model.add(tf.keras.layers.Conv2DTranspose(3, (k_w, k_h), strides, padding='same', use_bias=False))

    return model

Why do I always get an 8x8x3 image, with no increase in size at any layer?
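
(For reference, the per-layer output shapes can be listed with model.summary(); noise_dim = 100 is assumed here, following the problem statement.)

Python:
import tensorflow as tf

# Build the generator and list the per-layer output shapes: every
# Conv2DTranspose layer reports a spatial size of 8x8, and the final
# layer gives (None, 8, 8, 3).
model = generator_model(100)
model.summary()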

Thanks!
 
Ok, I have the solution.

The output size of a standard convolution ("conv") layer, with input size ##i##, kernel size ##k##, padding ##p##, and stride ##s##, is given by

$$o=\left\lfloor \frac{i - k + 2p}{s} \right\rfloor + 1$$

but for a transpose convolution ("deconv"), as in my case, the output size is

$$o = s\left(i - 1\right) + k - 2p$$
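
As a quick numeric check (a minimal sketch; note that with Keras padding='same' and no output_padding, the implicit padding makes the transpose formula above reduce to ##o = i \cdot s##, independent of the kernel size):

Python:
# Output size of Conv2DTranspose with padding='same': o = i * s,
# so the stride alone controls the upsampling.
def deconv_out_same(i, s):
    return i * s

size = 8
for layer in range(4):               # the four Conv2DTranspose layers
    size = deconv_out_same(size, 2)  # stride 2 doubles the spatial size
    print(size)                      # 16, 32, 64, 128

# With stride 1 the size never changes (o = i), which is why the
# original model stayed at 8x8.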

Then, with stride ##s=2##, the correct code is this

Python:
import tensorflow as tf

def generator_model(noise_dim):

    n_layers = 4
    k_w, k_h = [16, 16]               # starting kernel size
    input_dim = (noise_dim,)
    i_w, i_h, i_d = [8, 8, 1024]      # starting width, height, filters
    strides = (2, 2)                  # stride 2 doubles the spatial size at each layer
    weight_initializer = None

    model = tf.keras.Sequential()

    model.add(tf.keras.layers.Dense(i_w * i_h * i_d, input_shape=input_dim, kernel_initializer=weight_initializer))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.ReLU())

    model.add(tf.keras.layers.Reshape((i_w, i_h, i_d)))
    i_d = int(i_d / 2)
    for i in range(n_layers - 1):
        # upsampling path: 8x8 -> 16x16 -> 32x32 -> 64x64
        model.add(tf.keras.layers.Conv2DTranspose(i_d, (k_w, k_h), strides, padding='same', use_bias=False))
        model.add(tf.keras.layers.BatchNormalization())
        model.add(tf.keras.layers.ReLU())
        i_d = int(i_d / 2)
        k_w = int(k_w * 2)
        k_h = int(k_h * 2)

    # final layer: 3 output channels (RGB), 64x64 -> 128x128
    model.add(tf.keras.layers.Conv2DTranspose(3, (k_w, k_h), strides, padding='same', use_bias=False))

    return model

And this solves my problem :smile:
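
A quick sanity check of the final shape (a minimal sketch, assuming TensorFlow 2.x and noise_dim = 100 as in the original problem):

Python:
import tensorflow as tf

# One random noise vector in, one generated image out.
model = generator_model(100)
noise = tf.random.normal([1, 100])
image = model(noise)
print(image.shape)  # (1, 128, 128, 3)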
 
BRN said:
And this solves my problem :smile:
Great!
In the future, it would be helpful to readers to expand acronyms such as CNN, which might not be generally known. I'm assuming it has something to do with neural networks, but that's only a guess.
 
Hello and thanks!
You're right, CNN means Convolutional Neural Network.
Next time I will write out the acronyms explicitly :smile:
 
We have many threads on AI, which are mostly about AI/LLMs, e.g., ChatGPT, Claude, etc. It is important to draw a distinction between AI/LLM and AI/ML/DL, where ML = Machine Learning and DL = Deep Learning. AI is a broad technology; AI/ML/DL is being developed to handle large data sets, and even seemingly disparate datasets, to rapidly evaluate the data and determine the quantitative relationships in order to understand what those relationships (among the variables) mean. At the Harvard &...