Matrix dimensions are not matching after differentiation

SUMMARY

The discussion centers on differentiating a neural network equation involving matrix weights and an input vector, specifically the equation y = ψ(W3 x ψ(W2 x ψ(W1 x I))). The user encounters matrix dimension mismatches when applying the chain rule to differentiate with respect to the weight matrices W1, W2, and W3. The differentiation attempts yield incorrect dimensions, indicating a misunderstanding of matrix calculus rules. The user seeks clarification on the derivative of a function with respect to a matrix and references a Wikipedia discussion page on matrix calculus issues.

PREREQUISITES
  • Understanding of neural network architecture and activation functions, specifically log sigmoid.
  • Familiarity with matrix calculus and differentiation rules.
  • Knowledge of matrix dimensions and operations, particularly with 2x2 and 1x2 matrices.
  • Experience with the chain rule in calculus as applied to matrix functions.
NEXT STEPS
  • Study the rules for differentiating matrix products, focusing on matrix calculus.
  • Learn about the application of the chain rule in the context of neural networks.
  • Explore resources on the derivative of a function with respect to a matrix, including academic papers and textbooks.
  • Review the Wikipedia page on matrix calculus for insights and community discussions regarding common issues.
USEFUL FOR

Neural network practitioners, data scientists, and machine learning engineers who are working on model optimization and require a solid understanding of matrix differentiation techniques.

sakian
I'm doing some work with neural networks lately and I'm having trouble with this seemingly simple equation.

The equation describing the network is:
y = \psi(W_3 \, \psi(W_2 \, \psi(W_1 \, I)))​

Where:
  • y (scalar) is the output value
  • W_1 (2×2 matrix) are the 1st-layer weights
  • W_2 (2×2 matrix) are the 2nd-layer weights
  • W_3 (1×2 matrix) is the output-layer weight
  • I (2×1 vector) is the input vector
  • \psi is the activation function (log sigmoid)

I'm trying to differentiate the equation with respect to the weight matrices (using the chain rule), but the equations I get don't work. When I differentiate with respect to W_1 I get:

dy/dW_1 = \psi'(W_3 \, \psi(W_2 \, \psi(W_1 \, I))) \, W_3 \, \psi'(W_2 \, \psi(W_1 \, I)) \, W_2 \, \psi'(W_1 \, I) \, I

When I try to calculate I'm getting matrix dimension mismatches. Am I doing something wrong?
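The shape mismatch can be checked directly in code. Below is a minimal NumPy sketch (my own illustration, not from the thread): the naive left-to-right product above cannot produce a 2×2 result, but the standard backpropagation form, which uses transposes and elementwise (Hadamard) products, yields a gradient with the same shape as W_1, verified here against a finite-difference approximation.

```python
import numpy as np

# Log-sigmoid activation and its derivative
def psi(z):
    return 1.0 / (1.0 + np.exp(-z))

def dpsi(z):
    s = psi(z)
    return s * (1.0 - s)

rng = np.random.default_rng(0)
W1 = rng.standard_normal((2, 2))   # 1st-layer weights
W2 = rng.standard_normal((2, 2))   # 2nd-layer weights
W3 = rng.standard_normal((1, 2))   # output-layer weight
I  = rng.standard_normal((2, 1))   # input vector

# Forward pass, keeping the pre-activations z1, z2, z3
z1 = W1 @ I          # (2, 1)
h1 = psi(z1)
z2 = W2 @ h1         # (2, 1)
h2 = psi(z2)
z3 = W3 @ h2         # (1, 1)
y  = psi(z3)

# Backward pass: transposes and elementwise products keep every
# shape consistent, unlike the naive left-to-right chain rule.
d3 = dpsi(z3)                  # (1, 1)
d2 = (W3.T @ d3) * dpsi(z2)    # (2, 1)
d1 = (W2.T @ d2) * dpsi(z1)    # (2, 1)
dy_dW1 = d1 @ I.T              # (2, 2) -- same shape as W1

# Sanity check against a finite-difference approximation
eps = 1e-6
num = np.zeros_like(W1)
for i in range(2):
    for j in range(2):
        Wp = W1.copy()
        Wp[i, j] += eps
        yp = psi(W3 @ psi(W2 @ psi(Wp @ I)))
        num[i, j] = (yp.item() - y.item()) / eps
assert np.allclose(dy_dW1, num, atol=1e-4)
```

The key point the sketch demonstrates: each "delta" term is a column vector, and the outer product with the input (d1 @ I.T) is what produces a gradient matching the weight matrix's shape.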
I don't know the answer to your question, but I'm curious what definition you are using for the derivative of a function with respect to a matrix. Also, do you know a link that gives a rule for the derivative of a product of matrices with respect to a matrix?

I find it interesting that Wikipedia currently has a discussion page that brings up some of these issues ( http://en.wikipedia.org/wiki/Talk:Matrix_calculus ), but there is no article to go with it!
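For reference, one common convention (an assumption on my part; the thread never fixes one) defines the derivative of a scalar with respect to a matrix entrywise:

```latex
\left( \frac{\partial y}{\partial W} \right)_{ij} \;=\; \frac{\partial y}{\partial W_{ij}}
```

Under this convention, \partial y / \partial W_1 must be a 2×2 matrix, the same shape as W_1, which is exactly why the left-to-right chain-rule product in the original post cannot be dimensionally consistent.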
 
