Matrix dimensions are not matching after differentiation


sakian
Jun21-11, 03:52 PM
I've been doing some work with neural networks lately, and I'm having trouble with this seemingly simple equation.

The equation describing the network is:
y = \psi(W3 x \psi(W2 x \psi(W1 x I)))

Where:
y (scalar) is the output value
W1 (2x2 matrix) are the 1st layer weights
W2 (2x2 matrix) are the 2nd layer weights
W3 (1x2 matrix) are the output layer weights
I (2x1 vector) is the input vector
\psi is the activation function (log sigmoid)
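
For reference, the forward pass with the shapes listed above can be sketched in plain Python. This is only an illustration: the helper names (sigmoid, matvec, forward) and the numeric weight values are placeholders made up for the sketch, not values from the actual network.

```python
import math

def sigmoid(v):
    """Log-sigmoid applied elementwise to a list of floats."""
    return [1.0 / (1.0 + math.exp(-t)) for t in v]

def matvec(W, v):
    """Multiply a matrix (list of rows) by a column vector (flat list)."""
    return [sum(w * t for w, t in zip(row, v)) for row in W]

def forward(W1, W2, W3, I):
    """Evaluate y = psi(W3 x psi(W2 x psi(W1 x I))) and return the scalar y."""
    a1 = sigmoid(matvec(W1, I))    # (2x2)(2x1) -> 2x1
    a2 = sigmoid(matvec(W2, a1))   # (2x2)(2x1) -> 2x1
    y = sigmoid(matvec(W3, a2))    # (1x2)(2x1) -> 1x1
    return y[0]

# Placeholder values, chosen only to exercise the shapes.
W1 = [[0.1, -0.2], [0.3, 0.4]]
W2 = [[0.5, 0.6], [-0.7, 0.8]]
W3 = [[0.9, -1.0]]
I = [1.0, 2.0]

y = forward(W1, W2, W3, I)
print(y)
```

Every product in the forward direction has matching inner dimensions, so any mismatch has to come from how the derivative is assembled.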

I'm trying to differentiate the equation with respect to the weight matrices (using the chain rule), but I'm getting expressions that don't work. When I differentiate with respect to W1, I get:

dy/dW1 = \psi' (W3 x \psi(W2 x \psi(W1 x I))) x W3 x \psi' (W2 x \psi(W1 x I)) x W2 x \psi' (W1 x I) x I

When I try to evaluate this, I get matrix dimension mismatches. Am I doing something wrong?
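
To make the mismatch concrete: reading the expression left to right, the first \psi' factor is a scalar, W3 is 1x2, the next \psi' factor is 2x1, W2 is 2x2, the last \psi' factor is 2x1, and I is 2x1, so the final product (2x1 times 2x1) is not defined. Below is a rough sketch of one arrangement whose dimensions do work out. It rests on two assumptions that are not stated in the question: that dy/dW1 means the 2x2 array of partial derivatives dy/dW1[i][j], and that \psi' acts elementwise, so it enters as an elementwise product rather than a matrix product. The function names are made up for the sketch, and the result is checked against central finite differences rather than taken on trust.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def forward(W1, W2, W3, I):
    """Return (a1, a2, y) for y = psi(W3 x psi(W2 x psi(W1 x I)))."""
    a1 = [sigmoid(sum(w * x for w, x in zip(row, I))) for row in W1]
    a2 = [sigmoid(sum(w * a for w, a in zip(row, a1))) for row in W2]
    y = sigmoid(sum(w * a for w, a in zip(W3[0], a2)))
    return a1, a2, y

def grad_W1(W1, W2, W3, I):
    """Assumed convention: entry [i][j] is the partial derivative dy/dW1[i][j]."""
    a1, a2, y = forward(W1, W2, W3, I)
    # For the log-sigmoid, psi'(z) = psi(z) * (1 - psi(z)).
    d3 = y * (1.0 - y)                                   # scalar
    # (W3 transposed times d3), elementwise-times psi' at layer 2: 2x1
    d2 = [W3[0][k] * d3 * a2[k] * (1.0 - a2[k]) for k in range(2)]
    # (W2 transposed times d2), elementwise-times psi' at layer 1: 2x1
    d1 = [sum(W2[k][i] * d2[k] for k in range(2)) * a1[i] * (1.0 - a1[i])
          for i in range(2)]
    # Outer product of d1 (2x1) with I transposed (1x2): 2x2, same shape as W1
    return [[d1[i] * I[j] for j in range(2)] for i in range(2)]

def numeric_grad_W1(W1, W2, W3, I, h=1e-6):
    """Central finite differences, used only as an independent check."""
    G = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            Wp = [row[:] for row in W1]
            Wm = [row[:] for row in W1]
            Wp[i][j] += h
            Wm[i][j] -= h
            G[i][j] = (forward(Wp, W2, W3, I)[2]
                       - forward(Wm, W2, W3, I)[2]) / (2.0 * h)
    return G

# Placeholder values, chosen only to exercise the shapes.
W1 = [[0.1, -0.2], [0.3, 0.4]]
W2 = [[0.5, 0.6], [-0.7, 0.8]]
W3 = [[0.9, -1.0]]
I = [1.0, 2.0]

G = grad_W1(W1, W2, W3, I)
G_num = numeric_grad_W1(W1, W2, W3, I)
max_diff = max(abs(G[i][j] - G_num[i][j]) for i in range(2) for j in range(2))
print(max_diff)
```

Under these assumptions the factors appear in the reverse order from the expression above, with W3 and W2 transposed and I entering last as a row vector. Whether that matches the derivative definition intended in the question is a separate matter.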

Stephen Tashi
Jun21-11, 10:26 PM
I don't know the answer to your question, but I'm curious what definition you are using for the derivative of a function with respect to a matrix. Also, do you know a link that gives a rule for the derivative of a product of matrices with respect to a matrix?

I find it interesting that the current Wikipedia has a discussion page that brings up some of these issues ( http://en.wikipedia.org/wiki/Talk:Matrix_calculus ) but there is no article to go with it!