Matrix dimensions are not matching after differentiation

    I've been doing some work with neural networks lately, and I'm having trouble with this seemingly simple equation.

    The equation describing the network is:
    y = [itex]\psi[/itex](W3 x [itex]\psi[/itex](W2 x [itex]\psi[/itex](W1 x I)))

    y (scalar) is the output value
    W1 (2x2 matrix) contains the 1st layer weights
    W2 (2x2 matrix) contains the 2nd layer weights
    W3 (1x2 matrix) contains the output layer weights
    I (2x1 vector) is the input vector
    [itex]\psi[/itex] is the activation function (log sigmoid), applied elementwise
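
For reference, the network above can be written out as a short forward pass, which makes the shape of each intermediate result explicit. This is only a sketch in plain Python: the helper names (`sigmoid`, `matvec`, `forward`) and the numeric weight values are made up for illustration and are not from the original post.

```python
import math

def sigmoid(z):
    """Log-sigmoid activation applied to a scalar."""
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def forward(W1, W2, W3, I):
    """Return y = psi(W3 psi(W2 psi(W1 I))) and the two hidden activations."""
    a1 = [sigmoid(z) for z in matvec(W1, I)]    # (2x2)(2x1) -> 2x1
    a2 = [sigmoid(z) for z in matvec(W2, a1)]   # (2x2)(2x1) -> 2x1
    y = sigmoid(matvec(W3, a2)[0])              # (1x2)(2x1) -> 1x1, a scalar
    return y, a1, a2

# Illustrative values only; they are assumptions, not data from the post.
W1 = [[0.1, -0.2], [0.4, 0.3]]
W2 = [[0.5, 0.1], [-0.3, 0.2]]
W3 = [[0.7, -0.6]]
I = [1.0, 2.0]

y, a1, a2 = forward(W1, W2, W3, I)
print(len(a1), len(a2), y)
```

Every product in the forward direction is conformable, so any dimension mismatch has to come from how the factors are ordered and combined in the derivative, not from the network itself.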

    I'm trying to differentiate the equation with respect to the weight matrices (using the chain rule), but the expressions I get don't work. When I differentiate with respect to W1, I get:

    dy/dW1 = [itex]\psi[/itex]' (W3 x [itex]\psi[/itex](W2 x [itex]\psi[/itex](W1 x I))) x W3 x [itex]\psi[/itex]' (W2 x [itex]\psi[/itex](W1 x I)) x W2 x [itex]\psi[/itex]' (W1 x I) x I

    When I try to evaluate this, I get matrix dimension mismatches. Am I doing something wrong?
    Stephen Tashi

    Science Advisor

    I don't know the answer to your question, but I'm curious what definition you are using for the derivative of a function with respect to a matrix. Also, do you know a link that gives a rule for the derivative of a product of matrices with respect to a matrix?

    I find it interesting that Wikipedia currently has a discussion page that brings up some of these issues ( http://en.wikipedia.org/wiki/Talk:Matrix_calculus ) but there is no article to go with it!
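
On the question of which definition to use: one common convention (an assumption here, not necessarily the one the original poster intended) is to take dy/dW1 to be a matrix of the same shape as W1 whose (i, j) entry is the partial derivative of y with respect to W1[i][j]. Under that convention, the [itex]\psi[/itex]' factors multiply elementwise, the weight matrices enter transposed, and the last step is an outer product with I. The sketch below uses hypothetical helper names and made-up weights, and checks the result against finite differences.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def transpose(W):
    return [list(col) for col in zip(*W)]

def forward(W1, W2, W3, I):
    a1 = [sigmoid(z) for z in matvec(W1, I)]
    a2 = [sigmoid(z) for z in matvec(W2, a1)]
    y = sigmoid(matvec(W3, a2)[0])
    return y, a1, a2

def grad_W1(W1, W2, W3, I):
    """dy/dW1 as a 2x2 list whose (i, j) entry is dy/dW1[i][j]."""
    y, a1, a2 = forward(W1, W2, W3, I)
    # For the log-sigmoid, psi'(z) = psi(z) * (1 - psi(z)).
    d3 = [y * (1.0 - y)]                                  # 1x1
    back2 = matvec(transpose(W3), d3)                     # W3^T d3, 2x1
    d2 = [b * a * (1.0 - a) for b, a in zip(back2, a2)]   # elementwise product
    back1 = matvec(transpose(W2), d2)                     # W2^T d2, 2x1
    d1 = [b * a * (1.0 - a) for b, a in zip(back1, a1)]   # elementwise product
    # Outer product d1 I^T has the same shape as W1.
    return [[di * xj for xj in I] for di in d1]

def numeric_grad_W1(W1, W2, W3, I, eps=1e-6):
    """Central finite differences, perturbing one weight at a time."""
    G = [[0.0] * len(W1[0]) for _ in W1]
    for i in range(len(W1)):
        for j in range(len(W1[0])):
            Wp = [row[:] for row in W1]
            Wm = [row[:] for row in W1]
            Wp[i][j] += eps
            Wm[i][j] -= eps
            yp = forward(Wp, W2, W3, I)[0]
            ym = forward(Wm, W2, W3, I)[0]
            G[i][j] = (yp - ym) / (2 * eps)
    return G

# Illustrative values only; they are assumptions, not data from the post.
W1 = [[0.1, -0.2], [0.4, 0.3]]
W2 = [[0.5, 0.1], [-0.3, 0.2]]
W3 = [[0.7, -0.6]]
I = [1.0, 2.0]

analytic = grad_W1(W1, W2, W3, I)
numeric = numeric_grad_W1(W1, W2, W3, I)
max_diff = max(abs(a - n) for ra, rn in zip(analytic, numeric)
               for a, n in zip(ra, rn))
print(analytic, max_diff)
```

Read left to right, the chain of factors in the original post mixes elementwise and matrix products without transposes, which is one plausible source of the mismatch. Other layout conventions for matrix derivatives exist and would arrange the same numbers differently.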
