
Matrix dimensions are not matching after differentiation

  1. Jun 21, 2011 #1
    I'm doing some work with neural networks lately and I'm having trouble with this seemingly simple equation.

    The equation describing the network is:
    y = [itex]\psi(W_3\,\psi(W_2\,\psi(W_1 I)))[/itex]​

    y (scalar) is the output value​
    W1 (2x2 matrix) are the 1st layer weights​
    W2 (2x2 matrix) are the 2nd layer weights​
    W3 (1x2 matrix) are the output layer weights​
    I (2x1 vector) is the input vector​
    [itex]\psi[/itex] is the activation function (log sigmoid)​

    I'm trying to differentiate the equation with respect to the weight matrices (using the chain rule), but the expressions I get don't work. When I try to differentiate with respect to W1 I get:

    dy/dW1 = [itex]\psi'(W_3\,\psi(W_2\,\psi(W_1 I)))\; W_3\; \psi'(W_2\,\psi(W_1 I))\; W_2\; \psi'(W_1 I)\; I[/itex]

    When I try to evaluate this numerically I get matrix dimension mismatches. Am I doing something wrong?
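The shape bookkeeping can be checked numerically. Below is a minimal NumPy sketch (the variable names a1, h1, d1, etc. are my own, not from the thread) of the usual shape-consistent form of this chain rule: ψ' is applied elementwise, the weight matrices are transposed on the backward pass, and the products with ψ' are Hadamard (elementwise) products rather than matrix products — which is what the expression above is missing:

```python
import numpy as np

rng = np.random.default_rng(0)

# Log-sigmoid activation and its elementwise derivative.
def psi(z):
    return 1.0 / (1.0 + np.exp(-z))

def dpsi(z):
    s = psi(z)
    return s * (1.0 - s)

# Weights and input with the shapes from the post.
W1 = rng.standard_normal((2, 2))
W2 = rng.standard_normal((2, 2))
W3 = rng.standard_normal((1, 2))
I  = rng.standard_normal((2, 1))

def forward(W1, W2, W3, I):
    a1 = W1 @ I        # (2,1) pre-activation, layer 1
    h1 = psi(a1)       # (2,1)
    a2 = W2 @ h1       # (2,1) pre-activation, layer 2
    h2 = psi(a2)       # (2,1)
    a3 = W3 @ h2       # (1,1) pre-activation, output
    y  = psi(a3)       # (1,1) scalar output
    return y, (a1, a2, a3)

y, (a1, a2, a3) = forward(W1, W2, W3, I)

# Backward pass: each delta is a column vector; '*' is the
# elementwise (Hadamard) product, and the transposes on W3, W2
# are what keep the dimensions consistent.
d3 = dpsi(a3)                  # (1,1)
d2 = (W3.T @ d3) * dpsi(a2)    # (2,1)
d1 = (W2.T @ d2) * dpsi(a1)    # (2,1)

dy_dW1 = d1 @ I.T              # (2,2), same shape as W1

# Sanity check of one entry against a finite difference.
eps = 1e-6
W1p = W1.copy()
W1p[0, 1] += eps
num = (forward(W1p, W2, W3, I)[0] - y) / eps
print(dy_dW1.shape)                                # (2, 2)
print(np.allclose(dy_dW1[0, 1], num, atol=1e-4))   # True
```

The key point is that dy/dW1 comes out as an outer product d1 @ I.T with the same shape as W1, not as a chain of square-matrix products.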
  3. Jun 21, 2011 #2

    Stephen Tashi

    Science Advisor

    I don't know the answer to your question, but I'm curious what definition you are using for the derivative of a function with respect to a matrix. Also, do you know a link that gives a rule for the derivative of a product of matrices with respect to a matrix?

    I find it interesting that Wikipedia currently has a discussion page that brings up some of these issues ( http://en.wikipedia.org/wiki/Talk:Matrix_calculus ) but no article to go with it!
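    For reference, the convention most neural-network texts use (an assumption about what the original poster intends, not something stated in the thread) defines the derivative of a scalar [itex]y[/itex] with respect to a matrix [itex]W[/itex] entrywise:

    [itex]\left(\frac{dy}{dW}\right)_{ij} = \frac{\partial y}{\partial W_{ij}}[/itex]​

    so dy/dW has the same shape as W itself, and the chain rule has to be arranged so its pieces produce a matrix of exactly that shape.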