I've been doing some work with neural networks lately and I'm having trouble with a seemingly simple equation.
The equation describing the network is:
y = \psi(W_3 \psi(W_2 \psi(W_1 I)))
Where:
y (scalar) is the output value
W1 (2x2 matrix) is the 1st-layer weight matrix
W2 (2x2 matrix) is the 2nd-layer weight matrix
W3 (1x2 matrix) is the output-layer weight matrix
I (2x1 vector) is the input vector
\psi is the activation function (log sigmoid)
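To make the setup concrete, here is a minimal NumPy sketch of the forward pass. The weight values are random placeholders for illustration, not values from my actual network:

```python
import numpy as np

# Placeholder weights/input with the shapes described above.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((2, 2))  # 1st-layer weights (2x2)
W2 = rng.standard_normal((2, 2))  # 2nd-layer weights (2x2)
W3 = rng.standard_normal((1, 2))  # output-layer weights (1x2)
I = rng.standard_normal((2, 1))   # input vector (2x1)

def psi(z):
    """Log-sigmoid activation, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

# y = psi(W3 psi(W2 psi(W1 I)))
y = psi(W3 @ psi(W2 @ psi(W1 @ I)))
print(y.shape)  # (1, 1), i.e. effectively a scalar
```

The shapes chain cleanly in the forward direction: (2x2)(2x1) → (2x1), then (2x2)(2x1) → (2x1), then (1x2)(2x1) → (1x1).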
I'm trying to differentiate the equation with respect to the weight matrices (using the chain rule), but the equations I get don't work out. When I differentiate with respect to W1 I get:
dy/dW_1 = \psi'(W_3 \psi(W_2 \psi(W_1 I))) \, W_3 \, \psi'(W_2 \psi(W_1 I)) \, W_2 \, \psi'(W_1 I) \, I
When I try to evaluate this expression numerically, I get matrix dimension mismatches. Am I doing something wrong?
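Here is a small NumPy sketch that reproduces the mismatch when I evaluate that product left to right (again with random placeholder values):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((2, 2))
W2 = rng.standard_normal((2, 2))
W3 = rng.standard_normal((1, 2))
I = rng.standard_normal((2, 1))

def psi(z):
    return 1.0 / (1.0 + np.exp(-z))

def dpsi(z):
    s = psi(z)
    return s * (1.0 - s)

z1 = W1 @ I        # (2, 1)
z2 = W2 @ psi(z1)  # (2, 1)
z3 = W3 @ psi(z2)  # (1, 1)

# Evaluating my written-out chain rule as plain matrix products:
# dpsi(z3) is (1,1), W3 is (1,2), dpsi(z2) is (2,1), W2 is (2,2), ...
# (1,2) @ (2,1) gives (1,1), and then (1,1) @ (2,2) is ill-formed.
try:
    grad = dpsi(z3) @ W3 @ dpsi(z2) @ W2 @ dpsi(z1) @ I
except ValueError as e:
    print("shape mismatch:", e)
```

The failure happens at the `@ W2` step, which suggests the problem is how I'm treating the \psi' terms as plain matrix factors in the product.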