I'm doing some work with neural networks lately and I'm having trouble with this seemingly simple equation. The equation describing the network is:

$y = \psi(W_3\, \psi(W_2\, \psi(W_1 I)))$

where:
$y$ (scalar) is the output value
$W_1$ (2x2 matrix) are the 1st-layer weights
$W_2$ (2x2 matrix) are the 2nd-layer weights
$W_3$ (1x2 matrix) are the output-layer weights
$I$ (2x1 vector) is the input vector
$\psi$ is the activation function (log-sigmoid)

I'm trying to differentiate the equation with respect to the weight matrices (using the chain rule), but I'm getting equations that don't work. When I differentiate with respect to $W_1$ I get:

$\frac{dy}{dW_1} = \psi'(W_3\, \psi(W_2\, \psi(W_1 I)))\, W_3\, \psi'(W_2\, \psi(W_1 I))\, W_2\, \psi'(W_1 I)\, I$

When I try to evaluate this, I get matrix dimension mismatches. Am I doing something wrong?
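For concreteness, here is a minimal NumPy sketch of the forward pass and a chain-rule gradient whose shapes do line up, assuming $\psi$ is the logistic sigmoid $\psi(x) = 1/(1+e^{-x})$ and that the weights and input here are arbitrary random stand-ins. The key shape bookkeeping: each $\psi'$ factor acts elementwise (a Hadamard product, not a matrix product), and the weight matrices enter the backward pass transposed.

```python
import numpy as np

# Log-sigmoid activation and its derivative
# (assumption: psi(x) = 1 / (1 + exp(-x)))
def psi(x):
    return 1.0 / (1.0 + np.exp(-x))

def psi_prime(x):
    s = psi(x)
    return s * (1.0 - s)

# Random stand-ins with the shapes from the post
rng = np.random.default_rng(0)
W1 = rng.standard_normal((2, 2))   # 1st-layer weights (2x2)
W2 = rng.standard_normal((2, 2))   # 2nd-layer weights (2x2)
W3 = rng.standard_normal((1, 2))   # output-layer weights (1x2)
I  = rng.standard_normal((2, 1))   # input vector (2x1)

# Forward pass, keeping the pre-activations z1, z2, z3
z1 = W1 @ I        # (2,1)
a1 = psi(z1)       # (2,1)
z2 = W2 @ a1       # (2,1)
a2 = psi(z2)       # (2,1)
z3 = W3 @ a2       # (1,1)
y  = psi(z3)       # scalar output (as a 1x1 array)

# Backward pass: psi' is applied elementwise (Hadamard product, *),
# and W3, W2 enter transposed so every product is shape-consistent.
d3 = psi_prime(z3)                 # (1,1)
d2 = (W3.T @ d3) * psi_prime(z2)   # (2,1)
d1 = (W2.T @ d2) * psi_prime(z1)   # (2,1)
dy_dW1 = d1 @ I.T                  # (2,2), same shape as W1
```

The naive one-line chain rule fails because $\psi'(z)$ of a 2x1 vector is again 2x1, so writing it as a left matrix factor is dimensionally meaningless; it has to multiply elementwise (equivalently, as a diagonal matrix), and the outer product with $I^T$ at the end restores the 2x2 shape of $W_1$.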