sakian
#1
Jun21-11, 03:52 PM
P: 1
I've been doing some work with neural networks lately, and I'm having trouble with this seemingly simple equation.

The equation describing the network is:
[itex]y = \psi(W_3 \, \psi(W_2 \, \psi(W_1 I)))[/itex]
Where:
y (scalar) is the output value
W1 (2x2 matrix) are the 1st layer weights
W2 (2x2 matrix) are the 2nd layer weights
W3 (1x2 matrix) are the output layer weights
I (2x1 vector) is the input vector
[itex]\psi[/itex] is the activation function (log sigmoid)
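For reference, here is the forward pass written out in NumPy with the dimensions above (variable names are my own, just for illustration), which evaluates without any mismatch:

```python
import numpy as np

def psi(z):
    # log-sigmoid activation, applied element-wise
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.standard_normal((2, 2))  # 1st layer weights
W2 = rng.standard_normal((2, 2))  # 2nd layer weights
W3 = rng.standard_normal((1, 2))  # output layer weights
I = rng.standard_normal((2, 1))   # input vector

h1 = psi(W1 @ I)   # (2,1)
h2 = psi(W2 @ h1)  # (2,1)
y = psi(W3 @ h2)   # (1,1), i.e. the scalar output
```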
I'm trying to differentiate the equation with respect to the weight matrices (using the chain rule), but the resulting expressions don't work out. When I try to differentiate with respect to W1, I get:
[itex]\frac{dy}{dW_1} = \psi'(W_3 \, \psi(W_2 \, \psi(W_1 I))) \, W_3 \, \psi'(W_2 \, \psi(W_1 I)) \, W_2 \, \psi'(W_1 I) \, I[/itex]
When I try to evaluate this, I get matrix dimension mismatches. Am I doing something wrong?
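One way to sanity-check any candidate gradient is to compare it against central finite differences, since each entry of dy/dW1 is just a scalar derivative, so the full gradient must be a 2x2 matrix like W1 itself. The sketch below (my notation, assuming NumPy and the usual backprop convention that the [itex]\psi'[/itex] factors act element-wise rather than as plain matrix products) does exactly that:

```python
import numpy as np

def psi(z):
    # log-sigmoid, element-wise
    return 1.0 / (1.0 + np.exp(-z))

def dpsi(z):
    # derivative of the log-sigmoid, element-wise
    s = psi(z)
    return s * (1.0 - s)

def forward(W1, W2, W3, I):
    # scalar network output
    return psi(W3 @ psi(W2 @ psi(W1 @ I)))[0, 0]

rng = np.random.default_rng(1)
W1 = rng.standard_normal((2, 2))
W2 = rng.standard_normal((2, 2))
W3 = rng.standard_normal((1, 2))
I = rng.standard_normal((2, 1))

# Analytic gradient: the psi' factors multiply element-wise
# (equivalently, as diagonal matrices), not as row/column products.
z1 = W1 @ I;  a1 = psi(z1)
z2 = W2 @ a1; a2 = psi(z2)
z3 = W3 @ a2
delta3 = dpsi(z3)                    # (1,1)
delta2 = dpsi(z2) * (W3.T @ delta3)  # (2,1)
delta1 = dpsi(z1) * (W2.T @ delta2)  # (2,1)
grad_analytic = delta1 @ I.T         # (2,2), same shape as W1

# Numerical check: perturb one entry of W1 at a time
eps = 1e-6
grad_numeric = np.zeros_like(W1)
for i in range(2):
    for j in range(2):
        Wp = W1.copy(); Wp[i, j] += eps
        Wm = W1.copy(); Wm[i, j] -= eps
        grad_numeric[i, j] = (forward(Wp, W2, W3, I)
                              - forward(Wm, W2, W3, I)) / (2 * eps)
```

If the two agree entry by entry, the dimension bookkeeping is right; if a chain-rule expression only works as a string of matrix products, that is usually the sign that a diagonal [itex]\psi'[/itex] factor got flattened into an ordinary matrix.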