I've been doing some work with neural networks lately and I'm having trouble with a seemingly simple equation.

The equation describing the network is:

[itex]y = \psi(\mathbf{W3} \times \psi(\mathbf{W2} \times \psi(\mathbf{W1} \times \mathbf{I})))[/itex]

Where:

y (scalar) is the output value

**W1** (2x2 matrix) are the 1st layer weights

**W2** (2x2 matrix) are the 2nd layer weights

**W3** (1x2 matrix) are the output layer weights

**I** (2x1 vector) is the input vector

[itex]\psi[/itex] is the activation function (log sigmoid)
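To make the shapes concrete, here is a minimal sketch of the forward pass in plain Python. It assumes the log sigmoid is applied elementwise to each vector, and the helper names (`sigmoid`, `matvec`, `forward`) and the numeric weights are made up purely for illustration.

```python
import math

def sigmoid(z):
    """Log-sigmoid activation: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def forward(W1, W2, W3, I):
    """Evaluate y = psi(W3 psi(W2 psi(W1 I))), with psi applied elementwise."""
    a1 = [sigmoid(z) for z in matvec(W1, I)]   # 1st layer output, 2 values
    a2 = [sigmoid(z) for z in matvec(W2, a1)]  # 2nd layer output, 2 values
    z3 = matvec(W3, a2)                        # (1x2)(2x1) gives 1 value
    return sigmoid(z3[0])                      # scalar y

# Hypothetical example values, chosen only to exercise the shapes
W1 = [[0.1, -0.2], [0.4, 0.3]]   # 2x2
W2 = [[0.5, 0.6], [-0.7, 0.8]]   # 2x2
W3 = [[0.9, -1.0]]               # 1x2
I = [1.0, 2.0]                   # 2x1 input
y = forward(W1, W2, W3, I)
```

Since the last step is a sigmoid of a single number, `y` always lies strictly between 0 and 1.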

I'm trying to differentiate the equation with respect to the weight matrices (using the chain rule), but I'm getting equations that don't work. When I try to differentiate with respect to **W1** I get:

[itex]\frac{dy}{d\mathbf{W1}} = \psi'(\mathbf{W3} \times \psi(\mathbf{W2} \times \psi(\mathbf{W1} \times \mathbf{I}))) \times \mathbf{W3} \times \psi'(\mathbf{W2} \times \psi(\mathbf{W1} \times \mathbf{I})) \times \mathbf{W2} \times \psi'(\mathbf{W1} \times \mathbf{I}) \times \mathbf{I}[/itex]

When I try to evaluate this, I get matrix dimension mismatches. Am I doing something wrong?
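For comparison, here is a minimal sketch of one arrangement of the same chain-rule factors in which the shapes do agree. It assumes the activation acts elementwise, so each derivative-of-activation factor is a vector that multiplies elementwise rather than as a matrix product, and it assumes the derivative with respect to **W1** should come out as a 2x2 array (one entry per weight). All function names and numeric values are hypothetical, and a finite-difference estimate is included only as a sanity check.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dsigmoid(z):
    """Derivative of the log sigmoid: s * (1 - s)."""
    s = sigmoid(z)
    return s * (1.0 - s)

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def forward(W1, W2, W3, I):
    """Return y together with the intermediate pre-activations."""
    z1 = matvec(W1, I)
    a1 = [sigmoid(z) for z in z1]
    z2 = matvec(W2, a1)
    a2 = [sigmoid(z) for z in z2]
    z3 = matvec(W3, a2)[0]
    return sigmoid(z3), (z1, z2, z3)

def grad_W1(W1, W2, W3, I):
    """dy/dW1 as a 2x2 list, built from the output layer backwards."""
    _, (z1, z2, z3) = forward(W1, W2, W3, I)
    d3 = dsigmoid(z3)                                    # scalar
    # Transpose of W3 scaled by d3, times psi'(z2) elementwise: 2 values
    d2 = [W3[0][j] * d3 * dsigmoid(z2[j]) for j in range(2)]
    # Transpose of W2 applied to d2, times psi'(z1) elementwise: 2 values
    d1 = [sum(W2[j][k] * d2[j] for j in range(2)) * dsigmoid(z1[k])
          for k in range(2)]
    # Outer product of d1 (2x1) with the input as a row (1x2): 2x2
    return [[d1[k] * I[m] for m in range(2)] for k in range(2)]

def numeric_grad_W1(W1, W2, W3, I, eps=1e-6):
    """Central finite differences, one weight of W1 at a time."""
    g = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(2):
        for m in range(2):
            Wp = [row[:] for row in W1]
            Wm = [row[:] for row in W1]
            Wp[k][m] += eps
            Wm[k][m] -= eps
            g[k][m] = (forward(Wp, W2, W3, I)[0]
                       - forward(Wm, W2, W3, I)[0]) / (2 * eps)
    return g

# Hypothetical example values
W1 = [[0.1, -0.2], [0.4, 0.3]]
W2 = [[0.5, 0.6], [-0.7, 0.8]]
W3 = [[0.9, -1.0]]
I = [1.0, 2.0]
analytic = grad_W1(W1, W2, W3, I)
numeric = numeric_grad_W1(W1, W2, W3, I)
```

The differences from the expression above are the transposes on **W3** and **W2**, the elementwise products with the activation derivatives, and the final outer product with **I** in place of a 2x1 times 2x1 matrix product.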