you want to know the proof of the chain rule? look: the definition of the derivative of f at a is that it is a linear function L such that L(v) is tangent to f(a+v) - f(a) at v = 0,
i.e. the difference quotient [f(a+v) - f(a) - L(v)]/v approaches zero as v does (for a vector v, divide by the length |v|).
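to see the definition in action, here is a small numerical sketch. the choices f(x) = x^2, a = 3 and the candidate L(v) = 6v are just an illustration and not from the argument above; the point is only that the quotient shrinks along with v.

```python
# Illustrative choices (assumptions, not part of the proof): f(x) = x^2 at a = 3.
def f(x):
    return x * x

a = 3.0

def L(v):
    # Candidate derivative of f at a: the linear map v -> 2*a*v.
    return 2 * a * v

def remainder_quotient(v):
    # |f(a+v) - f(a) - L(v)| / |v|, which should tend to 0 as v -> 0.
    return abs(f(a + v) - f(a) - L(v)) / abs(v)

# Evaluate the quotient for v = 1e-1, 1e-2, ..., 1e-5.
quotients = [remainder_quotient(10.0 ** -k) for k in range(1, 6)]
for k, q in zip(range(1, 6), quotients):
    print(f"v = 1e-{k}: quotient = {q:.2e}")
```

here the remainder is exactly v^2, so the quotient is about |v| itself, which is why it goes to zero.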
call a function o(v) such that o(v)/v goes to zero as v does "little oh", and write it o(v).
call a function such that the quotient O(v)/v is bounded as v approaches zero "big oh", and write it as O(v). in particular every linear function is O.
Then the basic rules are these: linear combinations of O's are also O, and likewise for o's; compositions of o's and O's are always o if even one "factor" is o; and a product of two O's is o.
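a quick numerical sanity check of two of these rules, as a sketch only. the particular functions 3v, -2v and v^2 are made-up examples, chosen because it is obvious which ones are O and which are o.

```python
# Made-up example functions (assumptions for illustration only).
def big_o_1(v):
    return 3 * v       # linear, so O(v)

def big_o_2(v):
    return -2 * v      # linear, so O(v)

def little_o(v):
    return v * v       # v^2 / v = v -> 0, so o(v)

def ratio(h, v):
    # |h(v)| / |v|: bounded for an O, tending to 0 for an o.
    return abs(h(v)) / abs(v)

vs = [10.0 ** -k for k in range(1, 6)]

# A linear function alone: the ratio stays bounded (here it stays near 3).
bounded_ratios = [ratio(big_o_1, v) for v in vs]
# Product of two O's: the ratio tends to 0, so the product is o.
product_ratios = [ratio(lambda u: big_o_1(u) * big_o_2(u), v) for v in vs]
# Composition of an O with an o: the ratio tends to 0, so it is o.
composite_ratios = [ratio(lambda u: big_o_1(little_o(u)), v) for v in vs]

print(bounded_ratios)
print(product_ratios)
print(composite_ratios)
```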
Then the chain rule is as follows:
assume L is the derivative of f at a and M is the derivative of g at f(a). then
f(a+v) - f(a) - L(v) = o(v), so f(a+v) - f(a) = L(v) + o(v) = O(v) + o(v) = O(v).
Hence M(f(a+v) - f(a)) = M(L(v)) + M(o(v)), since M is linear.
since f(a+v) = f(a) + [f(a+v) - f(a)], the definition of M as the derivative of g at f(a) gives
g(f(a+v)) - g(f(a)) - M([f(a+v) - f(a)]) = o(f(a+v) - f(a)) = o(O(v)) = o(v).
but also M(f(a+v) - f(a)) = M(L(v)) + M(o(v)), from above,
so g(f(a+v)) - g(f(a)) - M([f(a+v) - f(a)])
= g(f(a+v)) - g(f(a)) - M(L(v)) - M(o(v)) = o(v).
hence g(f(a+v)) - g(f(a)) - M(L(v)) = M(o(v)) + o(v) = o(v) + o(v) = o(v), where M(o(v)) = o(v) because M is linear, hence O.
hence by definition, the derivative of g(f) at a is M(L).
i.e. the derivative of a composition is the composition, as linear maps, of the derivatives. hence as matrices it is the matrix product, which here is the dot product you are computing above:
i.e. dw/dt = (dw/dx, dw/dy, dw/dz).(dx/dt, dy/dt, dz/dt), and so on....
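if you want to convince yourself numerically, here is a minimal sketch of that dot product formula. the function w = xy + z^2 and the curve (cos t, sin t, t^2) are hypothetical examples picked for the check, not the ones from your problem.

```python
import math

# Hypothetical curve (x(t), y(t), z(t)) and its velocity (dx/dt, dy/dt, dz/dt).
def curve(t):
    return (math.cos(t), math.sin(t), t * t)

def curve_velocity(t):
    return (-math.sin(t), math.cos(t), 2 * t)

# Hypothetical function w(x, y, z) and its gradient (dw/dx, dw/dy, dw/dz).
def w(x, y, z):
    return x * y + z * z

def grad_w(x, y, z):
    return (y, x, 2 * z)

def chain_rule_derivative(t):
    # dw/dt as the dot product of the gradient of w with the velocity of the curve.
    g = grad_w(*curve(t))
    v = curve_velocity(t)
    return sum(gi * vi for gi, vi in zip(g, v))

def finite_difference(t, h=1e-6):
    # Central difference of the composite t -> w(x(t), y(t), z(t)).
    return (w(*curve(t + h)) - w(*curve(t - h))) / (2 * h)

t0 = 0.7
exact = chain_rule_derivative(t0)
approx = finite_difference(t0)
print(exact, approx)
```

the two printed numbers should agree to many decimal places. for this example the composite is cos(t)sin(t) + t^4, whose derivative cos(2t) + 4t^3 matches the dot product.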
