Re: why does the gradient point in the direction of greatest change?
That depends on what you think the gradient is. You could say it is true by definition, but let's take the gradient to be the vector whose x and y components are the rates of change in the x and y directions.
Then why does the vector with those x and y components point in the direction of greatest change?
Imagine a stone thrown into the water and the wave rippling out in all directions. A particle of water has a velocity vector pointing straight away from the point where the stone landed. If we want to measure the component of velocity in some other direction, we just project that velocity vector onto the given direction, i.e. we take the component of the vector pointing in that direction.
For example, the particle is not moving at all in the direction perpendicular to the (radially pointing) velocity vector, since that projection is zero. If we pick two arbitrary perpendicular directions, called the x and y directions, and project the velocity vector onto those directions, then the two component vectors we get are called the velocities in the x and y directions respectively, or the partial derivatives in the x and y directions.
But since our space is only two dimensional, we can in fact reason backwards as well: from those two component vectors, we can reconstruct the original velocity vector by vector addition. So it appears as if the gradient is defined in terms of the two partials, whereas really they are only devices for recovering it.
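Here is a small Python sketch of that reconstruction, in case it helps. The velocity (3, 4) and the 30 degree rotation of the axes are made-up values for illustration only; any velocity and any pair of perpendicular unit directions should behave the same way.

```python
import math

def project(v, u):
    """Scalar component of v along the unit vector u (a dot product)."""
    return v[0] * u[0] + v[1] * u[1]

# Hypothetical velocity of one water particle (assumed for illustration).
v = (3.0, 4.0)

# Two arbitrary perpendicular unit directions: the usual axes rotated by 30 degrees.
angle = math.radians(30)
x_dir = (math.cos(angle), math.sin(angle))
y_dir = (-math.sin(angle), math.cos(angle))

# The "partials": components of the velocity along each chosen direction.
vx = project(v, x_dir)
vy = project(v, y_dir)

# Vector sum of the two component vectors recovers the original velocity.
rebuilt = (vx * x_dir[0] + vy * y_dir[0],
           vx * x_dir[1] + vy * y_dir[1])

print(rebuilt)             # approximately (3.0, 4.0), up to rounding
print(math.hypot(vx, vy))  # approximately 5.0, the original speed
```

Changing the angle changes vx and vy, but not the vector they add up to, which is the point: the components depend on the choice of axes, the vector does not.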
Note that each projection of the original velocity vector is no longer than the original velocity vector, and is strictly shorter unless it points in the same direction. Hence the vector sum of the x and y velocity vectors, i.e. the original velocity vector, is the longest velocity vector in any possible direction.
This shows that the only velocity vector with intrinsic meaning is the longest one, the one in the direction of greatest change. The partial derivatives in the x and y directions are artificial constructs which fortunately recover the intrinsic vector, the gradient, as their vector sum.
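If you want to see the "longest projection" claim numerically, here is a rough Python check. It scans directions in one degree steps, which is only a coarse sampling, and the velocity (3, 4) is again a made-up example.

```python
import math

# Hypothetical velocity vector (assumed for illustration).
v = (3.0, 4.0)
speed = math.hypot(v[0], v[1])

best_degrees = None
best_component = -math.inf

# Project v onto unit vectors at every whole degree around the circle.
for degrees in range(360):
    theta = math.radians(degrees)
    u = (math.cos(theta), math.sin(theta))
    component = v[0] * u[0] + v[1] * u[1]
    if component > best_component:
        best_degrees = degrees
        best_component = component

# Angle of v itself, for comparison with the best sampled direction.
v_degrees = math.degrees(math.atan2(v[1], v[0]))

print(best_degrees, v_degrees)  # the best sampled angle is the whole degree nearest to v
print(best_component, speed)    # the largest component is close to, and never above, the speed
```

No sampled direction gives a component larger than the speed, and the winner is the sampled direction closest to v itself.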
This was an attempt at an intuitive explanation. For a mathematical proof, you could define the partial derivatives by limits as usual in the x and y directions, and then deduce (for a differentiable function) that the directional derivative in any other direction v (also defined by an appropriate limit) is obtained by dotting the unit direction vector v with the gradient vector formed from the two partials. It then follows that the direction in which the change is greatest is the unit direction vector with the largest dot product with the gradient. But that is the direction of the gradient itself, since the dot product equals the length of the gradient times the cosine of the angle between the two vectors.
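A numerical sanity check of that statement, as a sketch rather than a proof: the function f(x, y) = x^2 + 3xy, the point (1, 2), and the 40 degree direction are all arbitrary choices of mine, and the limits are approximated by central differences with a small step h.

```python
import math

def f(x, y):
    # Hypothetical smooth function, chosen only for illustration.
    return x**2 + 3 * x * y

def directional_derivative(func, p, u, h=1e-5):
    """Central-difference estimate of the rate of change of func at p along unit vector u."""
    forward = func(p[0] + h * u[0], p[1] + h * u[1])
    backward = func(p[0] - h * u[0], p[1] - h * u[1])
    return (forward - backward) / (2 * h)

p = (1.0, 2.0)

# The two partials are just directional derivatives along the axes.
grad = (directional_derivative(f, p, (1.0, 0.0)),
        directional_derivative(f, p, (0.0, 1.0)))

# An arbitrary other unit direction.
theta = math.radians(40)
u = (math.cos(theta), math.sin(theta))

numeric = directional_derivative(f, p, u)    # straight from the difference quotient
via_dot = grad[0] * u[0] + grad[1] * u[1]    # gradient dotted with the direction

# Rate of change along the gradient's own direction.
grad_len = math.hypot(grad[0], grad[1])
u_grad = (grad[0] / grad_len, grad[1] / grad_len)
along_grad = directional_derivative(f, p, u_grad)

print(grad)                  # approximately (8.0, 3.0)
print(numeric, via_dot)      # these two agree up to rounding
print(along_grad, grad_len)  # so do these: the maximal rate is the gradient's length
```

The first pair of numbers illustrates the dot product formula, and the second shows that moving along the gradient gives a rate of change equal to its length, which is the largest value the dot product with a unit vector can take.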
Does that make any sense?