It all comes from the formula <v,w>=|v||w|cosθ, where θ is the angle between v and w.
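As a quick sanity check of this formula, the short Python sketch below recovers θ from the inner product by solving for cosθ. The helper names `inner`, `norm` and `angle` are illustrative only and do not come from the original.

```python
import math

def inner(v, w):
    """Standard inner product <v,w> on R^n."""
    return sum(a * b for a, b in zip(v, w))

def norm(v):
    """Euclidean length |v| = sqrt(<v,v>)."""
    return math.sqrt(inner(v, v))

def angle(v, w):
    """Angle θ in [0, π] between nonzero v and w, from <v,w> = |v||w|cosθ."""
    c = inner(v, w) / (norm(v) * norm(w))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    return math.acos(max(-1.0, min(1.0, c)))

print(math.degrees(angle((1, 0, 0), (0, 1, 0))))  # about 90 degrees
print(math.degrees(angle((1, 1, 0), (1, 0, 0))))  # about 45 degrees
```

The clamp only matters for nearly parallel vectors, where round-off can push the computed cosθ slightly outside [-1, 1].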
Let F: R³ → R be a differentiable function. By looking at the definition of DF(x)v for |v|=1 (namely DF(x)v = lim_{t→0} (F(x+tv)−F(x))/t, the derivative of F at a point x in the direction of the vector v), we see that this number represents the rate of change of F at x in the direction v. On the other hand, the gradient of F at x is defined as the (unique) vector ∇F(x) such that DF(x)v=<∇F(x),v> for all vectors v.
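To see the defining identity DF(x)v=<∇F(x),v> numerically, here is a minimal sketch. It assumes a hypothetical F(x,y,z)=x²+2y²+3z² (not from the original), whose gradient (2x,4y,6z) is computed by hand, and it approximates DF(x)v by the central difference quotient (F(x+hv)−F(x−hv))/(2h) for a small step h.

```python
import math

def F(p):
    # Hypothetical example function F(x, y, z) = x^2 + 2y^2 + 3z^2.
    x, y, z = p
    return x**2 + 2 * y**2 + 3 * z**2

def grad_F(p):
    # Gradient of the example F, computed by hand: (2x, 4y, 6z).
    x, y, z = p
    return (2 * x, 4 * y, 6 * z)

def directional_derivative(f, p, v, h=1e-6):
    # Central-difference approximation of Df(p)v = d/dt f(p + t v) at t = 0.
    plus = [a + h * b for a, b in zip(p, v)]
    minus = [a - h * b for a, b in zip(p, v)]
    return (f(plus) - f(minus)) / (2 * h)

p = (1.0, 1.0, 1.0)
v = (1 / math.sqrt(3),) * 3                          # a unit vector, |v| = 1
numeric = directional_derivative(F, p, v)            # DF(p)v from the limit definition
exact = sum(g * c for g, c in zip(grad_F(p), v))     # <∇F(p), v>
print(numeric, exact)                                # both about 6.93 (= 12/sqrt(3))
```

Because this example F is quadratic, the central difference is exact up to floating-point round-off, so the two numbers agree to many digits.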
So to ask in which direction F is increasing most rapidly at x is to ask which vector v of unit length (|v|=1) maximizes the value of DF(x)v. But DF(x)v=<∇F(x),v>=|∇F(x)||v|cosθ=|∇F(x)|cosθ, where θ is the angle between ∇F(x) and v, and cosθ takes values between -1 and 1. Assuming ∇F(x)≠0, |∇F(x)|cosθ is clearly largest when cosθ=1, i.e. when θ=0. That is, when v points in the direction of ∇F(x)! (Precisely, v=∇F(x)/|∇F(x)|, and the maximal rate of increase is then |∇F(x)|.)
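The maximization argument can also be checked by brute force: sample many unit vectors v, evaluate DF(x)v=<∇F(x),v>, and observe that the best one lines up with ∇F(x) and nearly attains |∇F(x)|. This sketch assumes a hypothetical F(x,y,z)=x²+2y²+3z² and the point x=(1,1,1); drawing normalized Gaussian vectors (which are uniformly distributed on the unit sphere) is just a convenient way to generate directions, not part of the argument itself.

```python
import math
import random

def grad_F(p):
    # Gradient of the hypothetical F(x, y, z) = x^2 + 2y^2 + 3z^2.
    x, y, z = p
    return (2 * x, 4 * y, 6 * z)

def inner(v, w):
    return sum(a * b for a, b in zip(v, w))

def random_unit_vector(rng):
    # Normalized Gaussian vector: uniformly distributed on the unit sphere.
    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
    n = math.sqrt(inner(v, v))
    return [c / n for c in v]

rng = random.Random(0)            # fixed seed so the run is reproducible
p = (1.0, 1.0, 1.0)
g = grad_F(p)
g_norm = math.sqrt(inner(g, g))   # |∇F(p)| = sqrt(56)

# DF(p)v = <∇F(p), v> for each sampled unit vector v; keep the largest.
best_v = max((random_unit_vector(rng) for _ in range(20000)),
             key=lambda v: inner(g, v))
best_rate = inner(g, best_v)
cos_theta = best_rate / g_norm    # cosθ between best_v and ∇F(p)

print(best_rate, g_norm)          # best_rate is just below |∇F(p)|
print(cos_theta)                  # close to 1, i.e. θ close to 0
```

No sampled direction can beat |∇F(x)|, since |<∇F(x),v>| ≤ |∇F(x)||v| = |∇F(x)| for unit v; sampling only shows that the bound is approached as v approaches the gradient direction.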
Now, consider S a surface in R³ that is realized as the level set F=c of F. That is,

$$S = \{\, x \in \mathbb{R}^3 : F(x) = c \,\}$$

for some constant c. Take x a point in S. By definition, a tangent vector to S at x is a vector v of the form

$$v = \gamma'(0)$$

for some differentiable curve $\gamma: \,]-1,1[\, \rightarrow S$ on S with $\gamma(0) = x$. Notice that for $v = \gamma'(0)$ a tangent vector to S at x, the derivative of F at x in the direction v vanishes:

$$DF(x)v = DF(\gamma(0))\,\gamma'(0) = (F\circ\gamma)'(0) = 0.$$

The second equality is the chain rule and the third equality is because the map $F\circ\gamma: \,]-1,1[\, \rightarrow \mathbb{R}$ is the constant map $t \mapsto c$.
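Here is a numerical check that DF(x)v=(F∘γ)'(0)=0 for a tangent vector v=γ'(0), under assumptions chosen purely for illustration: a hypothetical F(x,y,z)=x²+2y²+3z², the level c=6, the point x=(1,1,1) on S, and the curve γ(t)=(√3·cos(ψ₀+t), √(3/2)·sin(ψ₀+t), 1), with ψ₀ chosen so that γ(0)=x. One can check by hand that F(γ(t))=3cos²+3sin²+3=6 for every t, so γ lies on S. Both γ'(0) and (F∘γ)'(0) are approximated by central differences.

```python
import math

def F(p):
    # Hypothetical F(x, y, z) = x^2 + 2y^2 + 3z^2; S is its level set F = 6.
    x, y, z = p
    return x**2 + 2 * y**2 + 3 * z**2

def grad_F(p):
    # Gradient of the example F, computed by hand.
    x, y, z = p
    return (2 * x, 4 * y, 6 * z)

# psi0 satisfies cos(psi0) = 1/sqrt(3), sin(psi0) = sqrt(2/3), so gamma(0) = (1, 1, 1).
psi0 = math.atan2(math.sqrt(2.0), 1.0)

def gamma(t):
    # A curve on S (the slice z = 1 of the ellipsoid); F(gamma(t)) = 6 for all t.
    s = psi0 + t
    return (math.sqrt(3.0) * math.cos(s), math.sqrt(1.5) * math.sin(s), 1.0)

h = 1e-5
x = gamma(0.0)
# Tangent vector v = gamma'(0), by central differences.
v = [(a - b) / (2 * h) for a, b in zip(gamma(h), gamma(-h))]
# (F o gamma)'(0), by central differences; it vanishes because F o gamma is constant.
dF_along_curve = (F(gamma(h)) - F(gamma(-h))) / (2 * h)
# <∇F(x), v>, which should agree with DF(x)v = 0.
dot = sum(g * c for g, c in zip(grad_F(x), v))

print(x)                      # approximately (1, 1, 1)
print(dF_along_curve, dot)    # both approximately 0
```

By hand, γ'(0)=(−√2, 1/√2, 0) and ∇F(1,1,1)=(2,4,6), whose inner product is −2√2+2√2=0.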
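Ok, so in terms of the gradient, what does this tell us? It tells us that 0=DF(x)v=<∇F(x),v>=|∇F(x)||v|cosθ, so (when ∇F(x)≠0 and v≠0) cosθ=0, i.e. θ=90°. That is, ∇F(x) and v are perpendicular. Since this holds for every tangent vector v to S at x, ∇F(x) is perpendicular to the whole tangent plane of S at x, which by definition means that ∇F(x) is perpendicular (or normal) to the surface S at the point x.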
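As a worked example (the choice of F is ours, for illustration): take the sphere of radius r > 0 as the level set of F(x)=|x|², with x a point of S and v=γ'(0) a tangent vector as above, γ(0)=x.

```latex
\begin{aligned}
F(x_1,x_2,x_3) &= x_1^2 + x_2^2 + x_3^2, \qquad S = \{\, x \in \mathbb{R}^3 : F(x) = r^2 \,\},\\
\nabla F(x) &= (2x_1,\, 2x_2,\, 2x_3) = 2x,\\
\langle \nabla F(x), v \rangle &= 2\langle \gamma(0), \gamma'(0) \rangle
  = \frac{d}{dt}\Big|_{t=0} |\gamma(t)|^2
  = \frac{d}{dt}\Big|_{t=0} r^2 = 0.
\end{aligned}
```

So the normal direction to the sphere at x is the radial direction x itself, matching the familiar picture.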