Of course, before you can define "derivative" of a function from Rn to Rm, you have to define "differentiable" (that's different from calculus I where a function is "differentiable" as long as the derivative exists!).
  If f(x) is a function from Rn to Rm, then f is differentiable at x= a if and only if there exist a linear function, L, from Rn to Rm, and a function ε(x), from Rn to Rm, such that
  f(x)= f(a)+ L(x-a)+ ε(x) and \lim_{|x-a|\to 0}\frac{\epsilon(x)}{|x-a|}= 0.
  If that is true, then it is easy to show that the linear function, L, is unique (and ε is then determined, since ε(x)= f(x)- f(a)- L(x-a)). We define the "derivative of f at a" to be that linear function, L.
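  To make the definition concrete, here is a minimal numerical sketch. The function f(x, y)= (xy, x+ y^2), the point a= (1, 2), and the names `J`, `L`, and `error_ratio` are illustrative choices, not part of the definition above. The linear map L is taken to be multiplication by the matrix of partial derivatives at a, and the code checks that |ε(x)|/|x-a| shrinks as x approaches a.

```python
import math

def f(x, y):
    # Example map from R2 to R2 (an assumption chosen for illustration).
    return (x * y, x + y ** 2)

A = (1.0, 2.0)

# Matrix of partial derivatives of f at A, computed by hand:
# d(xy)/dx = y = 2, d(xy)/dy = x = 1, d(x + y^2)/dx = 1, d(x + y^2)/dy = 2y = 4.
J = ((2.0, 1.0), (1.0, 4.0))

def L(dx, dy):
    # The candidate linear function, applied to the displacement x - a.
    return (J[0][0] * dx + J[0][1] * dy, J[1][0] * dx + J[1][1] * dy)

def error_ratio(dx, dy):
    # |eps(x)| / |x - a| with eps(x) = f(x) - f(a) - L(x - a).
    fx = f(A[0] + dx, A[1] + dy)
    fa = f(*A)
    lin = L(dx, dy)
    eps = (fx[0] - fa[0] - lin[0], fx[1] - fa[1] - lin[1])
    return math.hypot(*eps) / math.hypot(dx, dy)

# Approach a along the direction (1, -1) with shrinking step sizes.
ratios = [error_ratio(h, -h) for h in (1e-1, 1e-2, 1e-3)]
print(ratios)
```

  For this particular f and this direction the ratio works out to be h itself, so it goes to 0 with the step size, which is exactly what the limit condition asks for.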
  Notice that, by this definition, in the case f:R1->R1, the derivative of f at a is a linear function from R1->R1, not a number!  However, any such linear function must be of the form L(x)= cx, that is, multiplication by a number c.  That number is, of course, the "Calculus I" derivative of f at a.
  Similarly, the derivative of a "vector valued function of a real variable", R1->Rm, is a linear function from R1 to Rm.  Any such function can be written L(x)= x<a1, ...,am>, or x times a vector.  That vector is the vector of derivatives in the usual "calculus III" sense.
  The derivative of a "real valued function of several real variables", Rn->R1, is a linear function from Rn to R1.  Such a function can be written as a dot product: <a1,...,an> dot product the x-vector.  That vector is precisely the "gradient vector" of f. (And recall that, in Calculus III, a function may have partial derivatives at a point but not be "differentiable" there.)
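  The parenthetical remark can be illustrated with a standard example that is not from the discussion above: g(x, y)= xy/(x^2+ y^2) with g(0, 0)= 0. The sketch below, with hypothetical helper names `partial_x`, `partial_y`, and `error_ratio`, shows that both partial derivatives at the origin are 0, yet the error ratio from the definition of "differentiable" does not go to 0 along the line y= x.

```python
import math

def g(x, y):
    # Standard counterexample; the value at the origin is defined to be 0.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)

def partial_x(h):
    # Difference quotient for the x-partial at the origin.
    return (g(h, 0.0) - g(0.0, 0.0)) / h

def partial_y(h):
    # Difference quotient for the y-partial at the origin.
    return (g(0.0, h) - g(0.0, 0.0)) / h

def error_ratio(dx, dy):
    # Both partials are 0, so the only candidate for L is the zero map
    # and eps(x) = g(x) - g(0, 0).
    eps = g(dx, dy) - g(0.0, 0.0)
    return abs(eps) / math.hypot(dx, dy)

steps = (1e-1, 1e-2, 1e-3)
partials = [(partial_x(h), partial_y(h)) for h in steps]
diag_ratios = [error_ratio(h, h) for h in steps]
print(partials)
print(diag_ratios)
```

  Along y= x the function is constantly 1/2, so the ratio grows like 1/h instead of shrinking: the partial derivatives exist at the origin, but no linear function L satisfies the definition there.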
  This is, by the way, where the "second derivative test" for max or min (or saddle point) of a function of two variables comes from:  You look at \frac{\partial^2F}{\partial x^2}\frac{\partial^2F}{\partial y^2}- \left(\frac{\partial^2F}{\partial x \partial y}\right)^2.  If that is negative at a (where the first partial derivatives are 0), then there is a saddle point at a.  If that is positive, then you have either a max or a min, depending on the sign of the second partials \frac{\partial^2F}{\partial x^2} and \frac{\partial^2F}{\partial y^2} (which must then have the same sign): a minimum if they are positive, a maximum if they are negative.
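  As a small sketch of the test just described, the function below takes the three second partials at a critical point and reports the classification. The name `second_derivative_test` and the example functions are illustrative assumptions; the "inconclusive" branch for a zero determinant is the usual convention and is not discussed above.

```python
def second_derivative_test(fxx, fyy, fxy):
    # fxx, fyy, fxy are the second partials evaluated at a critical point.
    d = fxx * fyy - fxy ** 2
    if d < 0:
        return "saddle point"
    if d > 0:
        # fxx and fyy must have the same sign here, so checking one suffices.
        return "minimum" if fxx > 0 else "maximum"
    return "inconclusive"

# Second partials at the origin, computed by hand for each example.
results = {
    "x^2 + y^2": second_derivative_test(2.0, 2.0, 0.0),
    "-(x^2 + y^2)": second_derivative_test(-2.0, -2.0, 0.0),
    "x^2 - y^2": second_derivative_test(2.0, -2.0, 0.0),
    "x^4 + y^4": second_derivative_test(0.0, 0.0, 0.0),
}
for name, verdict in results.items():
    print(name, "->", verdict)
```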
  The point is that, if F:R2->R, then its derivative, at each point, can be represented as a 2-vector (the gradient vector).  That means that the derivative function, which assigns to each point the derivative vector at that point, is a function from R2 to R2, and its derivative is a linear transformation from R2 to R2, which can be represented by a 2 by 2 matrix at each point (the "Hessian" matrix). The calculation \frac{\partial^2F}{\partial x^2}\frac{\partial^2F}{\partial y^2}- \left(\frac{\partial^2F}{\partial x\partial y}\right)^2 is simply the determinant of that matrix.  Since the mixed second derivatives are equal, that matrix is symmetric and can, by a coordinate change, be written as a diagonal matrix having the eigenvalues, a and b, on the diagonal.  In that coordinate system, with the critical point at the origin, F is, to second order, just C+ (1/2)(ax^2+ by^2) (no xy term), so if a and b are both positive we have a minimum, if both negative a maximum, and if one is positive and the other negative, a saddle point.  Of course, the determinant (which does not change with such a change of coordinate system) is just the product ab.
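  The relation between the determinant and the eigenvalues can be checked on a small example. The sketch below uses F= x^2+ 3xy+ y^2, an illustrative choice, whose Hessian is the constant symmetric matrix with rows (2, 3) and (3, 2). The helper `symmetric_eigenvalues` is a hypothetical name; it uses the closed-form eigenvalues of a symmetric 2 by 2 matrix.

```python
import math

def symmetric_eigenvalues(p, q, r):
    # Eigenvalues of the symmetric matrix [[p, q], [q, r]]:
    # (p + r)/2 plus or minus sqrt(((p - r)/2)^2 + q^2).
    mean = (p + r) / 2.0
    radius = math.hypot((p - r) / 2.0, q)
    return mean + radius, mean - radius

# Hessian entries of F = x^2 + 3xy + y^2: Fxx = 2, Fxy = 3, Fyy = 2.
p, q, r = 2.0, 3.0, 2.0
a, b = symmetric_eigenvalues(p, q, r)
determinant = p * r - q * q

print("eigenvalues:", a, b)
print("product of eigenvalues:", a * b)
print("determinant:", determinant)
```

  Here one eigenvalue is positive and the other negative, so the product, and therefore the determinant, is negative and the origin is a saddle point, in agreement with the second derivative test.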