Yes, that makes sense. What I'm getting at is the term proportional to h at the end. If h is a vector in R^n and f(h) is a vector in R^m, how do you make that particular definition work?

I could live with R(h) as the remainder if it is a vector in R^m. But it seems that h o(|h|) is a vector in R^n, and you can't add that to two vectors in R^m, can you?

You're assuming the thing in the o(|h|) is a number: it could be, for example, an operator R^n → R^m whose operator norm is asymptotically less than |h|.

You're right, I was assuming that. I know it doesn't really matter what the order is, but it might be more suggestive to write it o(|h|)h or something like that, to make it more clear that o(|h|) is a linear map. Thanks for the help

Right. I guess what I had in mind was a linear map parametrized by |h|, if that makes any sense at all. But that wouldn't be a linear map since it operates differently on different choices of h. Thanks.

Of course, before you can define "derivative" of a function from R^{n} to R^{m}, you have to define "differentiable" (that's different from calculus I where a function is "differentiable" as long as the derivative exists!).

If f(x) is a function from R^{n} to R^{m}, then f is differentiable at x= a if and only if there exist a linear function, L, from R^{n} to R^{m}, and a function ε(x), from R^{n} to R^{m}, such that

f(x)= f(a)+ L(x-a)+ ε(x) and [tex]\lim_{|x-a|\to 0}\frac{\epsilon(x)}{|x-a|}= 0[/tex].

If that is true, then it is easy to show that the linear function, L, is unique (ε is not). We define the "derivative of f at a" to be that linear function, L.
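As a concrete numerical check of this definition (the function here is my own made-up example, not from the thread): take f(x, y) = (x^2 + y, xy) and let the candidate L at a point a be the Jacobian of f there. The ratio |ε(a+h)|/|h| should shrink as h → 0 (here it shrinks like |h|, since f is quadratic):

```python
import numpy as np

def f(v):
    # example map f: R^2 -> R^2, f(x, y) = (x^2 + y, x*y)
    x, y = v
    return np.array([x**2 + y, x * y])

def L(a, h):
    # candidate derivative at a: the Jacobian of f, applied to h
    x, y = a
    J = np.array([[2 * x, 1.0],
                  [y,     x ]])
    return J @ h

a = np.array([1.0, 2.0])
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = t * np.array([1.0, -1.0])        # approach a along a fixed direction
    eps = f(a + h) - f(a) - L(a, h)      # the remainder ε
    print(t, np.linalg.norm(eps) / np.linalg.norm(h))   # ratio shrinks like |h|
```

Replacing the Jacobian with any other matrix makes the ratio approach a nonzero constant instead, which is the numerical face of the uniqueness claim.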

Notice that, by this definition, in the case f:R^{1}->R^{1}, the derivative of f at a is a linear function from R^{1} to R^{1}, not a number! However, any such linear function must be of the form L(x)= ax, multiplication by a number. That number is, of course, the "Calculus I" derivative of f.

Similarly, the derivative of a "vector valued function of a real variable", R^{1}->R^{m}, is a linear function from R^{1} to R^{m}. Any such function can be written L(x)= x<a_{1}, ..., a_{m}>, or x times a vector. That vector is the vector of derivatives in the usual "calculus III" sense.
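A quick numerical sketch of that case (the curve is my own choice, purely for illustration): for r(t) = (cos t, sin t), the candidate linear map at t is h ↦ h(-sin t, cos t), i.e. h times the component-wise derivative vector, and the remainder over |h| goes to zero:

```python
import numpy as np

def r(t):
    # curve R -> R^2 (illustrative example)
    return np.array([np.cos(t), np.sin(t)])

def Dr(t, h):
    # derivative at t as a linear map R -> R^2: h times the velocity vector
    return h * np.array([-np.sin(t), np.cos(t)])

t0 = 0.7
for h in [1e-2, 1e-3, 1e-4]:
    rem = r(t0 + h) - r(t0) - Dr(t0, h)
    print(h, np.linalg.norm(rem) / abs(h))   # tends to 0 as h -> 0
```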

The derivative of a "real valued function of several real variables", R^{n}->R^{1}, is a linear function from R^{n} to R^{1}. Any such function can be written as a dot product: L(x)= <a_{1},...,a_{n}> dotted with the x-vector. That vector of coefficients is precisely the "gradient vector" of f. (And recall that, in Calculus III, a function may have partial derivatives at a point and still not be "differentiable" there.)
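The same numerical check works here (again with a function of my own choosing): for F(x, y) = x^2 y the gradient is (2xy, x^2), the linear part is a dot product with it, and the remainder over |h| vanishes:

```python
import numpy as np

def F(v):
    # scalar field R^2 -> R (illustrative example)
    x, y = v
    return x**2 * y

def grad_F(v):
    # gradient vector of F; the derivative is the map h -> grad_F(a) . h
    x, y = v
    return np.array([2 * x * y, x**2])

a = np.array([1.0, 3.0])
for t in [1e-1, 1e-2, 1e-3]:
    h = t * np.array([0.6, -0.8])
    rem = F(a + h) - F(a) - grad_F(a) @ h   # linear part is a dot product
    print(t, abs(rem) / np.linalg.norm(h))
```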

This is, by the way, where the "second derivative test" for max or min (or saddle point) of a function of two variables comes from: You look at [itex]\frac{\partial^2F}{\partial x^2}\frac{\partial^2F}{\partial y^2}- \left(\frac{\partial^2F}{\partial x \partial y}\right)^2[/itex]. If that is negative at a (where the partial derivatives are 0), then there is a saddle point at a. If that is positive, then you have either a max or min depending on the sign of the second partials (which must be the same).

The point is that, if F:R^{2}->R, then its derivative, at each point, can be represented as a 2-vector (the gradient vector). That means that the derivative function, which assigns to each point the derivative vector, is a function from R^{2} to R^{2}- and its derivative is a linear transformation from R^{2} to R^{2}- which can be represented by a 2 by 2 matrix at each point (the "Hessian" matrix). The calculation [itex]\frac{\partial^2F}{\partial x^2}\frac{\partial^2F}{\partial y^2}- \left(\frac{\partial^2F}{\partial x\partial y}\right)^2[/itex] is simply the determinant of that matrix. Since the mixed second derivatives are equal, that matrix is symmetric and can, by a coordinate change, be written as a diagonal matrix having the eigenvalues on the diagonal. In that coordinate system, the equation for F is just ax^{2}+ by^{2}= C (no xy term), so if a and b are both positive we have a minimum, if both negative a maximum, and if one positive, the other negative, a saddle point. Of course, the determinant (which does not change with a change of coordinate system) is just the product ab.
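That classification can be sketched in a few lines (the particular Hessians below are my own illustrative choices, coming from F = x^2 + 2y^2, F = x^2 - y^2, and F = -x^2 - 2y^2 at a critical point):

```python
import numpy as np

def classify(H):
    # second-derivative test: det(H) is the product of the eigenvalues
    det = np.linalg.det(H)
    if det < 0:
        return "saddle"      # eigenvalues of opposite sign
    if det > 0:
        # eigenvalues share a sign, so H[0,0]'s sign decides max vs min
        return "min" if H[0, 0] > 0 else "max"
    return "inconclusive"    # a zero eigenvalue: test says nothing

print(classify(np.array([[2.0, 0.0], [0.0, 4.0]])))    # min
print(classify(np.array([[2.0, 0.0], [0.0, -2.0]])))   # saddle
print(classify(np.array([[-2.0, 0.0], [0.0, -4.0]])))  # max
```

For a non-diagonal symmetric Hessian, `np.linalg.eigvalsh` gives the eigenvalues a and b directly, and their product agrees with the determinant, as the post says.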

to define a derivative you first define what it means for a map to have derivative equal to zero. a map o(h) has derivative equal to zero at h =0 if and only if |o(h)|/|h| approaches zero as h does.

this makes sense for vector valued maps if |h| is a norm.
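For instance (a toy map of my own, not from the post): g(h) = |h| h has |g(h)|/|h| = |h| → 0, so it "has derivative equal to zero at h = 0" in exactly this sense:

```python
import numpy as np

def g(h):
    # a map that is o(h): |g(h)| = |h|^2, so |g(h)|/|h| = |h| -> 0
    return np.linalg.norm(h) * h

for t in [1e-1, 1e-2, 1e-3]:
    h = t * np.array([0.6, 0.8])                        # unit direction, scaled by t
    print(t, np.linalg.norm(g(h)) / np.linalg.norm(h))  # equals t
```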

then the original definition makes sense (corrected) if we say that f is differentiable at x provided there exists a linear map L(h) such that the difference

f(x+h) - L(h) - f(x) has derivative equal to zero at h = 0.

then Df(x) = L.

in the original definition there is possibly an error, since the term h.o(h) should have been merely o(h) in this sense.

So what precisely does little-oh notation mean when applied to a vector-valued function? I've only ever really used it in the context of algorithmic analysis, and time complexity isn't vector valued.