Can the Definition of Derivative be Extended to Functions from R^n to R^m?

PBRMEASAP · Aug 13, 2005

Saw this in the thread about "Explaining topology..."

matt grime said:

in R^n f is differentaible at x if there is a linaer map Df(x) satisfying

f(x+h)=f(x)+Df(x)h + ho(|h|)

this can be extended to any place where there is a notion of linear map or a map such as |?| to the reals, or some other ordered space.

Is there an easy way to extend this particular definition to when f : R^n -> R^m?

Hurkyl · Aug 13, 2005

A function whose range is R^m is simply m functions whose range is R.

PBRMEASAP · Aug 13, 2005

Yes, that makes sense. What I'm getting at is the term proportional to h at the end. If h is a vector in R^n and f(h) is a vector in R^m, how do you make that particular definition work?

Hurkyl · Aug 13, 2005

Bleh, I wasn't paying any attention at all. I'm embarassed to have written my previous post!

That definition already works for f : R^n → R^m.

Though, I guess it's much less complicated to write R(h) instead of h o(|h|), and require that |R(h)| / |h| → 0. (R for remainder)

PBRMEASAP · Aug 13, 2005

I could live with R(h) as the remainder if it is a vector in R^m. But it seems that h o(|h|) is a vector in R^n, and you can't add that to two vectors in R^m, can you?

Hurkyl · Aug 13, 2005

You're assuming the thing in the o(|h|) is a number: it could be, for example, an operator R^n → R^m whose operator norm is asymptotically less than |h|.

PBRMEASAP · Aug 13, 2005

You're right, I was assuming that. I know it doesn't really matter what the order is, but it might be more suggestive to write it o(|h|)h or something like that, to make it more clear that o(|h|) is a linear map. Thanks for the help

Hurkyl · Aug 13, 2005

It can't be a linear map. (Unless it's the zero operator)

PBRMEASAP · Aug 13, 2005

Ah, okay. Can you explain?

edit: I guess it's because, if it were linear, you could always add more vectors to the sum so that

o(|h|)(Sum of vectors) > |h| as h->0?

Hurkyl · Aug 13, 2005

Well, if o(|h|) was hiding a linear map, then the remainder would look like:

R(h) = A h

for some matrix A...

PBRMEASAP · Aug 13, 2005

Right. I guess what I had in mind was a linear map parametrized by |h|, if that makes any sense at all. But that wouldn't be a linear map since it operates differently on different choices of h. Thanks.

HallsofIvy · Aug 14, 2005

Of course, before you can define "derivative" of a function from Rⁿ to R^m, you have to define "differentiable" (that's different from calculus I where a function is "differentiable" as long as the derivative exists!).

If f(x) is a function from Rⁿ to R^m, the f is differentiable at x= a if and only if there exist a linear function, L, from Rⁿ to R^m, and a function ε(x), from Rⁿ to R^m, such that

f(x)= f(a)+ L(x-a)+ ε(x) and lim_{|x-a|->0}\frac{\epsilon}{|x-a|}= 0.

If that is true, then it is easy to show that the linear function, L, is unique (ε is not). We define the "derivative of f at a" to be that linear function, L.

Notice that, by this definition, in the case f:R¹->R¹, the derivative of f at a is a linear function from R['sup]1[/sup]->R¹, not a number! However, any such linear function must be of the form L(x)= ax- multiplication by a number. That number is, of course, the "Calculus I" derivative of f.

Similarly, the derivative of a "vector valued function of a real variable", R¹->R^m, is a linear function from R¹ to R^m. Any such function can be written L(x)= x<a₁, ...,a_m>, or x times a vector. That vector is the vector of derivatives in the usual "calculus III" sense.

The derivative of a "real valued function of several real variables", Rⁿ->R¹, is a linear function from Rⁿ to R¹. Such a function can be written as a dot product: <a₁,...,a_n> dot product the x-vector. That vector is precisely the "gradient vector" of f. (And recall that, in Calculus III, a function may have partial derivatives at a point but not be "differentiable" there.)

This is, by the way, where the "second derivative test" for max or min (or saddle point) of a function of two variables comes from: You look at \frac{\partial^2F}{\partial x^2}\frac{\partial^2F}{\partial y^2}- \(\frac{\partial^2F}{\partial x \partial y}\)^2. If that is negative at a (where the partial derivatives are 0), then there is a saddle point at a. If that is positive, then you have either a max or min depending on the sign of the second partials (which must be the same).

The point is that, if F:R²->R, then its derivative, at each point, can be represented as a 2-vector (the gradient vector). That means that the derivative function, that to each point assigns to that point the derivative vector, is a function from R² to R²- and its derivative is a linear transformation from R² to R²- which can be represented by a 2 by 2 matrix at each point (the "Hessian" matrix). The calculation \frac{\partial^2F}{\partial x^2}\frac{\partial^2F}{\partial y^2}- \(\frac{\partial^2F}{\partial x\partial y}\)^2 is simply the determinant of that matrix. Since the mixed second derivatives are equal, that matrix is symmetric and can, by a coordinate change, be written as a diagonal matrix having the eigenvalues on the diagonal. In that coordinate system, the equation for F is just ax²+ b²= C (no xy term) so if a and b are both positive we have a minimum, if both positive a maximum, if one positive, the other negative, a saddle point. Of course, the determinant (which does not change with a change of coordinate system) is just the product ab.

mathwonk · Aug 14, 2005

to define a derivative you first define what it means for a map to have derivative equal to zero. a map o(h) has derivative equal to zero at h =0 if and only if |o(h)|/|h| approaches zero as h does.

this makes sense for vector valued maps if |h| is a norm.

then the original definition makes sense (corrected) if we say that f is differentiable at x provided there exists a linear map L(h) such that the difference

f(x+h) - L(h) - f(x) has derivative equal to zero at h = 0.

then Df(x) = L.

in the original definition there is possibly an error, since the term h.o(h) should have been merely o(h) in this sense.

Hurkyl · Aug 14, 2005

So what precisely does little-oh notation mean when applied to a vector-valued function? I've only ever really used it in the context of algorithmic analysis, and time complexity isn't vector valued.

mathwonk · Aug 14, 2005

exactly what i said: |o(h)|/(h| approaches zero ( a number), as |h| does, or equivalently as h does (a vector).

Hurkyl · Aug 14, 2005

Well, |o(h)| / |h| can't approach a number because, technically, o(h) is a set.

I guess this is what I assumed, so it's good to hear confirmation:

 f \in o(g) \Leftrightarrow \lim \frac{|f|}{|g|} = 0

mathwonk · Aug 14, 2005

?

o(h) is a vector valued function of h, which is also a vector.

oh i see, my legg is being pulled..

mathwonk · Aug 14, 2005

?

o(h) is a vector valued function of h, which is also a vector.

oh i see, my legg is being pulled.. this notation was used this way by hjardy and also by loomis, among others.

Hurkyl · Aug 14, 2005

!

In computer science, the notation o(f) is the class of all functions that are asymptotically less than f. Specifically:

 o(f) := \left\{ g \, | \, \lim_{x \rightarrow \infty} \frac{g(x)}{f(x)} = 0 \right\} 

(P.S. how do I get a nice, tall |?)

I understand that here we are looking at x → 0 instead, but still...

If I were to say something like:

f(x) = x ln x + o(x)

what I really mean is that there exists some function g in o(x) such that

f(x) = x ln x + g(x)

rachmaninoff · Aug 14, 2005

Hurkyl said:

(P.S. how do I get a nice, tall |?)

Use: \left| and \right| ..... \left| \frac{\mbox{Big}}{\mbox{Expression}} \right|

Hurkyl · Aug 14, 2005

It won't complain if it's already in the middle of a left/right pair? Or that it doesn't have a matched right to go with it?

rachmaninoff · Aug 14, 2005

Nope.

\left\{ \mbox{another} \left| \frac{\mbox{big}}{\mbox{expression}} \right\}

PBRMEASAP · Aug 15, 2005

Thanks for the help, everyone. I think the definition that was given by Halls and mathwonk was probably what matt meant to write. Thanks for clearing that up.

mathwonk · Aug 15, 2005

either that, or else he meant something different by the notation o(h). I.e. it would also suffice if he meant that notation to be a function that approaches zero as h does.

such functions were called infinitesimals in the older literature.

Can the Definition of Derivative be Extended to Functions from R^n to R^m?

Similar threads

Hot Threads

Insights Fermat's Last Theorem

B How is it that law of sines does not work in this exercise?

B What could prove this wrong? I'm having a dispute with friends

B About a definition: What is the number of terms of a polynomial P(x)?

B How Many Straight Lines to Connect an N by M Array of Points in a Closed Loop?

Recent Insights

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Insights What Exactly is Dirac’s Delta Function? - Insight

Insights Relativator (Circular Slide-Rule): Simulated with Desmos - Insight

Insights Fixing Things Which Can Go Wrong With Complex Numbers

Insights Fermat's Last Theorem

Insights Why Vector Spaces Explain The World: A Historical Perspective