How to interpret the differential of a function

In elementary calculus (and often in courses beyond) we are taught that the differential of a function, ##df##, quantifies an infinitesimal change in that function. However, the notion of an infinitesimal is not well-defined in standard analysis: one cannot define it in terms of a limit, and a positive number that is smaller than every other positive real number simply does not exist. Clearly the definition $$df=\lim_{\Delta x\rightarrow 0}\Delta f =f'(x)dx$$ makes no sense, since, in the case where ##f(x)=x##, we would have $$dx=\lim_{\Delta x\rightarrow 0}\Delta x =0.$$

All of this leaves me confused on how to interpret expressions such as $$df=f'(x)dx$$ Should it be seen simply as a definition, quantifying the first-order (linear) change in a function about a point ##x##? i.e. a new function that is dependent both on ##x## and a finite change in ##x##, ##\Delta x##, $$df(x,\Delta x):=f'(x)\Delta x$$ then one can interpret ##dx## as $$dx:=dx(x,\Delta x)= \Delta x$$ such that $$\Delta f=f'(x)dx+\varepsilon =df +\varepsilon$$ (in which ##\varepsilon## quantifies the error between this linear change in ##f## and the actual change in ##f##, with ##\lim_{\Delta x\rightarrow 0}\varepsilon =0##).
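As a sanity check on this interpretation, here is a small numerical sketch (my own example, not from the thread): for ##f(x)=\sin x## at ##x=1##, the error ##\varepsilon = \Delta f - f'(x)\Delta x## not only vanishes as ##\Delta x \rightarrow 0##, it vanishes faster than ##\Delta x## itself.

```python
# Numerical sketch (my own example, not from the thread): for
# f(x) = sin(x) at x = 1, check that the error
# eps = Delta_f - f'(x) * Delta_x  vanishes faster than Delta_x itself.
import math

f = math.sin
fprime = math.cos            # the known derivative of sin
x = 1.0

ratios = []
for dx in [0.1, 0.01, 0.001]:
    delta_f = f(x + dx) - f(x)      # the actual change in f
    df = fprime(x) * dx             # the linear part, f'(x) * Delta_x
    eps = delta_f - df              # the error term
    ratios.append(abs(eps) / dx)    # eps / Delta_x should tend to 0
# each ratio is about 10x smaller than the previous one (eps ~ Delta_x**2)
```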

I feel that there must be some sort of rigorous treatment of the notion of differentials, since these kinds of manipulations are used all the time, at least in physics?!

I've had some exposure to differential geometry, in which one has differential forms, in particular ##1##-forms, whose notation suggestively "looks like" differentials, for example $$\omega =df$$ but as I understand it these are defined as linear maps, members of a dual space to some vector space ##V##, which act on elements of ##V##, mapping them to real numbers. Furthermore, the basis ##1##-forms are suggestively written as what in elementary calculus one would interpret as an infinitesimal change in ##x##, ##dx##. But again, this is simply symbolic notation, since the basis ##1##-forms simply span the dual space and are themselves linear maps which act on elements of ##V##.

I've heard people say that differential forms make the notion of a differential of a function mathematically rigorous, however, in my mind I can't seem to reconcile how this is the case, since at best they specify the direction in which the differential change in a function occurs, via $$df(v)=v(f)$$ (since ##v(f)## is the directional derivative of a function ##f## along the vector ##v##).
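To make the statement ##df(v)=v(f)## concrete, here is a small numerical check (my own example; the function ##g(x,y)=xy^2##, the point, and the vector are all arbitrary choices): the value of ##dg## on ##v##, computed from the gradient, agrees with the limit of difference quotients along ##v##.

```python
# Illustrative check (my own example; the function, point, and vector are
# arbitrary choices): the 1-form dg acts on a vector v by dg(v) = v(g),
# the directional derivative of g along v.  Here g(x, y) = x * y**2,
# p = (1, 1), v = (1, 2), so dg(v) = y**2 * 1 + 2*x*y * 2 = 5 at p.
def g(x, y):
    return x * y**2

p = (1.0, 1.0)
v = (1.0, 2.0)

# dg(v) via the gradient of g at p
grad_p = (p[1]**2, 2 * p[0] * p[1])
dg_of_v = grad_p[0] * v[0] + grad_p[1] * v[1]

# the same number as a limit of difference quotients along v
t = 1e-6
quotient = (g(p[0] + t * v[0], p[1] + t * v[1]) - g(*p)) / t
```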

If someone could enlighten me on this subject I'd really appreciate it.

Differential forms also have properties that make them natural objects to integrate over. That dovetails nicely with the exterior differential notationally resembling the Calc I differential. The exterior differential and the integral are "dual" operations, in a sense, whose relation on manifolds with boundary is the generalized Stokes' theorem.

mathwonk
Homework Helper
2020 Award
essentially the differential df(p) of f at a point p, is the linear function whose graph is the tangent line at p to the graph of f. the differential df itself is the function whose value at p is the differential of f at p. so the differential is a function df whose values df(p) are linear functions.

essentially the differential df(p) of f at a point p, is the linear function whose graph is the tangent line at p to the graph of f. the differential df itself is the function whose value at p is the differential of f at p. so the differential is a function df whose values df(p) are linear functions.

So is it simply the linear approximation of ##f## at a point ##p##? Can one interpret ##df## as describing the change in ##f## as one moves along a vector passing through the level surface of ##f## at ##p## and a level surface of ##f## infinitesimally close to ##p##? (In this sense capturing the direction in which the infinitesimal change occurs, without having to deal with the ill-defined notion of an infinitesimal change in ##f##.)

mathwonk
Homework Helper
2020 Award
it's not the change in f, it's the linear part of the change in f. i.e. it is the linear approximation to f(p+v) - f(p) = delta(f). by definition a function f is differentiable at p if the difference f(p+v) - f(p) equals a linear function of v plus a function of v which is "little oh", i.e. a function o(v) such that the limit of o(v)/|v| is zero as v-->0. so o(v) not only goes to zero, it goes to zero faster than v does. then that linear function of v is called the differential of f at p. so we have f(p+v) - f(p) = df(p)(v) + o(v), where df(p)(v) is linear in v. equivalently, df(p)(v) = f(p+v) - f(p) - o(v), where f(p+v) - f(p) is the change in f at p.

so i guess if you took the level surfaces of f at p, and replaced them by the family of planes parallel to the one tangent to the level surface passing through p, then you would probably get the level surfaces of df(p) but I haven't thought it through carefully.

for a function of one variable, you take the infinitesimal rate of change, i.e. the derivative at p, and you multiply by it to get a linear function approximating the change in f. that linear function is the differential of f at p.

it's not the change in f, it's the linear part of the change in f. i.e. it is the linear approximation to f(p+v) - f(p) = delta(f). by definition a function f is differentiable at p if the difference f(p+v) - f(p) equals a linear function of v plus a function of v which is "little oh", i.e. a function o(v) such that the limit of o(v)/|v| is zero as v-->0. so o(v) not only goes to zero, it goes to zero faster than v does. then that linear function of v is called the differential of f at p. so we have f(p+v) - f(p) = df(p)(v) + o(v), where df(p)(v) is linear in v. equivalently, df(p)(v) = f(p+v) - f(p) - o(v), where f(p+v) - f(p) is the change in f at p.

How does this tally up with the notion of a differential in calculus? Is the idea that ##df## is simply defined as the linear change in ##f## about a particular point? I find it confusing why we consider only this part and not higher-order changes in ##f##. Is it simply that (like you said) the higher-order changes near a particular point ##p## are vanishingly small?!

mathwonk
Homework Helper
2020 Award
this is the notion of differential in calculus. with this definition, df(p) is the linear function "multiplication by f'(p)", and dx is the linear function "multiplication by 1", hence the equation df(p) = f'(p)dx(p) holds. hence the equation df = f'dx also holds everywhere. we don't consider higher order changes in f because the differential is by definition the linear part of f. I guess you could consider higher order approximations, but that is not what the word "differential" nor the symbol df, applies to. one might of course use the term "second order differential" but you have to define it.
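mathwonk's description can be rendered almost literally in code. This sketch (my own, with ##f(x)=x^3## at ##p=2## as an arbitrary example) builds ##dx## as the linear map "multiply by 1" and ##df(p)## as "multiply by ##f'(p)##", and checks the identity ##df(p) = f'(p)\,dx## on a sample increment.

```python
# A literal rendering (my own sketch, not from the post) of "df(p) is the
# linear map 'multiplication by f'(p)'" and "dx is the linear map
# 'multiplication by 1'", using f(x) = x**3 at p = 2, so f'(p) = 12.
fprime_at_p = 12.0

dx = lambda h: 1.0 * h                  # "multiply by 1"
df_at_p = lambda h: fprime_at_p * h     # "multiply by f'(p)"

# the pointwise identity df(p) = f'(p) * dx, checked on a sample increment
h = 0.37
lhs = df_at_p(h)
rhs = fprime_at_p * dx(h)
```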

There is no such thing as an infinitesimal... however the concept is used often in physics. It doesn't really matter that it doesn't exist. In physics, you usually just assume dx is a length that is "small enough" that an infinite sum of them can be approximated as an integral. If you truly take the limit as dx tends to 0, you can no longer apply classical mechanics anyway, because at scales smaller than an atomic radius (a number > 0) your model doesn't work in the first place.

IMO any book that uses differentials in a serious way...without stating the caveat that they are basically an intuitive approximation, is not a very good textbook.

There is no such thing as an infinitesimal

Really?? I find that a very bold and incorrect statement.

this is the notion of differential in calculus. with this definition, df(p) is the linear function "multiplication by f'(p)", and dx is the linear function "multiplication by 1", hence the equation df(p) = f'(p)dx(p) holds. hence the equation df = f'dx also holds everywhere. we don't consider higher order changes in f because the differential is by definition the linear part of f.

Ah I see, so we simply define the differential of a function as the linear part of the change in ##f## as we move away from a point ##p##?!
Is the reason this is significant enough to have its own definition that it represents the change in ##f## near a particular point ##p## as we move an increment ##dx## along the tangent line to the function at ##p##? So ##\Delta f## and ##\Delta x## are the changes in the function and its input, and ##df## is the change along the tangent line to the function at a point as we move a distance ##dx=\Delta x##. This provides a linear approximation to the actual change in the function (apparently the "best" linear approximation, although I have to admit I'm not sure what is meant by "best" in this context)?

I think the problem is that, prior to gaining further knowledge in the area of calculus, and some (albeit a small amount) in differential geometry, I didn't really think too much about the fact that interpreting the differential of a function as an infinitesimal change in a function (as we move an infinitesimal distance ##dx## from a given point) doesn't make sense in any rigorous sense. But now I've started thinking about this more, I feel really unsure about my knowledge and understanding on the subject.
Previously I simply thought of ##df## as an infinitesimal change in ##f## and so it made sense that it was only linear since the change is infinitesimally small. Of course, if the change is finite then one has to consider higher order changes in ##f## to accurately describe the function. But now I'm left confused as to why one only considers the linear part of the change in ##f## in calculus. Particularly so in physics one seems to always talk of differential changes in quantities and doesn't worry about higher order terms. Even in some maths textbooks that I've read they introduce the notion by saying something like "the change in a function as we move a distance ##\Delta x## from a point ##x## is approximately ## \Delta f \approx f'(x)\Delta x##. We observe that as ##\Delta x## approaches zero this expression becomes exact and we have that ##df =f'(x)dx##".

Apologies to keep rambling, I realise that I might be being really thick here, but I can't seem to grasp the idea at the moment.

IMO any book that uses differentials in a serious way...without stating the caveat that they are basically an intuitive approximation, is not a very good textbook.

Non-standard analysis builds a framework which uses infinitesimals in a logically consistent and rigorous manner. There are books about it. One of these books is currently published in Princeton's Landmarks of Mathematics series. I'm not sure if that matters to you, but Princeton University Press doesn't add titles to that series lightly.

mathwonk
Homework Helper
2020 Award
"best" means that it is the only one which comes so close that the difference, i.e. the error term, is little oh, in the sense i defined above.

mathwonk
Homework Helper
2020 Award
an infinitesimal is technically a quantity whose length is less than every positive rational number. they do exist in some contexts. e.g. in non archimedean ordered fields, there are lots of elements which are smaller than every positive rational number in length but still not zero. so what you do is start with the rational numbers and then introduce also an infinite number of new numbers which are all smaller than 1/n for every positive integer n. these are infinitesimals, and when added to other numbers give infinitesimally near numbers.

let me think here a sec. i believe the usual model is polynomials with rational coefficients and the associated quotient field. yes, a polynomial is called positive if its leading coefficient is positive. thus t is positive and so is 1/t. but if 1/n is any positive rational number, the difference 1/n - 1/t still has positive leading coefficient, so 1/t is positive but smaller than every positive rational number, hence is infinitesimal.

so just as we enlarge the integers to introduce rationals, and also enlarge the rationals to introduce irrationals, we can enlarge the rationals to include also infinitesimals. there are models of euclidean geometry based on non archimedean fields just as there are ones based on the reals and other smaller fields.

or you could probably do it the opposite way and think of curves through zero as measured by their slope, so that x^2 would be smaller than every line, and x^3 smaller than x^2. this is rough, but is related to the definition of differentials, where a little oh function is considered as an infinitesimal.
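The ordered field sketched above can be modelled in a few lines (my own sketch, not from the post; polynomials are stored as coefficient lists, highest degree first). A rational function ##p(t)/q(t)## is declared positive when the leading coefficients of ##p## and ##q## have the same sign, and under that order ##1/t## comes out positive yet smaller than ##1/n## for every ##n## tested:

```python
# A minimal model of the ordered field of rational functions in t (my own
# sketch, not from the post).  Polynomials are coefficient lists, highest
# degree first; p(t)/q(t) counts as positive when the leading coefficients
# of p and q have the same sign, i.e. when the function is eventually
# positive as t -> infinity.
from fractions import Fraction

def poly_mul(a, b):
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_sub(a, b):
    n = max(len(a), len(b))
    a = [Fraction(0)] * (n - len(a)) + list(a)
    b = [Fraction(0)] * (n - len(b)) + list(b)
    return [x - y for x, y in zip(a, b)]

def lead(p):
    for c in p:
        if c != 0:
            return c
    return Fraction(0)

def is_positive(num, den):
    # num/den > 0  iff  the leading coefficients agree in sign
    return lead(num) * lead(den) > 0

def less(n1, d1, n2, d2):
    # n1/d1 < n2/d2  iff  (n2*d1 - n1*d2) / (d2*d1) is positive
    return is_positive(poly_sub(poly_mul(n2, d1), poly_mul(n1, d2)),
                       poly_mul(d2, d1))

one = [Fraction(1)]
t = [Fraction(1), Fraction(0)]          # the polynomial t

# 1/t is positive in this order...
pos = is_positive(one, t)

# ...yet smaller than the rational number 1/n for every n tested,
# so it behaves as an infinitesimal
all_smaller = all(less(one, t, [Fraction(1, n)], one)
                  for n in range(1, 1000))
```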

"best" means that it is the only one which comes so close that the difference, i.e. the error term, is little oh, in the sense i defined above.

Ah OK, thanks for the info.

an infinitesimal is technically a quantity whose length is less than every positive rational number. they do exist in some contexts. e.g. in non archimedean ordered fields, there are lots of elements which are smaller than every positive rational number in length but still not zero. so what you do is start with the rational numbers and then introduce also an infinite number of new numbers which are all smaller than 1/n for every positive integer n. these are infinitesimals, and when added to other numbers give infinitesimally near numbers.

let me think here a sec. i believe the usual model is polynomials with rational coefficients and the associated quotient field. yes, a polynomial is called positive if its leading coefficient is positive. thus t is positive and so is 1/t. but if 1/n is any positive rational number, the difference 1/n - 1/t still has positive leading coefficient, so 1/t is positive but smaller than every positive rational number, hence is infinitesimal.

so just as we enlarge the integers to introduce rationals, and also enlarge the rationals to introduce irrationals, we can enlarge the rationals to include also infinitesimals. there are models of euclidean geometry based on non archimedean fields just as there are ones based on the reals and other smaller fields.

or you could probably do it the opposite way and think of curves through zero as measured by their slope, so that x^2 would be smaller than every line, and x^3 smaller than x^2. this is rough, but is related to the definition of differentials, where a little oh function is considered as an infinitesimal.

Why is the differential so significant in calculus and particularly in physics then? If it is simply defined as the first-order, linear change in a function, what is its significance? Is it simply because it is the leading order change in the function and so higher order terms are not of interest since they quickly tend to zero the closer we take ##dx## to zero?

Is the definition I gave in my first post - in terms of defining ##df## as a linear functional of ##\Delta x## a good way to think of the differential of a function at all?

What has really thrown a spanner in the works for me is that in all the teaching I've had in the past, ##df## has been treated as the exact (infinitesimal) change in ##f## due to an (infinitesimal) change in ##x##, ##dx##, but now that I have been exposed to more advanced mathematics (and more technically correct ways of thinking) it has left me confused, since now ##df## is only an approximation to a change in ##f##. I have to admit, I'm not entirely sure of the exact reasoning why my brain is protesting so much, but it is doing so nonetheless and it's really bothering me.
For example, how can expressions such as $$\Delta f =\int df =\int f'(x)dx$$ be correct? Intuitively, if I am summing (an infinite sum of) the linear parts of the change in the function, why should it equal the actual change in the function? Why doesn't one have to consider the full change in ##x##, i.e. including all higher order changes in the function?
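One way to see why the higher-order errors do no harm in ##\int df##: each subinterval contributes an error that is little oh of ##\Delta x##, and there are only ##(b-a)/\Delta x## subintervals, so the total error still vanishes as ##\Delta x \rightarrow 0##. A numerical sketch (my own example, ##f(x)=x^2## on ##[0,1]##):

```python
# Numerical sketch (my own example, not from the thread): summing the
# linear parts f'(x_i) * dx over a partition of [a, b] recovers the total
# change f(b) - f(a) in the limit.  Each subinterval's error is O(dx**2),
# and there are (b - a)/dx subintervals, so the total error is only O(dx).
f = lambda x: x**2
fprime = lambda x: 2 * x
a, b = 0.0, 1.0

def sum_of_linear_parts(n):
    dx = (b - a) / n
    return sum(fprime(a + i * dx) * dx for i in range(n))

total_change = f(b) - f(a)                      # exact: 1.0
err_coarse = abs(sum_of_linear_parts(10) - total_change)
err_fine = abs(sum_of_linear_parts(1000) - total_change)
# the error shrinks in proportion to dx (here it equals dx exactly)
```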

Forgive me if I'm wrong, but I've only taken the equivalent of a first-year grad course in analysis. But I was under the impression that the whole point of the development of Lebesgue measure and integration over the last two centuries was to do away with the imprecision of the Newton/Leibniz approach to differential calculus and the Riemann integral, and with the Riemann integral's inability to apply to a wide class of functions? What about finding the area of a square with irrational side lengths? The Riemann integral can't do that?

Also, since the set of rational numbers is dense in ##\mathbb{R}##, every real number is arbitrarily close to a rational number, so I'm not understanding your explanation of how infinitesimals exist by looking at 1/n for every integer n. The only numbers I can think of that are less than 1/n for every positive integer n are numbers which belong to (-inf, 0]. And how can a "quantity" have a length? I thought only measure spaces could have length...

But anyways, I think we're a little off track. The OP just wants to know about differentials as they are talked about in elementary calculus and physics. I see nothing wrong with the definition of the derivative:
$$\frac{df}{dx} = \lim_{t\to x}\frac{f(t)-f(x)}{t-x}$$ at each point ##x## in the domain of ##f##?

What I was saying earlier is that if you use this canonical definition in physics, you get down to scales that don't make physical sense in classical mechanics, hence physicists are usually OK with just approximating it with an "infinitesimal length".

Forgive me if I'm wrong, but I've only taken the equivalent of a first-year grad course in analysis. But I was under the impression that the whole point of the development of Lebesgue measure and integration over the last two centuries was to do away with the imprecision of the Newton/Leibniz approach to differential calculus

Right, at the time of Cauchy, Riemann, Lebesgue, the infinitesimal approach was highly nonrigorous. Right now, it has been made rigorous. The construction of infinitesimals is well-known and rigorous.

and the Riemann Integral, and the Riemann Integral's inability to apply to a wide class of functions?

That is not the point of the Lebesgue integral. We did not invent the Lebesgue integral just because it can integrate more functions. We adopt the Lebesgue integral because it has better properties, like interchanging limit and integral, and like having ##L^1## complete.

What about finding the area of a square with irrational side lengths? The Riemann integral can't do that?

I'm not sure what you mean. The Riemann integral can handle that perfectly.

Also, since the set of rational numbers is dense in R, every real number is close to the set of rational numbers...so I'm not understanding your explanation of how infinitesimals exist by looking at 1/n for every integer n?

You are assuming that infinitesimals would be real numbers, they're not.

The only numbers I can think of that for every integer n are less than 1/n are numbers which belong to (-inf,0] ? How can a "quantity" have a length? I thought only measure spaces could have length...

Length spaces https://people.math.ethz.ch/~lang/LengthSpaces.pdf

Stephen Tashi
Right now, it has been made rigorous. The construction of infinitesimals is well-known and rigorous.

However, if a text wishes to use the infinitesimals in that rigorously defined way, the text usually will make that clear. Most texts dealing with applications of mathematics don't intend to deal with the rigorous approach to infinitesimals. So, for many many texts, it is correct to say that their use of infinitesimals is only an intuitive form of reasoning.

mathwonk
Homework Helper
2020 Award
"Why is the differential so significant in calculus and particularly in physics then? If it is simply defined as the first-order, linear change in a function, what is its significance? Is it simply because it is the leading order change in the function and so higher order terms are not of interest since they quickly tend to zero the closer we take dx to zero?"

analyzing any phenomenon in its entirety is incredibly difficult, essentially impossible. Hence to make any progress we try to find reasonable approximations that have two virtues: 1) they are actually computable, 2) the result of the computation gives us useful information about the original situation.

The derivative, or differential, or best linear approximation, is such a compromise. It is often easy to compute the linear approximation to a function, and that approximation tells us something useful, except sometimes when it is zero. E.g. if f is differentiable at p and f'(p) > 0, then on some, possibly very tiny, neighborhood of p, f is smaller to the left of p and larger to the right of p. Another e.g. is that if f is continuously differentiable at p and f'(p) ≠ 0, then f is actually smoothly invertible on some, again possibly quite small, interval around p.

With more effort and more information about the bounds on the derivative, we may be able to say something about the sizes of those neighborhoods where these approximations are useful. One global result is that if a smooth function has a derivative which is never zero on a given interval, even a large one, then that function is monotone on that entire interval, hence invertible.
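A quick numerical illustration of that global statement (my own example, not from the post): ##f(x) = x + \sin(x)/2## has ##f'(x) = 1 + \cos(x)/2 \geq 1/2 > 0## everywhere, so it should be strictly increasing on any interval we sample.

```python
# Quick numerical illustration (my own example, not from the post):
# f(x) = x + sin(x)/2 has f'(x) = 1 + cos(x)/2 >= 1/2 > 0 everywhere,
# so f should be strictly increasing on any interval we sample.
import math

f = lambda x: x + math.sin(x) / 2
xs = [i * 0.01 for i in range(-500, 501)]       # grid on [-5, 5]
values = [f(x) for x in xs]
strictly_increasing = all(u < v for u, v in zip(values, values[1:]))
```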

Most useful mathematical inventions are simplifications of the actual phenomena at hand, obtained by intelligently throwing away some of the data, and yet being able to make useful conclusions from what is left. So yes, higher order information is important, but it is difficult to analyze, and so we try to identify phenomena that do not change when we only make higher order changes, and the differential lets us analyze those first order phenomena.

Then with more work and finer tools we may try later to analyze also higher order phenomena. E.g. curvature can be analyzed using second derivatives.

And if all the first n derivatives vanish at p, we can also tell something about the behavior of f near p from knowing that the (n+1)st derivative does not vanish there.

homology and homotopy groups in topology cannot tell the difference between a point and a line, or three space, since all these have zero homology and homotopy. but if we are clever we can remove a single point, and then these groups do distinguish these spaces, so the spaces were also different before the points were removed. i.e. those groups do distinguish spheres of different dimensions, and by removing a single point we change euclidean space into something rather like a sphere (i.e. a cylinder over a sphere).

mathwonk
Homework Helper
2020 Award
@ hercuflea:
"so just as we enlarge the integers to introduce rationals, and also enlarge the rationals to introduce irrationals, we can enlarge the rationals [or the reals] to include also infinitesimals. there are models of euclidean geometry based on non archimedean fields just as there are ones based on the reals and other smaller fields."

i was trying to show that the reason we don't think infinitesimals exist is because (as micromass said) we are used to thinking only within the reals, where they do not. but they are easily added in, just as we have added in needed numbers of other types in many other settings.

think of graphs passing through (0,0) and say one is smaller than the other if on some nbhd of (0,0) it is smaller. then you have one line for each real slope, but you also have all the monomial graphs y = x^n, for n > 1, which are all smaller, on some nbhd of (0,0), than all lines of all positive slopes.

the tangent line at (0,0) to a given graph is the unique line such that it differs from the given graph by one of these infinitesimally small curves. so the theory of infinitesimals is in some sense, as you are probably also saying, just the attempt to make precise the meaning of a graph that is tangent to the x axis at (0,0), i.e. that is "little oh".
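This uniqueness can be checked numerically (my own sketch, not from the post): for ##f(x)=x^2## at ##(0,0)##, the difference between ##f## and a line ##y=mx## is little oh precisely when ##m=0##, the tangent slope.

```python
# Sketch (my own example): for f(x) = x**2 near (0,0), a candidate line
# y = m*x differs from f by a "little oh" function only for the tangent
# slope m = 0, since |f(x) - m*x| / |x| tends to |m| as x -> 0.
f = lambda x: x**2

def ratio(m, x):
    return abs(f(x) - m * x) / abs(x)

xs = [0.1, 0.01, 0.001]
tangent_ratios = [ratio(0.0, x) for x in xs]    # tends to 0
other_ratios = [ratio(0.5, x) for x in xs]      # tends to 0.5, not 0
```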


Is the point that ##df=f'(x)dx## is, by definition, the change in the linear function describing the tangent line to the curve, described by the function ##f##, at the point ##x##? Thus ##df## is itself another function that, at points near ##x##, is approximately equal to the actual change in ##f##?! For example, in physics one has Hooke's law, which describes the restoring force of a spring. This is a linear approximation to the full force, but in many applications it is accurate enough to, in practice, fully describe the force, so one simply defines a force function ##F=-kx##, which is the linear approximation to the actual restoring force. Would this be correct at all?

I'm still not quite sure why one can get away with ##\int f'(x) dx =\int df##? Is it simply an application of the chain rule and the fundamental theorem of calculus, or is there something else going on?

In the context of differential geometry, is it correct to say that ##df## is a linear functional that maps vectors to real numbers, and that ##df(\mathbf{v})## quantifies the change in ##f## as one moves an infinitesimal amount along the direction of a given vector ##\mathbf{v}##? Is such a construction useful because it gives a rigorous construction of differentials in terms of differential forms and also enables one to consider integrals of functions in a coordinate independent manner?

Kindly could someone explain the limit part of the equation to me... I know that it means the limit of delta x is zero as delta x tends to zero, or in other words the change in x is made arbitrarily small (approaching zero), and as you too said there is no such thing as an infinitesimal. But I have seen in many places things like "the limit of f(x) as delta x approaches 6 is equal to zero"... now what does that mean??? Please explain it to me thoroughly.

Mark44
Mentor
Kindly if anyone could explain me the limit part of the equation...
Which equation? There have been several equations shown in this thread. You need to be more specific on what you're asking about.

Mark_Boy said:
I know that thing means limit of delta x is zero as delta x tends to zero..
Mark_Boy said:
or in other words the change in x is infinitesimally small (approaching zero) .. and as you too told there is nothing such as infinitesimal..but I have seen at many places things like limit of f(x) as delta x approaches 6 is equal to zero... now what does that mean???
As I interpret what you wrote, it doesn't mean anything.
##\lim_{\Delta x \to 6}f(x) = 0##
This is meaningless because ##\Delta x## is changing, but f(x) doesn't have anything to do with ##\Delta x##.

Agreeing with most of Frank Castle's post above, and saying why:

Although it has been shown that it is possible to put infinitesimals on a sound rigorous basis in mathematics, they are not part of our everyday experience — at least not mine — and so they are probably not the best way to think of "the differential of a function".

Consider the symbol ##df##, where

##f: (a, b) \rightarrow \mathbb{R}##

is a differentiable function. Then when we say

$$df = f'(x)\,dx$$

we are saying a separate (but similar) thing about each ##x## in ##(a, b)##. So let ##c \in (a, b)## be one such ##x##. For ##x = c## we are saying that

For any x near x = c, the change in f(x) is approximately f'(c) multiplied by the change in x from x = c,

(as the best linear approximation) — and such that the approximation approaches exactness as x approaches c.

Because ##dx## and ##df## mean separate things at different ##x##-values, we will use a subscript to remind us of this:

In fact, think of "##dx##" at the point ##x = c## as meaning the change in ##x## (from ##c##):

$$dx_c(x) = x - c$$

and likewise, think of "##df##" at the point ##x = c## as meaning the best linear approximation near ##c## to the change in ##f## (from ##f(c)##):

$$df_c(x) = f'(c)(x - c) = f'(c)\,dx_c(x).$$

In fact, precisely because ##dx## and ##df## mean separate things at each value of ##x##, in higher math they are often denoted as functions of ##x = c## as well as functions of ##x##:

$$dx_c(x) = dx(c)(x)$$

and

$$df_c(x) = df(c)(x).$$

(And no, this was not done for the express purpose of confusing you! It was done to formally separate the various ##dx_c##'s and ##df_c##'s for different values of ##c##, so that you will always bear in mind that they are not the same thing.)

Let ##U## be an open set in ##\mathbb{R}^n##. Then the analogous thing applies in higher dimensions to functions

##g: U \rightarrow \mathbb{R}##

where now ##dg## is a function of the point ##x = c \in U \subset \mathbb{R}^n##, as well as of the tangent vector ##\mathbf{v}## to ##\mathbb{R}^n## at the point ##x = c##:

$$dg_c(\mathbf{v}) = (\nabla g)_c \cdot \mathbf{v},$$

which is the best linear approximation to ##g## near the point ##x = c##, as then applied to the vector ##\mathbf{v}## (which may also be thought of as ##\mathbf{v} = x - c##).
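A numerical sketch of this last formula (my own example; ##g(x,y)=x^2y## and the point ##c=(1,2)## are arbitrary choices): the remainder ##g(c+\mathbf{v}) - g(c) - dg_c(\mathbf{v})## shrinks faster than ##|\mathbf{v}|##, which is what "best linear approximation" means here.

```python
# Numerical sketch (my own example; g and the point c are arbitrary):
# dg_c(v) = (grad g)_c . v is the best linear approximation in the sense
# that the remainder g(c + v) - g(c) - dg_c(v) shrinks faster than |v|.
import math

def g(x, y):
    return x**2 * y

c = (1.0, 2.0)
grad_c = (2 * c[0] * c[1], c[0]**2)     # gradient of g at c: (4, 1)

def remainder_over_norm(t):
    v = (0.3 * t, -0.4 * t)             # fixed direction, length scaled by t
    dg_c_v = grad_c[0] * v[0] + grad_c[1] * v[1]
    rem = g(c[0] + v[0], c[1] + v[1]) - g(*c) - dg_c_v
    return abs(rem) / math.hypot(*v)

ratios = [remainder_over_norm(t) for t in (0.1, 0.01, 0.001)]  # tends to 0
```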