# Differentials of order 2 or bigger that are equal to 0

## Main Question or Discussion Point

So I've seen in several lectures and explanations the idea that when you have an equation relating certain expressions ##x## and ##y##, if the expression ##x## approaches 0 (and ##y## is scaled down accordingly), then any power of that expression of order 2 or higher (##x^n## where ##n>1##) is treated as equal to 0, leaving only the relation between the first-order term ##x## and ##y##.

For example, in a Poisson process the chance of an arrival in a time interval ##δ##, where ##δ→0##, is ##λδ## (where ##λ## is the arrival frequency). The chance of no arrivals during said interval is ##1-λδ## and the chance of 2 or more arrivals is 0, because the chance of getting ##n## arrivals in the interval ##δ## is on the order of ##(λδ)^n=λ^nδ^n##, and ##δ^n=0## for ##n>1## when ##δ→0##.

Now, in the basic intuitive sense I can understand why this is the case: if a variable ##x## approaches 0, then ##x^2## (or ##x^n## where ##n>1##) becomes negligibly small, and it becomes more and more negligible as ##x## becomes smaller and smaller. The thing is, we are already dealing with infinitesimals in cases like the Poisson process, so why do we decide that ##x## is not negligible and ##x^2## is, when both are arbitrarily small?

I guess I'm asking for a mathematical basis for this claim. I'm sure there is one, since it is used so confidently in many fields of math and physics.

Thanks in advance to all the helpers.

PeroK
Homework Helper
Gold Member
If you take a Taylor series:

##f(x) = \sum_{n= 0}^{\infty}\frac{f^{(n)}(x_0)(x-x_0)^n}{n!} = f(x_0) + (x-x_0)f'(x_0) + \frac{(x-x_0)^2 f''(x_0)}{2} + \dots##

Then, if we assume the function ##f## is well-behaved - in the sense that all its derivatives are bounded - we have:

##f(x) \approx f(x_0) + (x-x_0)f'(x_0)## (when ##|x-x_0| \ll 1##)

You can see that there will be exceptions to this for functions where the ##n^{th}## derivatives are unbounded, but for the sort of functions normally considered in physics this is not an issue.
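
As a rough numerical check of the linear approximation, here is a minimal Python sketch. It assumes ##f(x) = e^x## and ##x_0 = 0.5## purely for illustration, and the helper name `linear_approx_error` is a hypothetical choice. If the second derivative is bounded, the error of the first-order estimate should shrink like ##(x-x_0)^2##, so the ratio error / ##h^2## should settle near ##f''(x_0)/2##.

```python
import math


def linear_approx_error(f, df, x0, h):
    """Return f(x0 + h) minus the first-order Taylor estimate f(x0) + h * df(x0)."""
    return f(x0 + h) - (f(x0) + h * df(x0))


# Illustrative assumption: f = exp, which is its own derivative.
x0 = 0.5
for h in (1e-1, 1e-2, 1e-3):
    err = linear_approx_error(math.exp, math.exp, x0, h)
    # If the error is quadratic in h, err / h**2 approaches f''(x0) / 2.
    print(f"h={h:g}  error={err:.3e}  error/h^2={err / h**2:.5f}")

print("f''(x0)/2 =", math.exp(x0) / 2)
```

Shrinking ##h## by a factor of 10 should shrink the error by roughly a factor of 100, which is the sense in which the second-order term is negligible next to the first-order one.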

• StoneTemplePython
Gold Member
2019 Award
In general, having multiple representations of the same phenomenon can be quite helpful, so with that in mind, I wrote the below.
- - - - -

Note: if you have a function that is twice differentiable, you can write it as a first-order Taylor polynomial plus a quadratic remainder (I'd suggest using the Lagrange form). Being quadratic, the remainder is ##O(\delta^2)##, where ##\delta## is the distance from the expansion point.

The linear approximation gives you the probability of one arrival. (In my view, the probability of zero arrivals is just an afterthought: it is the complement of the total probability of at least one arrival.) Your question really is: why is a linear approximation the best over a small enough neighborhood for a function that is (at least) twice differentiable?
- - - -
In Poisson process language: why is the linear approximation of the probability of positive arrivals (i.e. approximating it by looking at probability of only 1 arrival) arbitrarily close to the actual total probability of positive arrivals, in some small enough time neighborhood?
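
To make that question concrete, here is a minimal Python sketch using the standard Poisson probabilities for a count with mean ##\lambda\delta##. The rate ##\lambda = 2## and the function names are illustrative assumptions. It compares the probability of at least one arrival with ##\lambda\delta##, and reports what share of that probability comes from two or more arrivals.

```python
import math


def p_exactly_one(lam, delta):
    """P(N = 1) for a Poisson count with mean lam * delta."""
    return lam * delta * math.exp(-lam * delta)


def p_at_least_one(lam, delta):
    """P(N >= 1) = 1 - exp(-lam * delta), computed stably with expm1."""
    return -math.expm1(-lam * delta)


def p_two_or_more(lam, delta):
    """P(N >= 2) = P(N >= 1) - P(N = 1)."""
    return p_at_least_one(lam, delta) - p_exactly_one(lam, delta)


lam = 2.0  # illustrative arrival rate
for delta in (1e-1, 1e-2, 1e-3, 1e-4):
    positive = p_at_least_one(lam, delta)
    share_multi = p_two_or_more(lam, delta) / positive
    print(f"delta={delta:g}  P(N>=1)/(lam*delta)={positive / (lam * delta):.6f}  "
          f"share from 2+ arrivals={share_multi:.6f}")
```

As ##\delta## shrinks, the first ratio should approach 1 and the share from multiple arrivals should approach 0, roughly in proportion to ##\delta##.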

- - - -
Frequently fully worked examples help people a lot. So a more granular view is:

Specific to the exponential function (whose re-scaled power series gives you the Poisson PMF), you may recall that one way to prove that the series for the exponential function is absolutely convergent involves invoking a geometric series after a finite number of terms (to upper bound the remaining infinite series).

In the nice case of ##0 \lt \delta \lt 1##, you have

## \delta \leq \exp\big(\delta\big) -1 = \delta + \big(\frac{\delta^2}{2!} +\frac{\delta^3}{3!} + \frac{\delta^4}{4!}+ ... \big) \leq \delta +\delta^2 + \delta^3 + \delta^4 + ... = g(\delta) = \frac{\delta}{1 - \delta} ##

Now consider that for small enough ##\delta##, we have ##\frac{\delta}{1 - \delta} \approx \delta##. Play around with some numbers and confirm this for yourself. E.g. what about ##\delta = \frac{1}{100,000}##? This is a small number, but hardly "infinitesimal".

If you want to have some fun with it, consider what portion of the geometric series is represented by ##\delta##. I.e. look at

## \frac{\delta}{\big(\delta + \delta^2 + \delta^3 + \delta^4 + ... \big)} = \frac{\delta }{\big(\frac{\delta}{1-\delta}\big)} = 1 - \delta##

This is why, when you look at ##\delta = \frac{1}{100,000}##, 99.999% of the value of ##g(\delta)## is in the very first term of the series.
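
For anyone who wants to play with the numbers, here is a minimal Python sketch of the bounds above. The function name `sandwich` and the sample values of ##\delta## are illustrative assumptions. It evaluates ##\delta##, ##\exp(\delta) - 1## and ##\frac{\delta}{1-\delta}## side by side, along with the share of the geometric series carried by its first term.

```python
import math


def sandwich(delta):
    """Return (delta, exp(delta) - 1, delta / (1 - delta)) for 0 < delta < 1."""
    return delta, math.expm1(delta), delta / (1.0 - delta)


for delta in (0.1, 0.01, 1e-5):
    lower, middle, upper = sandwich(delta)
    # Fraction of the geometric series delta + delta^2 + ... held by the first term.
    share = lower / upper
    print(f"delta={delta:g}  exp(delta)-1={middle:.12g}  "
          f"delta/(1-delta)={upper:.12g}  first-term share={share:.7f}")
```

The three quantities should agree to more and more digits as ##\delta## shrinks, with the first-term share equal to ##1 - \delta##.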

Taking advantage of non-negativity, we can see that since the upper bound is well approximated by ##\delta## when ##\delta## is small enough, and since ##\exp\big(\delta\big) -1## is sandwiched between ##\delta## and that upper bound, it must be well approximated by ##\delta## as well. Put differently, we see that the value of ##\big(\frac{\delta^2}{2!} +\frac{\delta^3}{3!} + \frac{\delta^4}{4!}+ ...\big)## is negligible for small enough ##\delta##.

Put one more way: you tell me what your cutoff is (the level of precision you want), and I can come up with a small enough real-valued ##\delta## such that you can ignore all those higher-order, ##O(\delta^2)##, terms. If you keep asking for ever finer levels of precision, this back and forth eventually results in a limit, but the idea of getting an extremely good approximation is the main idea here.
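
That back and forth can be sketched in a few lines of Python. This is only one way to pick ##\delta##, derived from the geometric bound: the higher-order terms are at most ##\frac{\delta}{1-\delta} - \delta = \frac{\delta^2}{1-\delta}##, so relative to ##\delta## they are at most ##\frac{\delta}{1-\delta}##, and choosing ##\delta = \frac{\varepsilon}{1+\varepsilon}## makes that ratio exactly ##\varepsilon##. The names `delta_for` and `relative_tail` are hypothetical.

```python
import math


def delta_for(tolerance):
    """Pick delta so that the geometric bound delta / (1 - delta) equals the tolerance."""
    return tolerance / (1.0 + tolerance)


def relative_tail(delta):
    """Size of the higher-order terms of exp(delta) - 1, relative to the linear term."""
    return (math.expm1(delta) - delta) / delta


for tolerance in (1e-2, 1e-4, 1e-6):
    delta = delta_for(tolerance)
    print(f"tolerance={tolerance:g}  delta={delta:.6g}  "
          f"relative tail={relative_tail(delta):.3e}")
```

For every tolerance requested, the relative size of the ignored terms should come out below it, which is the limit argument in miniature.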

> The thing is we are already dealing with infinitesimals in cases like the Poisson process, so why do we decide that ##x## is not negligible and ##x^2## is when both are arbitrarily small?
>
> I guess I'm asking for a mathematical basis for this claim

This isn't really true. A Poisson process can be thought of as a limiting case of a shrinking Bernoulli process. It can also be thought of as a counting process with exponentially distributed inter-arrival times. Infinitesimals aren't needed. While the limit of a Bernoulli process is a good interpretation, don't overthink it... the counting process interpretation can be quite enlightening.
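
The shrinking Bernoulli picture can be checked numerically without any infinitesimals. In this minimal Python sketch, a unit time interval is split into ##n## slots, each with arrival probability ##\mu/n##, and the resulting binomial PMF is compared with the Poisson PMF with mean ##\mu##. The value ##\mu = 2##, the range of counts compared, and the function names are illustrative assumptions.

```python
import math


def binomial_pmf(k, n, p):
    """P(k successes in n independent Bernoulli(p) trials)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def poisson_pmf(k, mu):
    """P(N = k) for a Poisson count with mean mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)


def max_gap(n, mu, k_max=5):
    """Largest PMF difference over k = 0..k_max when each slot has p = mu / n."""
    return max(abs(binomial_pmf(k, n, mu / n) - poisson_pmf(k, mu))
               for k in range(k_max + 1))


mu = 2.0  # illustrative expected number of arrivals
for n in (10, 100, 10_000):
    print(f"n={n}  max PMF gap={max_gap(n, mu):.2e}")
```

The gap should shrink as the slots get finer, with every ##n## being an ordinary finite number.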

• FactChecker
Gold Member
Linear approximations are so much easier to deal with than higher-order approximations that it is worth considering using them as an estimate. A linear approximation will give you the value of a function and tell you how the function changes locally in each direction. That is often enough. And the theory of linear functions (simple, simultaneous, or multivariable) is reasonably deep and informative. Going one step higher to quadratic approximations opens a real can of worms.

> In general, having multiple representations of the same phenomenon can be quite helpful, so with that in mind, I wrote the below.

Put in simple and general terms: given an expression containing both ##\delta## and higher orders of ##\delta##, as ##\delta## becomes arbitrarily small, the ##\delta## portion takes up more and more of the "weight" of the overall expression, and the contribution of the higher-order terms to the value of the expression becomes negligible and thus can be ignored.

Also, I guess I should be careful about throwing the term "infinitesimal" around, and start thinking about calculus as a whole more in terms of arbitrarily small quantities instead of infinitesimal ones.

Thanks for the help!

• StoneTemplePython
Stephen Tashi