Differentials of order 2 or bigger that are equal to 0

Adgorn · Dec 6, 2017

So I've seen in several lectures and explanations the idea that when you have an equation containing a relation between certain expressions ##x## and ##y##, if the expression ##x## approaches 0 (and ##y## is scaled down accordingly) then any power of that expression of order 2 or higher (##x^n## where ##n>1##) is equal to 0, leaving only the relation between the 1st order term ##x## and ##y##.

For example, in a Poisson process the chance of an arrival in a time interval ##δ##, where ##δ→0##, is ##λδ## (where ##λ## is the arrival frequency). The chance of no arrivals during said interval is ##1-λδ## and the chance of 2 arrivals or more is 0, because the chance of getting ##n## arrivals in the interval ##δ## is ##(λδ)^n=λ^nδ^n## and ##δ^n=0## for ##n>1## when ##δ→0##.

Now, in the basic intuitive sense, I can understand why this is the case: if a variable ##x## approaches 0 then the variable ##x^2## (or ##x^n## where ##n>1##) becomes negligibly small, and it becomes more and more negligible as ##x## becomes smaller and smaller. The thing is we are already dealing with infinitesimals in cases like the Poisson process, so why do we decide that ##x## is not negligible and ##x^2## is when both are arbitrarily small?

I guess I'm asking for a mathematical basis for this claim. I'm sure there is one, since it is used so confidently in many fields of math and physics.

Thanks in advance to all the helpers.

PeroK · Dec 6, 2017

If you take a Taylor series:

##f(x) = \sum_{n= 0}^{\infty}\frac{f^{(n)}(x_0)(x-x_0)^n}{n!} = f(x_0) + (x-x_0)f'(x_0) + \frac{(x-x_0)^2 f''(x_0)}{2} + \dots##

Then, if we assume the function ##f## is well-behaved - in the sense that all its derivatives are bounded - we have:

##f(x) \approx f(x_0) + (x-x_0)f'(x_0)## (when ##|x-x_0| \ll 1##)

You can see that there will be exceptions to this for functions where the ##n^{th}## derivatives are unbounded, but for the sort of functions normally considered in physics this is not an issue.
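As a rough numerical illustration of the approximation above, here is a minimal sketch. The choice ##f = \exp## and ##x_0 = 0## is only an assumption made for the example, and `linear_approx_error` is a hypothetical helper name. It prints the error of the linear approximation and that error divided by ##h^2##, which should settle near ##f''(x_0)/2 = 0.5## if the error really is second order.

```python
import math

def linear_approx_error(f, df, x0, h):
    """Error of the first-order Taylor approximation of f at x0 + h."""
    return f(x0 + h) - (f(x0) + h * df(x0))

# Illustrative choice (an assumption, not from the thread): f = exp, x0 = 0.
# exp is its own derivative, so it is passed for both f and df.
x0 = 0.0
rows = []
for h in (1e-1, 1e-2, 1e-3):
    err = linear_approx_error(math.exp, math.exp, x0, h)
    rows.append((h, err, err / h**2))
    print(f"h={h:g}  error={err:.3e}  error/h^2={err / h**2:.6f}")
```

Each time ##h## shrinks by a factor of 10, the error shrinks by roughly a factor of 100, which is the sense in which the second-order term is negligible next to the first-order one.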

StoneTemplePython · Dec 6, 2017

In general, having multiple representations of the same phenomenon can be quite helpful, so with that in mind, I wrote the below.
- - - - -

Note: if you have a function that is twice differentiable, you can write it as a Taylor polynomial with a quadratic remainder (I'd suggest using the Lagrange form). Being quadratic, the remainder is ##O(\delta^2)##, where ##\delta## is the distance from the expansion point.

The linear approximation gives you the probability of one arrival. (In my view, the probability of zero arrivals is just an afterthought: it is the complement of the total probability of at least one arrival.) Your question really is: why is a linear approximation the best over a small enough neighborhood for a function that is (at least) twice differentiable?
- - - -
In Poisson process language: why is the linear approximation of the probability of positive arrivals (i.e. approximating it by looking at probability of only 1 arrival) arbitrarily close to the actual total probability of positive arrivals, in some small enough time neighborhood?

- - - -
Frequently fully worked examples help people a lot. So a more granular view is:

Specific to the exponential function (whose re-scaled power series gives you the Poisson PMF), you may recall that one way to prove that the series for the exponential function is absolutely convergent involves invoking a geometric series after a finite number of terms (to upper bound the remaining infinite series).

In the nice case of ##0 \lt \delta \lt 1##, you have

## \delta \leq \exp\big(\delta\big) -1 = \delta + \big(\frac{\delta^2}{2!} +\frac{\delta^3}{3!} + \frac{\delta^4}{4!}+ ... \big) \leq \delta +\delta^2 + \delta^3 + \delta^4 + ... = g(\delta) = \frac{\delta}{1 - \delta} ##

Now consider that for small enough ##\delta##, we have ##\frac{\delta}{1 - \delta} \approx \delta##. Play around with some numbers and confirm this for yourself. E.g. what about ##\delta = \frac{1}{100,000}##? This is a small number, but hardly "infinitesimal".
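One way to play around with the numbers is the short sketch below. The helper name `geometric_bound` and the sample values of ##\delta## are just assumptions for illustration. It prints ##\exp(\delta) - 1## next to the geometric upper bound ##\frac{\delta}{1-\delta}## so you can compare both against ##\delta## itself.

```python
import math

def geometric_bound(delta):
    """Closed form of delta + delta^2 + delta^3 + ... for 0 < delta < 1."""
    return delta / (1.0 - delta)

for delta in (0.1, 0.01, 1e-5):
    exact = math.expm1(delta)   # exp(delta) - 1, accurate for small delta
    upper = geometric_bound(delta)
    print(f"delta={delta:g}  exp(delta)-1={exact:.12g}  delta/(1-delta)={upper:.12g}")
```

`math.expm1` is used instead of `math.exp(delta) - 1` only to avoid losing digits when ##\delta## is tiny.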

If you want to have some fun with it, consider what portion of the geometric series is represented by ##\delta##. I.e. look at

## \frac{\delta}{\big(\delta + \delta^2 + \delta^3 + \delta^4 + ... \big)} = \frac{\delta }{\big(\frac{\delta}{1-\delta}\big)} = 1 - \delta##

This is why, when you look at ##\delta = \frac{1}{100,000}##, 99.999% of the value of ##g(\delta)## is in the very first term of the series.

Taking advantage of non-negativity, we can see that since the upper bound is well approximated by ##\delta## when ##\delta## is small enough, and since ##\exp(\delta) - 1## is squeezed between ##\delta## and that upper bound, it must be well approximated by ##\delta## as well. Put differently, we see that the value of ##\big(\frac{\delta^2}{2!} +\frac{\delta^3}{3!} + \frac{\delta^4}{4!}+ ...\big)## is irrelevant for small enough ##\delta##.

Put one more way: you tell me what your cutoff is / what level of precision you want, and I can come up with a small enough real-valued ##\delta## such that you can ignore all those higher, ##O(\delta^2)##, terms. If you keep asking for ever finer levels of precision, this back and forth eventually results in a limit, but the idea of getting an extremely good approximation is the main idea here.
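The "you name the precision, I name the ##\delta##" game can be sketched in a few lines. This is only one possible recipe, under the assumption that "precision" means the share of ##\exp(\delta) - 1## contributed by the higher-order terms; `delta_for_tolerance` and `tail_share` are hypothetical names. The geometric bound gives a tail of at most ##\frac{\delta^2}{1-\delta}##, and ##\exp(\delta) - 1 \geq \delta##, so the share is at most ##\frac{\delta}{1-\delta}##; setting that equal to the tolerance ##\epsilon## gives ##\delta = \frac{\epsilon}{1+\epsilon}##.

```python
import math

def delta_for_tolerance(eps):
    """Return a delta for which the higher-order terms make up at most
    a fraction eps of exp(delta) - 1.

    Derived from the geometric bound: tail <= delta^2 / (1 - delta) and
    exp(delta) - 1 >= delta, so the tail's share is <= delta / (1 - delta).
    Solving delta / (1 - delta) = eps gives the formula below.
    """
    return eps / (1.0 + eps)

def tail_share(delta):
    """Fraction of exp(delta) - 1 contributed by the terms beyond delta."""
    total = math.expm1(delta)
    return (total - delta) / total

for eps in (1e-1, 1e-3, 1e-6):
    delta = delta_for_tolerance(eps)
    print(f"eps={eps:g}  delta={delta:.6g}  tail share={tail_share(delta):.3e}")
```

The bound is deliberately loose (the actual share is closer to ##\delta/2##), but it shows that a perfectly ordinary real number always does the job.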

Adgorn said:

The thing is we are already dealing with infinitesimals in cases like the Poisson process, so why do we decide that ##x## is not negligible and ##x^2## is when both are arbitrarily small?

I guess I'm asking for a mathematical basis for this claim

The premise that we are dealing with infinitesimals isn't really true. A Poisson process can be thought of as a limiting case of a shrinking Bernoulli process. It can also be thought of as a counting process with exponentially distributed inter-arrival times. Infinitesimals aren't needed. While the limit of a Bernoulli process is a good interpretation, don't overthink it... the counting process interpretation can be quite enlightening.
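To see this with finite numbers rather than infinitesimals, here is a minimal sketch using the standard Poisson PMF, under which the number of arrivals in an interval of length ##\delta## has mean ##\lambda\delta##. The rate `lam = 2.0` and the function name `poisson_interval_probs` are assumptions for illustration only. It prints the probability of exactly one arrival and of two or more arrivals, each divided by ##\delta##.

```python
import math

def poisson_interval_probs(lam, delta):
    """Exact probabilities of 0, 1 and >= 2 arrivals in an interval of length delta."""
    mu = lam * delta
    p0 = math.exp(-mu)
    p1 = mu * p0
    # 1 - p0 - p1, with 1 - p0 computed via expm1 to keep digits for small mu
    p2_plus = -math.expm1(-mu) - p1
    return p0, p1, p2_plus

lam = 2.0  # illustrative rate, an assumption for this sketch
for delta in (1e-1, 1e-2, 1e-3, 1e-4):
    p0, p1, p2 = poisson_interval_probs(lam, delta)
    print(f"delta={delta:g}  P(1)/delta={p1 / delta:.6f}  P(>=2)/delta={p2 / delta:.6f}")
```

Both probabilities go to 0 as ##\delta## shrinks, but relative to ##\delta## the first tends to ##\lambda## while the second tends to 0. That is the sense in which the two-or-more term is "negligible" and the one-arrival term is not.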

FactChecker · Dec 6, 2017

Linear approximations are so much easier to deal with than higher-order approximations that it is worth considering using them as an estimate. A linear approximation will give you the value of a function and tell you how the function changes locally in each direction. That is often enough. And the theory of linear functions (simple, simultaneous, or multivariable) is reasonably deep and informative. Going one step higher to quadratic approximations opens a real can of worms.

Adgorn · Dec 17, 2017

StoneTemplePython said:

In general, having multiple representations of the same phenomenon can be quite helpful, so with that in mind, I wrote the below.

Put in simple and general terms: given an expression containing both ##\delta## and higher orders of ##\delta##, as ##\delta## becomes arbitrarily small, the ##\delta## portion takes up more and more of the "weight" of the overall expression, and the contribution of the higher order terms to the value of the expression becomes negligible and can thus be ignored.

Also I guess I should be careful with throwing the expression "infinitesimal" around and start thinking about calculus as a whole more in terms of arbitrarily small quantities instead of infinitesimal ones.

Thanks for the help!

Stephen Tashi · Dec 22, 2017

Adgorn said:

So I've seen in several lectures and explanations the idea that when you have an equation containing a relation between certain expressions ##x## and ##y##, if the expression ##x## approaches 0 (and ##y## is scaled down accordingly) then any power of that expression of order 2 or higher (##x^n## where ##n>1##) is equal to 0, leaving only the relation between the 1st order term ##x## and ##y##.

Suppose we are trying to find ##\lim_{\delta \rightarrow a} f(g(\delta))##. If ##\lim_{\delta \rightarrow a} g(\delta) = L## then for a continuous function ##f##, this limit is equal to ##f(L)##. If ##h(\delta)## is a function different than ##g## but also having ##\lim_{\delta \rightarrow a} h(\delta) = L## then we could compute the same limit as ##\lim_{\delta \rightarrow a} f(h(\delta))##.

You described a situation where ##a = 0## and ##g(\delta)## is some function of the functions ##x(\delta)## and ##y(\delta)##. In particular, ##g## is a polynomial in the variable ##y(\delta)## (which may or may not have coefficients involving functions of ##x(\delta)##). The argument that we can replace ##g## by a simpler function ##h(\delta)## that sets the higher powers of ##y(\delta)## to zero depends on showing that ##\lim_{\delta \rightarrow 0} g(\delta) = \lim_{\delta \rightarrow 0} h(\delta)##. This is not true as a generality. However, the examples you have seen in books are presumably special cases where the two limits are equal.

For example, in a Poisson process the chance of an arrival in a time interval ##δ##, where ##δ→0##, is ##λδ## (where ##λ## is the arrival frequency). The chance of no arrivals during said interval is ##1-λδ## and the chance of 2 arrivals or more is 0, because the chance of getting ##n## arrivals in the interval ##δ## is ##(λδ)^n=λ^nδ^n## and ##δ^n=0## for ##n>1## when ##δ→0##.

However, ##\lambda \delta## also "=0" when ##\delta \rightarrow 0##, so it isn't clear what that line of reasoning shows, except that the probability of 1, 2, 3, ... arrivals approaches 0 as ##\delta## approaches 0. Perhaps you are quoting part of an argument that concerns the value of the intensity parameter of a process composed of two independent Poisson processes that happen simultaneously.
