Can anyone justify this derivation of Stirling’s approximation?

  • #1
Hiero
The famous Stirling’s approximation is ##N! \approx \sqrt{2\pi N}(N/e)^N## which becomes more accurate for larger N. (Although it’s surprisingly accurate for small values!)
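A quick numerical check (the values of N are just illustrative):

```python
import math

# Compare N! with Stirling's formula sqrt(2*pi*N) * (N/e)^N.
# Each ratio is a little below 1 and climbs toward 1 as N grows.
ratios = []
for N in (1, 5, 10, 20):
    exact = math.factorial(N)
    stirling = math.sqrt(2 * math.pi * N) * (N / math.e) ** N
    ratios.append(stirling / exact)
    print(N, exact, round(stirling, 3))
```

Even at N = 1 the ratio is about 0.92, which is the "surprisingly accurate for small values" part.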

I have found a nice derivation of the formula, but there is one detail which bothers me. The derivation can be found here:
https://www.physics.harvard.edu/uploads/files/undergrad/probweek/sol56.pdf

In equation 3, we make the following approximation:

##N\ln x-x \approx N\ln N -N-\frac{(x-N)^2}{2N}##

which is the truncated Taylor expansion of ##N\ln x-x## about its maximum at x = N.

I can’t seem to understand why this approximation should work. Truncated Taylor polynomials are typically used when the variation (x − N) is small; that is the condition for accuracy. In this situation, however, x ranges from zero (which is far from N for large N) all the way up to infinity, since those are the bounds of the factorial integral.

So why should this approximation be accurate? How can we be confident beforehand that it should give accurate results?

This is an otherwise wonderful derivation! I really love it! But that one detail does not sit right with me. I would have never found this derivation, because even if it crossed my mind, I would have dismissed the accuracy of that Taylor approximation over the range that we use it.

Any insight (as to why the Taylor quadratic is accurate despite integrating it infinitely far from the point of expansion) will be much appreciated!
 
  • #2
##-N-\dfrac{(x-N)^2}{2N}=\dfrac{-2N^2-x^2-N^2+2xN}{2N}\approx\dfrac{-2N^2-N^2-N^2+2N^2}{2N}= -N## for ##x\approx N##
 
  • #3
fresh_42 said:
##-N-\dfrac{(x-N)^2}{2N}=\frac{-2N^2-x^2-N^2+2xN}{2N}\approx\frac{-2N^2-N^2-N^2+2N^2}{2N}= -N## for ##x\approx N##
Why not just say
##\frac{(x-N)^2}{2N}\approx \frac{(N-N)^2}{2N}=0##
for ##x\approx N##

Anyway, I don’t see your point. I already said in the OP that the approximation will be accurate when x-N is small (i.e. for ##x\approx N##)

The problem is that we are using this approximation inside the integral ##N! = \int^{∞}_0e^{N\ln x -x}dx## i.e. we’re using that approximation for infinitely large x.
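For the record, the integral identity itself can be sanity-checked numerically; this is a rough trapezoidal sketch, where the cutoff at x = 100 and the step count are arbitrary choices that happen to suffice for N = 10:

```python
import math

# Check N! = ∫_0^∞ e^{N ln x - x} dx numerically for N = 10,
# truncating the upper limit at 100 (the integrand there is negligible).
N = 10
a, b, steps = 1e-8, 100.0, 200_000
h = (b - a) / steps
f = lambda x: math.exp(N * math.log(x) - x)
total = h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, steps)))
print(total, math.factorial(N))  # the two agree to several digits
```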
 
  • #4
Here’s my thinking:

For large x, ##N\ln x - x## behaves like a line of slope −1, up to the slowly growing ##N\ln x## term (the derivative tends to −1).

Also, our Taylor approximation is a downward-opening parabola that matches the function at the maximum.

These two facts imply that the quadratic approximation will eventually fall far below the true function. In other words, if we write,

##N \ln x - x = N\ln N - N - \frac{(x-N)^2}{2N} + E(x)##

where E(x) is some error function, then we should expect ##\lim_{x\to\infty} E(x) = +\infty##.

If this limit was instead -∞ I would be somewhat satisfied, because then we would have:

##\exp (N \ln x - x)= \exp (N\ln N - N - \frac{(x-N)^2}{2N} + E(x)) = \exp (N\ln N - N - \frac{(x-N)^2}{2N})⋅\exp (E(x))##

Where this last factor ##\exp (E(x))## would tend towards 1 if the error function went to -∞

But it seems to me the error goes to +∞
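The behaviour of E(x) is easy to check numerically (N = 100 here is just an example): the error vanishes at x = N and does blow up to +∞, roughly like ##(x-N)^2/(2N)##, since the parabola plunges much faster than the true function decays.

```python
import math

# E(x) = (N ln x - x) - (N ln N - N - (x-N)^2/(2N)), the error of the
# truncated Taylor expansion. It vanishes at x = N and grows without
# bound for large x (dominated by the +(x-N)^2/(2N) term it contains).
N = 100
def E(x):
    return (N * math.log(x) - x) - (N * math.log(N) - N - (x - N) ** 2 / (2 * N))

vals = [E(x) for x in (100, 200, 500, 1000)]
print(vals)  # 0.0 at the peak, then increasing
```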

So now I’m extra bothered!

Good night.
 
  • #5
Hiero said:
Why not just say
##\frac{(x-N)^2}{2N}\approx \frac{(N-N)^2}{2N}=0##
for ##x\approx N##

Anyway, I don’t see your point. I already said in the OP that the approximation will be accurate when x-N is small (i.e. for ##x\approx N##)

The problem is that we are using this approximation inside the integral ##N! = \int^{∞}_0e^{N\ln x -x}dx## i.e. we’re using that approximation for infinitely large x.

If you look at the full integrand, it's ##N! = \int e^{N\ln x - x} dx##. Because ##e^{-x}## goes to zero so rapidly when ##x## is large, we don't need to care about what ##N \ln x## does when ##x## is very large, since that region makes a tiny contribution to the integral.

You could think of it this way: Letting ##f(x) = N\ln x - x##, you're trying to approximate ##\int_0^\infty e^{f(x)} dx##. So you split up the integral this way:

##I_1 = \int_0^{N-L} e^{f(x)} dx##
##I_2 = \int_{N-L}^{N+L} e^{f(x)} dx##
##I_3 = \int_{N+L}^\infty e^{f(x)} dx##

where ##L## is chosen so that the approximation ##f(x) \approx A + B(x-N) + C(x-N)^2## (the first few terms of the Taylor expansion about ##x=N##; here ##B = 0##, since the expansion is at the maximum) is approximately valid in the region ##N-L < x < N+L##. So for ##I_2##, replacing ##f(x)## by the approximation is accurate.

For the integrals ##I_1## and ##I_3##, the approximation is NOT good, but we argue (or assume) that those two integrals are small compared with ##I_2##.

So we have the approximation ##N! \approx \int_{N-L}^{N+L} e^{f(x)} dx \approx \int_{N-L}^{N+L} e^{f_0(x)} dx##, where ##f_0(x)## is the approximation given by the Taylor expansion. Then we argue that since ##e^{f_0(x)}## is very small when ##|x-N| \gg L##, there is negligible harm in replacing the range of the integral by ##(-\infty, +\infty)##.

So it's not that the Taylor series is valid in the whole range ##-\infty, +\infty##, it's just that there is a negligible contribution to the integral in the region where it's not valid.
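The three-piece split can be demonstrated numerically. This is only a sketch: N = 100 and L = 5·√N are illustrative choices, and the exponent is shifted by f(N) so that nothing overflows.

```python
import math

# Split ∫_0^∞ e^{f(x)} dx, f(x) = N ln x - x, into I1 + I2 + I3 around
# the peak at x = N, working with e^{f(x) - f(N)} to avoid overflow.
N = 100
L = 5 * math.sqrt(N)
fN = N * math.log(N) - N
g = lambda x: math.exp(N * math.log(x) - x - fN)

def piece(a, b, steps=100_000):
    # simple trapezoidal rule for ∫_a^b g(x) dx
    h = (b - a) / steps
    return h * (0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, steps)))

I1 = piece(1e-6, N - L)
I2 = piece(N - L, N + L)
I3 = piece(N + L, 10 * N)  # the tail beyond 10N is utterly negligible
print(I1, I2, I3)  # I2 carries essentially all of the integral
```

With this shift the full integral equals ##N!\,e^N/N^N \approx \sqrt{2\pi N}##, and the outer pieces contribute less than a tenth of a percent of it.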
 
  • #7
Using the graph from @atyy's link, the quadratic approximation for ##\phi(x) = N\ln x - x## when ##N=100## is very accurate in the region ##75 < x < 125##. It's pretty bad outside that region. But ##e^\phi## is effectively zero outside that range, anyway, so it doesn't matter how bad the approximation is.
 
  • #8
stevendaryl said:
If you look at the full integrand, it's ##N! = \int e^{N\ln x - x} dx##. Because ##e^{-x}## goes to zero so rapidly when ##x## is large, we don't need to care about what ##N \ln x## does when ##x## is very large, since that region makes a tiny contribution to the integral.
I understand everything you’re trying to say. I figured the key is that we’re exponentiating. I agree that the decay of e^-x outpaces the growth of x^N for large x. (Hence the infinite side of the unapproximated integral converges.) But I feel like that isn’t what we need to show...

Can you address what I’ve said in post #4?
Hiero said:
In other words, if we write,

##N \ln x - x = N\ln N - N - \frac{(x-N)^2}{2N} + E(x)##

where E(x) is some error function, then we should expect ##\lim_{x\to\infty} E(x) = +\infty##.

If this limit was instead -∞ I would be somewhat satisfied, because then we would have:

##\exp (N \ln x - x)= \exp (N\ln N - N - \frac{(x-N)^2}{2N} + E(x)) = \exp (N\ln N - N - \frac{(x-N)^2}{2N})⋅\exp (E(x))##

Where this last factor ##\exp (E(x))## would tend towards 1 if the error function went to -∞

But it seems to me the error goes to +∞

The graph linked by @atyy shows what I was saying, that we must add a positively infinite error to the approximation to get the true value.

I feel we need to show something like ##E(x) \ll \frac{(x-N)^2}{2N}## (for large x) so that when we exponentiate the difference of those terms the latter dominates. It’s not obvious to me though.

I know that ultimately you’re right, that even with this error factor going to positive infinity the tail end of the integral still contributes negligibly. I know that’s true because it gives Stirling’s approximation! But if we couldn’t use the ends to justify the means, then I would not be convinced.

Anyway I do like your argument about splitting up the integral! I guess I should just accept that reasoning and call it a day /:

One thing to note:
stevendaryl said:
So we have the approximation ##N! \approx \int_{N-L}^{N+L} e^{f(x)} dx \approx \int_{N-L}^{N+L} e^{f_0(x)} dx##
We also need to say that ##\int_{N-L}^{N+L} e^{f_0(x)} dx \approx \int_{-∞}^{∞} e^{f_0(x)} dx##
But anyway it’s a Gaussian (bell curve) integral with mean at x=N so that last step makes sense.
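That last step can be made quantitative with the error function (N = 100 and L = 5·√N are again just illustrative choices):

```python
import math

# ∫_{N-L}^{N+L} e^{-(x-N)^2/(2N)} dx = sqrt(2*pi*N) * erf(L / sqrt(2N)),
# so extending the limits to ±∞ only adds the tiny Gaussian tail mass.
N = 100
L = 5 * math.sqrt(N)
full = math.sqrt(2 * math.pi * N)                # ∫_{-∞}^{∞}
window = full * math.erf(L / math.sqrt(2 * N))   # ∫_{N-L}^{N+L}
print(window, full, full - window)
```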

...

Anyway thanks for your time! I hope you have something to say about my post #4 but if not then I’ll just accept your hand-waving :-p
 
  • #9
Hiero said:
I feel we need to show something like ##E(x) \ll \frac{(x-N)^2}{2N}## (for large x) so that when we exponentiate the difference of those terms the latter dominates.

No, you don't need to show that, and it's not true. That's what I tried to say. You have three regions of interest:
  1. ##x \ll N##
  2. ##x \approx N##
  3. ##x \gg N##
It's only in region 2 that ##E(x) \ll \frac{(x-N)^2}{2N}##. It's not true for large x. But it's only in region 2 where the integral is non-negligible.
 
  • #10
stevendaryl said:
No, you don't need to show that, and it's not true.
At the very least we can say ##E(x) < (x-N)^2/(2N)## for large x (the double ##\ll## is vague in my mind anyway). Surely you don’t disagree with that?

Because we get Stirling’s approximation from the following integral (by leaving out E(x) and replacing 0 with -∞)
$$N! = \Bigg(\frac{N}{e}\Bigg)^N\int_0^{\infty} \exp\Bigg(E(x)-\frac{(x-N)^2}{2N}\Bigg)dx$$
This integral (with E(x) in there and 0 as the lower bound) is exact, not approximate.

If E(x) > (x-N)^2/(2N) for large x, then clearly that integral would diverge.
 
  • #11
Hiero said:
At the very least we can say ##E(x) < (x-N)^2/(2N)## for large x (the double ##\ll## is vague in my mind anyway). Surely you don’t disagree with that?

That's exactly what I was disagreeing with. It's not true.
 
  • #12
stevendaryl said:
That's exactly what I was disagreeing with. It's not true.
Well then... with all due respect... you’re wrong!

If ##E(x) ≥ (x-N)^2/(2N)## for large x then ##E(x) - (x-N)^2/(2N) ≥ 0## for large x and ##\int_0^{\infty} e^{E(x)-(x-N)^2/(2N)}dx## would clearly diverge (because the integrand would not go to zero). But it doesn’t diverge, because the integral in post #10 is identically equal to N factorial. It’s a funny way of writing the gamma function integral.

Perhaps you are imagining E(x) will outgrow (x-N)^2 because the leading term of E(x) is proportional to (x-N)^3?

But if we think about all the terms in E(x), there will be an alternating pattern of positive and negative coefficients. The net sum needs to give ##E(x) < (x-N)^2/(2N)## for large x (though I suspect it’s true for all x if we include equality at x=N).
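For what it’s worth, the claimed inequality can be checked directly: ##E(x) - (x-N)^2/(2N) = (N\ln x - x) - (N\ln N - N)##, which is zero at x = N and tends to ##-\infty##, so ##E(x) < (x-N)^2/(2N)## does hold away from the peak. A quick numerical confirmation with the illustrative value N = 100:

```python
import math

# gap(x) = E(x) - (x-N)^2/(2N) = (N ln x - x) - (N ln N - N).
# It is 0 at x = N and tends to -infinity, so E(x) < (x-N)^2/(2N)
# for all x != N (checked here at a few sample points).
N = 100
gap = lambda x: (N * math.log(x) - x) - (N * math.log(N) - N)

gaps = [gap(x) for x in (150, 300, 1000, 5000)]
print(gaps)  # negative and decreasing
```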
 
  • #13
Hiero said:
Well then... with all due respect... you’re wrong!

Let me go through this one more time, and see if you can agree with all the steps.
  1. ##N!## is ##\int_0^\infty x^N e^{-x} dx##
  2. Define ##I(L)## to be ##\int_{N-L}^{N+L} x^N e^{-x} dx##.
  3. Define ##I_0(L)## to be ##N^N e^{-N} \int_{N-L}^{N+L} e^{-\frac{(x-N)^2}{2N}} dx##
The claim being made is that for some value of ##L## that is large, but still small compared with ##N##,

##N! \approx I(L) \approx I_0(L) \approx I_0(\infty)##

We don't need the approximation ##N\ln x -x \approx N\ln N -N - \frac{(x-N)^2}{2N}## to be good when ##x## is large. We only need it to be a good approximation in the range ##N-L < x < N+L##.
 
  • #14
stevendaryl said:
Let me go through this one more time,
I’ve understood your point the first two times. It’s a nice wave of the hands, but it does not align with my understanding of the use of the ##\approx## symbol. My understanding is, whenever this symbol is used mathematically, (engineering would be different,) there is some condition such that the error of the approximation becomes arbitrarily small. In other words for every “##\approx##” there is an associated “as x → y”

When you say “there is some value of L such that ##N! \approx I(L)##,” then I consider this nonsense. Is 1.00001 ≈ 1? Maybe, or maybe not. Is 1,000,000 ≈ 1? In some contexts, yes.

In my eyes it’s only meaningful to say something like “ ##e^x\approx 1## for small x ” because this translates into “ ##\lim_{x\to 0} e^x = 1## ”

This is why I have been ignoring your argument and following a separate discussion. Your approximations cannot be made arbitrarily accurate, so they are not meaningful mathematically.

stevendaryl said:
We don't need for the approximation ##N\ln x -x \approx N\ln N -N - \frac{(x-N)^2}{2N}## to be a good approximation when ##x## is large.
Agreed...

But, we DO need ##\exp(N\ln x -x) \approx \exp ( N\ln N -N - \frac{(x-N)^2}{2N})## to be a good approximation when ##x## is large! Don’t we?

Anyway I don’t really care anymore... I’m feeling quite drained. Apologies if anything came off rudely.
 
  • #15
The integrand vanishes quickly everywhere except near x=N for large N. In fact, the larger N is, the more rapidly it vanishes as x moves away from N. So you’re actually integrating over a very narrow interval which is symmetric about N.
 
  • #16
Hiero said:
I’ve understood your point the first two times. It’s a nice wave of the hands, but it does not align with my understanding of the use of the ##\approx## symbol. My understanding is, whenever this symbol is used mathematically, (engineering would be different,) there is some condition such that the error of the approximation becomes arbitrarily small. In other words for every “##\approx##” there is an associated “as x → y”...

Your approximations cannot be made arbitrarily accurate, so they are not meaningful mathematically.

The issue is you say this, but your original post says:

Hiero said:
The famous Stirling’s approximation is ##N! \approx \sqrt{2\pi N}(N/e)^N## which becomes more accurate for larger N.

which is apparently wrong.

Stirling's approximation actually says ##N! \sim \sqrt{2\pi N}(N/e)^N##

It is quite common for people to confuse convergence of ##x_n \to y## with ##\frac{x_n}{y} \to 1## and vice versa. To avoid issues like this, I'd suggest throwing out any 'proof' of Stirling's approximation that does not include explicit error bounds on the approximation being made. This includes the link in your original post.
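A toy illustration of the distinction (the sequences here are made up for illustration, not from the thread): the ratio of ##n^2+n## to ##n^2## tends to 1 even though their difference grows without bound.

```python
# Ratio convergence vs difference convergence: a/b → 1 while a - b → ∞.
rows = []
for n in (10, 1000, 100_000):
    a, b = n * n + n, n * n
    rows.append((a / b, a - b))
    print(n, a / b, a - b)
```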
 
  • #18
Hiero said:
But, we DO need ##\exp(N\ln x -x) \approx \exp ( N\ln N -N - \frac{(x-N)^2}{2N})## to be a good approximation when ##x## is large! Don’t we?

No, the argument does not rely on that. We don't use that approximation in the region when ##x## gets arbitrarily large.
 
  • #19
Hiero said:
I already said in the OP that the approximation will be accurate when x-N is small (i.e. for ##x\approx N##)

To repeat @StoneTemplePython 's observation:
Does everyone agree that the notation ##a(n) \approx b(n)## means (in the context of this thread) ##\lim_{n \rightarrow \infty} a(n)/b(n) = 1## and not that ##\lim_{n \rightarrow \infty} (b(n) - a(n)) = 0##?

##x \approx N## is true when ##x = N + 100## (e.g. consider ##N = 10^{10}##), but in that case ##x - N## isn't small.
 
  • #20
Hiero said:
Well then... with all due respect... you’re wrong!

If ##E(x) ≥ (x-N)^2/(2N)## for large x then ##E(x) - (x-N)^2/(2N) ≥ 0## for large x and ##\int_0^{\infty} e^{E(x)-(x-N)^2/(2N)}dx## would clearly diverge (because the integrand would not go to zero). But it doesn’t diverge, because the integral in post #10 is identically equal to N factorial. It’s a funny way of writing the gamma function integral.

Perhaps you are imagining E(x) will out grow (x-N)^2 because the leading term of E(x) is proportional to (x-N)^3 ?

But if we think about all the terms in E(x), there will be an alternating pattern of positive and negative coefficients. The net sum needs to give ##E(x) < (x-N)^2/(2N)## for large x (though I suspect it’s true for all x if we include equality at x=N).

You need to understand that ##N!## and ##\text{Stirling}(N) \equiv S(N)## differ by an amount that goes to ##\infty## as ##N \to \infty##. However, the relative error goes to zero. For example, ##1000! \approx .4023870543 \times 10^{2568}## while ##S(1000) \approx .4023537006 \times 10^{2568}.## These differ by ##\Delta(1000) \approx .333537 \times 10^{2564},## which is a huge number, so the absolute error is enormous. However, the relative error is small: ##\Delta(1000)/1000! \approx .8288959509 \times 10^{-4}##.

Anyway, if you change variables to ##z = (x-N)/\sqrt{N}## in the exact integral, you have
$$N! = \int_{-\sqrt{N}}^\infty F(z) \, dz$$ for some function ##F(z)## that is pretty sharply peaked about ##z = 0##. One can argue that, given any ##\epsilon > 0##, the integrals ##\int_{-\sqrt{N}}^\infty F(z) \, dz## and ##\int_{-L}^L F(z) \, dz## have relative error ##< \epsilon##. That is, given ##\epsilon > 0## we can find a large fixed ##L## such that for all sufficiently large ##N## we have that
$$\frac{|\int_{-\sqrt{N}}^\infty F(z) \, dz - \int_{-L}^L F(z) \, dz|}{\int_{-\sqrt{N}}^\infty F(z) \, dz} < \epsilon$$

Furthermore, for sufficiently large (but still fixed) ##L## and large enough ##N## we can show that the integrals ##\int_{-L}^L F(z) \, dz## and ##\int_{-L}^L F_0(z) \, dz## have relative error ##< \epsilon##. Here, ##F_0## is the simplified function that you get by expanding the exponent in a series about ##z=0## and keeping only terms up to second order.

Finally, we can choose ##L## so that the two integrals ##\int_{-\infty}^\infty F_0(z) \, dz = \text{Stirling}## and ##\int_{-L}^L F_0(z) \, dz## have relative difference ##< \epsilon##. Altogether, we find that the exact factorial and the Stirling approximation have relative error less than about ##3 \epsilon##.
 

FAQ: Can anyone justify this derivation of Stirling’s approximation?

1. What is Stirling's approximation?

Stirling's approximation is a mathematical formula used to approximate the value of a factorial. It states that for large values of n, n! can be approximated by √(2πn)(n/e)^n, where e is the base of the natural logarithm and π is the mathematical constant pi.

2. Why is Stirling's approximation useful?

Stirling's approximation is useful because it allows us to quickly estimate the value of large factorials without having to calculate them directly. This can save time and effort in mathematical calculations and also provide a good approximation for complex problems.

3. How is Stirling's approximation derived?

Stirling's approximation can be derived using Laplace's method (the real-variable version of the method of steepest descent). This involves expanding the logarithm of the integrand of the Gamma-function integral about its maximum and approximating the integral by a Gaussian. The result is the formula for Stirling's approximation.

4. Can anyone justify this derivation of Stirling's approximation?

Yes, the derivation of Stirling's approximation is a well-established and accepted mathematical proof. It has been rigorously tested and verified by numerous mathematicians and is widely used in various fields of science and engineering.

5. Are there any limitations to Stirling's approximation?

Yes, Stirling's approximation is only accurate for large values of n; as n gets smaller, the approximation becomes less accurate. It is also meaningless for negative integers, where the factorial itself is undefined.
