Can anyone justify this derivation of Stirling’s approximation?

Hiero · Jan 6, 2019

The famous Stirling’s approximation is ##N! \approx \sqrt{2\pi N}(N/e)^N## which becomes more accurate for larger N. (Although it’s surprisingly accurate for small values!)

I have found a nice derivation of the formula, but there is one detail which bothers me. The derivation can be found here:
https://www.physics.harvard.edu/uploads/files/undergrad/probweek/sol56.pdf

In equation 3, we make the following approximation:

##N\ln x-x \approx N\ln N -N-\frac{(x-N)^2}{2N}##

Which is the truncated Taylor expansion of ##N\ln x-x## about it’s maximum at x = N.

I can’t seem to understand why this approximation should work. Truncated Taylor polynomials are typically used when the variation (x-N) is small. That is the condition for better accuracy. However in this situation, x is going to vary from zero (which is far from N for large N) all the way up to infinity! That is the bounds of the factorial integral.

So why should this approximation be accurate? How can we be confident beforehand that it should give accurate results?

This is an otherwise wonderful derivation! I really love it! But that one detail does not sit right with me. I would have never found this derivation, because even if it crossed my mind, I would have dismissed the accuracy of that Taylor approximation over the range that we use it.

Any insight (as to why the Taylor quadratic is accurate despite integrating it infinitely far from the point of expansion) will be much appreciated!

fresh_42 · Jan 6, 2019

##-N-\dfrac{(x-N)^2}{2N}=\dfrac{-2N^2-x^2-N^2+2xN}{2N}\approx\dfrac{-2N^2-N^2-N^2+2N^2}{2N}= -N## for ##x\approx N##

Hiero · Jan 6, 2019

fresh_42 said:

##-N-\dfrac{(x-N)^2}{2N}=\frac{-2N^2-x^2-N^2+2xN}{2N}\approx\frac{-2N^2-N^2-N^2+2N^2}{2N}= -N## for ##x\approx N##

Why not just say
##\frac{(x-N)^2}{2N}\approx \frac{(N-N)^2}{2N}=0##
for ##x\approx N##

Anyway, I don’t see your point. I already said in the OP that the approximation will be accurate when x-N is small (i.e. for ##x\approx N##)

The problem is that we are using this approximation inside the integral ##N! = \int^{∞}_0e^{N\ln x -x}dx## i.e. we’re using that approximation for infinitely large x.

Hiero · Jan 6, 2019

Here’s my thinking:

For large x, ##N\ln x - x## looks like -x+C (which can be seen from the fact that the derivative tends to -1).

Also, our Taylor approximation is an upside down parabola (coinciding at the maximum).

These two facts imply that the quadratic approximation will diverge below the true function. In other words, if we write,

##N \ln x - x = N\ln N - N - \frac{(x-N)^2}{2N} + E(x)##

Where E(x) is some error function, then we should expect lim,x→∞[E(x)] = ∞

If this limit was instead -∞ I would be somewhat satisfied, because then we would have:

##\exp (N \ln x - x)= \exp (N\ln N - N - \frac{(x-N)^2}{2N} + E(x)) = \exp (N\ln N - N - \frac{(x-N)^2}{2N})⋅\exp (E(x))##

Where this last factor ##\exp (E(x))## would tend towards 1 if the error function went to -∞

But it seems to me the error goes to +∞

So now I’m extra bothered

Good night.

stevendaryl · Jan 6, 2019

Hiero said:

Why not just say
##\frac{(x-N)^2}{2N}\approx \frac{(N-N)^2}{2N}=0##
for ##x\approx N##

Anyway, I don’t see your point. I already said in the OP that the approximation will be accurate when x-N is small (i.e. for ##x\approx N##)

The problem is that we are using this approximation inside the integral ##N! = \int^{∞}_0e^{N\ln x -x}dx## i.e. we’re using that approximation for infinitely large x.

If you look at the full integrand, it's ##N! = \int e^{x \ln(N) - x} dx##. Because ##e^{-x}## goes to zero so rapidly when ##x## is large, we don't need to care about what ##N \ln x## when ##x## is very large, since it makes a tiny contribution to the integral.

You could think of it this way: Letting ##f(x) = x\ln(N) - x##, you're trying to approximate ##\int_0^\infty e^{f(x)} dx##. So you split up the integral this way:

##I_1 = \int_0^{N-L} e^{f(x)} dx##
##I_2 = \int_{N-L}^{N+L} e^{f(x)} dx##
##I_3 = \int_{N+L}^\infty e^{f(x)} dx##

where ##L## is chosen so that the approximation ##f(x) = A + B(x-N) + C(x-N)^2## (the first few terms of the Taylor expansion about ##x=N##) is approximately valid in the region ##N-L < x < N+L## . So for ##I_2##, replacing ##f(x)## by the approximation is accurate.

For the integrals ##I_1## and ##I_3##, the approximation is NOT good, but we argue (or assume) that those two integrals are small compared with ##I_2##.

So we have the approximation ##N! \approx \int_{N-L}^{N+L} e^{f(x)} dx \approx \int_{N-L}^{N+L} e^{f_0(x)} dx##. where ##f_0(x)## is the approximation given by the Taylor expansion. Then we argue that since ##e^{f_0(x)}## is very small when ##|x-N| \gg L##, there is negligible harm in replacing the range of the integral by ##-\infty, +\infty##.

So it's not that the Taylor series is valid in the whole range ##-\infty, +\infty##, it's just that there is a negligible contribution to the integral in the region where it's not valid.

atyy · Jan 6, 2019

There are some pictures here which may be helpful:
http://www.physik.uni-regensburg.de...f mf_statistical_physics/Stirling_formula.pdf

stevendaryl · Jan 6, 2019

Using the graph from @atyy's link, the quadratic approximation for ##\phi(x) = N\ln x - x## when ##N=100## is very accurate in the region ##75 < x < 125##. It's pretty bad outside that region. But ##e^\phi## is effectively zero outside that range, anyway, so it doesn't matter how bad the approximation is.

Hiero · Jan 6, 2019

stevendaryl said:

If you look at the full integrand, it's ##N! = \int e^{N\ln x - x} dx ##Because ##e^{-x}## goes to zero so rapidly when ##x## is large, we don't need to care about what ##N \ln x## when ##x## is very large, since it makes a tiny contribution to the integral.

I understand everything you’re trying to say. I figured the key is that we’re exponentiating. I agree that the decay of e^-x outpaces the growth of x^N for large x. (Hence the infinite side of the unapproximated integral converges.) But I feel like that isn’t what we need to show...

Can you address what I’ve said in post #4?

Hiero said:

In other words, if we write,

##N \ln x - x = N\ln N - N - \frac{(x-N)^2}{2N} + E(x)##

Where E(x) is some error function, then we should expect lim,x→∞[E(x)] = ∞

If this limit was instead -∞ I would be somewhat satisfied, because then we would have:

##\exp (N \ln x - x)= \exp (N\ln N - N - \frac{(x-N)^2}{2N} + E(x)) = \exp (N\ln N - N - \frac{(x-N)^2}{2N})⋅\exp (E(x))##

Where this last factor ##\exp (E(x))## would tend towards 1 if the error function went to -∞

But it seems to me the error goes to +∞

The graph linked by @atyy shows what I was saying, that we must add a positively infinite error to the approximation to get the true value.

I feel we need to show something like ##E(x)<< \frac{(x-N)^2}{2N}## (for larger x) so that when we exponentiate the difference of those terms the latter dominates. It’s not obvious to me though.

I know that ultimately you’re right, that even with this error factor going to positive infinity the tail end of the integral still contributes negligibly. I know that’s true because it gives Stirling’s approximation! But if we couldn’t use the ends to justify the means, then I would not be convinced.

Anyway I do like your argument about splitting up the integral! I guess I should just accept that reasoning and call it a day /:

One thing to note:

stevendaryl said:

So we have the approximation ##N! \approx \int_{N-L}^{N+L} e^{f(x)} dx \approx \int_{N-L}^{N+L} e^{f_0(x)} dx##

We also need to say that ##\int_{N-L}^{N+L} e^{f_0(x)} dx \approx \int_{-∞}^{∞} e^{f_0(x)} dx##
But anyway it’s a Gaussian (bell curve) integral with mean at x=N so that last step makes sense.

...

Anyway thanks for your time! I hope you have something to say about my post #4 but if not then I’ll just accept your hand-waving

stevendaryl · Jan 6, 2019

Hiero said:

I feel we need to show something like ##E(x)<< \frac{(x-N)^2}{2N}## (for larger x) so that when we exponentiate the difference of those terms the latter dominates.

No, you don't need to show that, and it's not true. That's what I tried to say. You have three regions of interest:

##x \ll N##
##x \approx N##
##x \gg N##

It's only in region 2 that ##E(x) \ll \frac{(x-N)^2}{2N}##. It's not true for large x. But it's only in region 2 where the integral is non-negligible.

Hiero · Jan 6, 2019

stevendaryl said:

No, you don't need to show that, and it's not true.

At the very least we can say ##E(x) < (x-N)^2/(2N)## for large x (the double << is vague in my mind anyway). Surely you don’t disagree with that?

Because we get Stirling’s approximation from the following integral (by leaving out E(x) and replacing 0 with -∞)
$$N! = \Bigg(\frac{N}{e}\Bigg)^N\int_0^{\infty} \exp\Bigg(E(x)-\frac{(x-N)^2}{2N}\Bigg)dx$$
This integral (with E(x) in there and 0 as the lowerbound) is precise not approximate.

If E(x) > (x-N)^2/(2N) for large x, then clearly that integral would diverge.

stevendaryl · Jan 6, 2019

Hiero said:

At the very least we can say ##E(x) < (x-N)^2/(2N)## for large x (the double << is vague in my mind anyway). Surely you don’t disagree with that?

That's exactly what I was disagreeing with. It's not true.

Hiero · Jan 6, 2019

stevendaryl said:

That's exactly what I was disagreeing with. It's not true.

Well then... with all due respect... you’re wrong!

If ##E(x) ≥ (x-N)^2/(2N)## for large x then ##E(x) - (x-N)^2/(2N) ≥ 0## for large x and ##\int_0^{\infty} e^{E(x)-(x-N)^2/(2N)}dx## would clearly diverge (because the integrand would not go to zero). But it doesn’t diverge, because the integral in post #10 is identically equal to N factorial. It’s a funny way of writing the gamma function integral.

Perhaps you are imagining E(x) will out grow (x-N)^2 because the leading term of E(x) is proportional to (x-N)^3 ?

But if we think about all the terms in E(x), there will be an alternating pattern of positive and negative coefficients. The net sum needs to give ##E(x) < (x-N)^2/(2N)## for large x (though I suspect it’s true for all x if we include equality at x=N).

stevendaryl · Jan 6, 2019

Hiero said:

Well then... with all due respect... you’re wrong!

Let me go through this one more time, and see if you can agree with all the steps.

##N!## is ##\int_0^ \infty x^N e^{-x} dx##
Define ##I(L)## be ##\int_{N-L}^{N+L} x^N e^{-x} dx##.
Define ##I_0(L)## be ##N^N e^{-N} \int_{N-L}^{N+L} e^{- \frac{(x-N)^2}{2N}} dx##

The claim being made is that for some value of ##L## that is large, but still small compared with ##N##,

##N! \approx I(L) \approx I_0(L) \approx I_0(\infty)##

We don't need for the approximation ##N\ln x -x \approx N\ln N -N - \frac{(x-N)^2}{2N}## to be a good approximation when ##x## is large. You only need for it to be a good approximation in the range ##N-L < x < N+L##.

Hiero · Jan 6, 2019

stevendaryl said:

Let me go through this one more time,

I’ve understood your point the first two times. It’s a nice wave of the hands, but it does not align with my understanding of the use of the ##\approx## symbol. My understanding is, whenever this symbol is used mathematically, (engineering would be different,) there is some condition such that the error of the approximation becomes arbitrarily small. In other words for every “##\approx##” there is an associated “as x → y”

When you say “there is some value of L such that ##N! \approx I(L)##,” then I consider this nonsense. Is 1.00001 ≈ 1? Maybe, or maybe not. Is 1,000,000 ≈ 1? In some contexts, yes.

In my eyes it’s only meaningful to say something like “ ##e^x\approx 1## for small x ” because this translates into “ ##\lim _{x→0}[e^x]=1## ”

This is why I have been ignoring your argument and following a separate discussion. Your approximations cannot be made arbitrarily accurate, so they are not meaningful mathematically.

stevendaryl said:

We don't need for the approximation ##N\ln x -x \approx N\ln N -N - \frac{(x-N)^2}{2N}## to be a good approximation when ##x## is large.

Agreed...

But, we DO need ##\exp(N\ln x -x) \approx \exp ( N\ln N -N - \frac{(x-N)^2}{2N})## to be a good approximation when ##x## is large! Don’t we?

Anyway I don’t really care anymore... I’m feeling quite drained. Apologies if anything came off rudely.

alan2 · Jan 7, 2019

The integrated vanishes quickly everywhere except near x=N for large N. In fact, the larger N is the more rapidly it vanishes as x differs from N. So you’re actually integrating over a very narrow interval which is symmetric about N.

StoneTemplePython · Jan 7, 2019

Hiero said:

I’ve understood your point the first two times. It’s a nice wave of the hands, but it does not align with my understanding of the use of the ##\approx## symbol. My understanding is, whenever this symbol is used mathematically, (engineering would be different,) there is some condition such that the error of the approximation becomes arbitrarily small. In other words for every “##\approx##” there is an associated “as x → y”...

Your approximations cannot be made arbitrarily accurate, so they are not meaningful mathematically.

The issue is you say this, but your original post says:

Hiero said:

The famous Stirling’s approximation is ##N! \approx \sqrt{2\pi N}(N/e)^N## which becomes more accurate for larger N.

which is apparently wrong.

Stirling's approximation actually says ##N! \sim \sqrt{2\pi N}(N/e)^N##

It is quite common for people to confuse convergence of ##x_n \to y## with ##\frac{x_n}{y} \to 1## and vice versa. To avoid issues like this, I'd suggest throwing out any 'proof' of Stirling's approximation that does not include explicit error bounds on the approximation being made. This includes the link in your original post.

atyy · Jan 7, 2019

Maybe this is helpful: http://www.math.ubc.ca/~feldman/m321/asymptotic.pdf (error bounds are discussed in the appendix)

stevendaryl · Jan 8, 2019

Hiero said:

But, we DO need ##\exp(N\ln x -x) \approx \exp ( N\ln N -N - \frac{(x-N)^2}{2N})## to be a good approximation when ##x## is large! Don’t we?

No, the argument does not rely on that. We don't use that approximation in the region when ##x## gets arbitrarily large.

Stephen Tashi · Jan 8, 2019

Hiero said:

I already said in the OP that the approximation will be accurate when x-N is small (i.e. for ##x\approx N##)

To repeat @StoneTemplePython 's observation:
Does everyone agree that the notation ##a(n) \approx b(n)## means (in the context of this thread) ##lim_{n \rightarrow \infty} (a(n))/ b(n)) = 1 ## and not that ##lim_{n \rightarrow \infty} (b(n) - a(n)) = 0 ## ?

##x \approx N## is true when ##x = N + 100## (e.g. consider N = 10^{10}) But in that case ##x - N## isn't small.

Ray Vickson · Jan 26, 2019

Hiero said:

Well then... with all due respect... you’re wrong!

If ##E(x) ≥ (x-N)^2/(2N)## for large x then ##E(x) - (x-N)^2/(2N) ≥ 0## for large x and ##\int_0^{\infty} e^{E(x)-(x-N)^2/(2N)}dx## would clearly diverge (because the integrand would not go to zero). But it doesn’t diverge, because the integral in post #10 is identically equal to N factorial. It’s a funny way of writing the gamma function integral.

Perhaps you are imagining E(x) will out grow (x-N)^2 because the leading term of E(x) is proportional to (x-N)^3 ?

But if we think about all the terms in E(x), there will be an alternating pattern of positive and negative coefficients. The net sum needs to give ##E(x) < (x-N)^2/(2N)## for large x (though I suspect it’s true for all x if we include equality at x=N).

You need to understand that ##N!## and ##\text{Stirling}(N) \equiv S(N)## differ by an amount that goes to ##\infty## as ##N \to \infty##. However, the relative error goes to zero. For example, ##1000! \approx .4023870543 \times 10^{2568}## while ##S(1000) \approx .4023537006 \times 10^{2568}.## These differ by ##\Delta(1000) \approx .333537 \times 10^{2564},## which is a huge number, so the absolute error is enormous. However, the relative error is small: ##\Delta(1000)/1000! \approx .8288959509 \times 10^{-4}##.

Anyway, if you change variables to ##x = (x-N)/\sqrt{N}## in the exact integral, you have
$$N! = \int_{-\sqrt{N}}^\infty F(z) \, dz$$ for some function ##F(z)## that is pretty sharply peaked about ##z = 0##. One can argue that, given any ##\epsilon > 0## the integrals ##\int_{-\sqrt{N}}^\infty F(z) \, dz## and ##\int_{-L}^L F(z) \, dz## have relative error ##< \epsilon##. That is,given ##\epsilon > 0## we can find a large fixed ##L## such that for all sufficiently large ##N## we have that
$$\frac{|\int_{-\sqrt{N}}^\infty F(z) \, dz - \int_{-L}^L F(z) \, dz|}{\int_{-\sqrt{N}}^\infty F(z) \, dz} < \epsilon$$ Furthermore, for sufficiently large (but still fixed) ##L## and large enough ##N## we can show that the integrals ##\int_{-L}^L F(z) \, dz## and ##\int_{-L}^L F_0(z) \, dz## have relative error ##< \epsilon##. Here, ##F_0## is the simplified function that you get by expanding the exponent in a series about ##z=0## and keeping only terms up to second order. Finally, we can choose ##L## so that the two integrals ##\int_{-\infty}^\infty F_0(z) \, dz = \text{Stirling} ## and ##\int_{-L}^L F_0(z) \, dz## have relative difference ##< \epsilon##. Altogether, we find that the exact factorial and the Stirling approximation have relative error less than about ##3 \epsilon##

Can anyone justify this derivation of Stirling’s approximation?

Discussion Overview

Discussion Character

Main Points Raised

Areas of Agreement / Disagreement

Contextual Notes

Similar threads

Undergrad Why ##a^0=1##?

Undergrad Finding the minimum distance between two curves

High School Straightforward integration…

High School Arc Length for Hyperbolic Sin

Undergrad Ambiguity of the term "indefinite integral"

Insights Revisiting the Velocity-Time Function

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect