Re: Probability for the first digit of a natural number being equal to 1

tom.stoer · Oct 26, 2010

Lets define D₁(n) to be the first digit of a natural number n, e.g. D₁(571) = 5, D₁(119) = 1, ...

Naturally one would expect that all digits come with the same probability 1/9, but looking at it in more detail one finds rather counter-intuitive results.

If one counts these digits and calculates the probability using an interval [1,N] one finds that the limit for N to infinity does not exist! The probability can be defined using a function C_'1'(n) which counts the numbers in [1,n] starting with digit '1'. This counting function applied to the numbers n = 1, 2, 3, ..., 9, 10, 11, ..., 19, 20, ..., 99, 100, 101, ... results in 1, 1, 1, ..., 1, 2, 3, ..., 11, 11, ..., 11, 12, 13, ... The probability finding '1' as the first digit can be defined as P_'1'(n) = C_'1'(n)/n. So one counts the numbers in the interval [1,n] starting with '1' and divides by n. One immediately sees that this probability has a minimum value of 1/9 whenever n=9, 99, 999, ... for which C has the values 1, 11, 111, ... and it has a maximum value (which approaches 5/9 from above) whenever n is of the form 19, 199, 1999, ...

That means that P_'1'(n) = C_'1'(n) / n does not converge to a fixed value p as n goes to infinity!

How can one resolve this issue?
Is there a unique way to define p differently - with the "correct" results 1/9?
Or is is this probability not defined at all?

Let me note that I think that Benford's law does not apply here. There is no reason for scale invariance; we are not talking about random numbers; we are not talking about measurement of sizes, areas or something like that, just the natural numbers.

bpet · Oct 26, 2010

tom.stoer said:

...That means that P_'1'(n) = C_'1'(n) / n does not converge to a fixed value p as n goes to infinity!

How can one resolve this issue?
Is there a unique way to define p differently - with the "correct" results 1/9?
Or is is this probability not defined at all?

Let me note that I think that Benford's law does not apply here. There is no reason for scale invariance; we are not talking about random numbers; we are not talking about measurement of sizes, areas or something like that, just the natural numbers.

Congratulations, you've discovered that the natural numbers cannot be assigned a uniform distribution!

One could say that the numbers _are_ random because the selection mechanism is not fully understood, and that the probability p is defined but not known (and can be made whatever you like, with different selection models).

Edit: on second thoughts, looks like the number theorists have a name for this - http://en.wikipedia.org/wiki/Natural_density" and you found a set that has different upper and lower densities.

tom.stoer · Oct 27, 2010

So the conclusion is that the issue can't be resolved in a natural and unqiue way.

I am sure I am not the first one who struglled about that. Do you know a short paper discussig this topic?

chiro · Oct 27, 2010

Lets say you are restricting the numbers you have from say 1 to N. To find the uppermost digit all you have to do is calculate Floor(log_10(X_i))+1 to get the "size" (ie how many digits in base 10) of the number.

So let's say you want to find the probability of getting a number beginning with 1. So let's say your range is 1 - 999. The number of entries starting with a 1 is

1 + 10 + 100 = 111

So P(D(n)=1) = 111/999 = 1/9

Assuming that each trial is independent for a Discrete Random Variable [1,999] then
P(A and B) = P(A)P(B) which is just a straightforward calculation.

Note that if you're set is not finite, then you can not assign a proper distribution. Also note that if you use a range of which log_10(A+1) != An integer then you will have to modify your algorithm to calculate correct probabilities since the distribution will not be uniform as my above test case is.

bpet · Oct 27, 2010

tom.stoer said:

So the conclusion is that the issue can't be resolved in a natural and unqiue way.

I am sure I am not the first one who struglled about that. Do you know a short paper discussig this topic?

An alternative approach: http://en.wikipedia.org/wiki/Natural_density"

D H · Oct 27, 2010

tom.stoer said:

Let me note that I think that Benford's law does not apply here.

Why not? That to me seems to be the natural resolution. Natural density doesn't help; it gives an upper bound of 1 and a lower bound of 1/9.

bpet · Oct 27, 2010

D H said:

Why not? That to me seems to be the natural resolution. Natural density doesn't help; it gives an upper bound of 1 and a lower bound of 1/9.

No the OP isn't talking about random numbers, and what they call "probability" is really "natural density". In essence, they found a neat example of a set without a natural density by showing that the limsup and liminf values disagree.

tom.stoer · Oct 27, 2010

D H said:

Why not? That to me seems to be the natural resolution.

One precondition for Benfords law is that the logarithms of the numbers are uniformly distributed. But this is not the case for the natural numbers; here the numbers themselves are uniformly distributed.

We can write each number as

n = 10^kn = x 10^k

with x < 10. One can restrict to the set n = {1, 2, ..., 9} as all other sets {10, ..., 99} etc. will be mapped to the first case in the following. One calculates the logarithm of each number and extracts its fractional part:

log₁₀n = log₁₀(10^kx) = k + log¹⁰x

The fractional part is just log¹⁰x so all subsequent intervals will create a self-similar distribution of the fractional parts => the distribution d(x) of the fractional parts can be derived from

f(x) = log¹⁰x

Of course this is not a uniform distribution in [1, 10[ and therefore Benford's law does not apply.

I think this is rather natural as the natural numbers have nothing to do with growth processes - and this is were you find Benford's law quite frequently.

D H · Oct 27, 2010

tom.stoer said:

One precondition for Benfords law is that the logarithms of the numbers are uniformly distributed.

Not quite. The precondition is that logarithms are uniformly distributed over some large but finite range. There is no such thing as a uniform distribution over an infinite range. The concept doesn't make sense.

Benford's law throws out the characteristic part of the logarithm, leaving only the mantissa. While the exact range isn't know, it doesn't really matter because the extent of the range is being tossed out the window by ignoring the characteristic.

But this is not the case for the natural numbers; here the numbers themselves are uniformly distributed.

That doesn't make sense, and this is what is causing your problems. This is a "doctor it hurts when I do this" kind of problem. The solution is the same advice as that given by the doctor.

tom.stoer · Oct 27, 2010

D H said:

That doesn't make sense, and this is what is causing your problems. This is a "doctor it hurts when I do this" kind of problem. The solution is the same advice as that given by the doctor.

nice - but I am a stubborn child I therefore I not stop :-)

Let's assume I restrict this to a certain subset of the natural numbers like {m, m+1, m+2, ..., m+L}; then the numbers are distributed uniformly. Now I map this set into [1, 10[ => the fractional part log₁₀(n) is no longer uniformly distributed.

What is wrong with this idea?

D H · Oct 27, 2010

That leads to the natural density interpretation -- and you still have a problem. The inferior and superior limits are not equal as you move your m and L about.

tom.stoer · Oct 27, 2010

D H said:

That leads to the natural density interpretation -- and you still have a problem. The inferior and superior limits are not equal as you move your m and L about.

Yes, I already saw that this is the case; the construction does not solve anything; it should only serve as a counter example to the uniformly distributed logarithm required for Benford's law.

You stated that "there is no such thing as a uniform distribution over an infinite range. The concept doesn't make sense." If this is all that can be said then we should close this thread - unfortunately.

CRGreathouse · Oct 27, 2010

D H said:

Why not? That to me seems to be the natural resolution. Natural density doesn't help; it gives an upper bound of 1 and a lower bound of 1/9.

It dos have logarithmic density, though -- right?

themaestro · Nov 3, 2010

is this problem not related to 'Benford’s Law' ?

tom.stoer · Nov 3, 2010

themaestro said:

is this problem not related to 'Benford’s Law' ?

At first one could think so, but I think it is not. One reason is that Benford's law says something about the density of the logarithms which does not apply to the real numbers.

You can simply use the natural density interpretation mentioned above; it is really "natural" and you will see that there's no escape from the fact that this probability (a natural number to start with digit '1') is not defined.

Just do the following:
- define the sets Z_N = {1,2,3,4, ...N}
- define the subsets X^'1'_N = {x in 1,2,3,4, ...N | x starts with '1' }
- define the size of these sets as |Z_N| = N and |X^'1'_N| = C_'1'(N)
- define the probability as P_'1'(N) = C_'1'(N) / N

You will see that this probability does not converge in the limit N to infinity; instead the lim sup will be 5/9 whereas the lim inf will be 1/9. That means that the sets X^'1'_N as defined above do not allow for a natural definition of a probability.

bpet · Nov 3, 2010

An interesting related discussion is here: https://www.physicsforums.com/showthread.php?t=442986 (from a measure-theoretic perspective). In particular they show that the natural density is not a probability measure because it isn't sigma-additive - for example the natural numbers (density 1) is a countable union of singletons (density 0).

D H · Nov 3, 2010

CRGreathouse said:

It does have logarithmic density, though -- right?

themaestro said:

is this problem not related to 'Benford’s Law' ?

To answer CR's question: Yes, it does, thanks for that! The logarithmic density is ln 2/ln 10 = log₁₀ 2 -- and that is exactly what Benford's law says.

tom.stoer · Nov 3, 2010

D H said:

To answer CR's question: Yes, it does, thanks for that! The logarithmic density is ln 2/ln 10 = log₁₀ 2 -- and that is exactly what Benford's law says.

How do you show that? It explicitly contradicts what we have discussed.

D H · Nov 3, 2010

The definition of logarithmic density of some subset [itex]A\in\mathbb N[/itex] is

[tex]
\delta(A) \equiv
\lim_{n\to\infty} \frac{\,\,\displaystyle {\sum_{\substack{r<=n,\\r\in A}}\frac 1 r}\,\,}
{\displaystyle \sum_{r=1}^n\frac 1 r}
[/tex]

... if that limit exists. Alternatively, using lower and upper limits, the logarithmic density is the lower or upper limit if both of those limits exist and are equal. Those two limits are equal in this case.

Let f be some real number in [1, 10) (i.e., the mantissa of a real number in base 10). Denote

[tex]\begin{align*}
C_{f\cdot 10^n} &= \sum_{\substack{r<=\lfloor f\cdot10^n \rfloor,\\r\in A}}\frac 1 r \\[4pt]
H_{f\cdot 10^n} &=\sum_{r=1}^{\lfloor f\cdot 10^n \rfloor} \frac 1 r
\end{align*}[/tex]

Note that used H here because the denominator [itex]H_{f\cdot 10^n}[/itex] is the [itex]\lfloor f\cdot 10^n \rfloor^{\text{th}}[/itex] harmonic number. For large n,

[tex]\begin{align*}
C_{f10^n} &\to (n+\mathcal O(1))\,\ln 2 \\[4pt]
H_{f10^n} &\to (n+\mathcal O(1))\,\ln 10 \\[4pt]
\end{align*}[/tex]

The mantissa f is absorbed in that O(1) term. In the limit [itex]\n\to\infty[/itex], the ratio becomes [itex]\ln 2/\ln 10[/itex], or log₁₀ 2.

D H · Nov 4, 2010

tom.stoer said:

How do you show that? It explicitly contradicts what we have discussed.

I showed that just above. Just because the natural density of a set does not exist does not mean that the logarithmic density does not exist.

tom.stoer · Nov 4, 2010

I see; thanks for the explanation.

Last question: how do you "guess" which density is reasonable?

Re: Probability for the first digit of a natural number being equal to 1

1. What is the significance of the first digit in a natural number being equal to 1?

2. What is the probability of the first digit of a natural number being equal to 1?

3. Why does Benford's Law apply to the first digit of natural numbers?

4. How is Benford's Law used in real-world applications?

5. Are there any exceptions to Benford's Law?

Similar threads

Hot Threads

Recent Insights