# Re: Probability for the first digit of a natural number being equal to 1

1. ### tom.stoer

5,489
Lets define D1(n) to be the first digit of a natural number n, e.g. D1(571) = 5, D1(119) = 1, ...

Naturally one would expect that all digits come with the same probability 1/9, but looking at it in more detail one finds rather counter-intuitive results.

If one counts these digits and calculates the probability using an interval [1,N] one finds that the limit for N to infinity does not exist! The probability can be defined using a function C'1'(n) which counts the numbers in [1,n] starting with digit '1'. This counting function applied to the numbers n = 1, 2, 3, ..., 9, 10, 11, ..., 19, 20, ..., 99, 100, 101, ... results in 1, 1, 1, ..., 1, 2, 3, ..., 11, 11, ..., 11, 12, 13, ... The probability finding '1' as the first digit can be defined as P'1'(n) = C'1'(n)/n. So one counts the numbers in the interval [1,n] starting with '1' and divides by n. One immediately sees that this probability has a minimum value of 1/9 whenever n=9, 99, 999, ... for which C has the values 1, 11, 111, ... and it has a maximum value (which approaches 5/9 from above) whenever n is of the form 19, 199, 1999, ...

That means that P'1'(n) = C'1'(n) / n does not converge to a fixed value p as n goes to infinity!

How can one resolve this issue?
Is there a unique way to define p differently - with the "correct" results 1/9?
Or is is this probability not defined at all?

Let me note that I think that Benford's law does not apply here. There is no reason for scale invariance; we are not talking about random numbers; we are not talking about measurement of sizes, areas or something like that, just the natural numbers.

2. ### bpet

523
Congratulations, you've discovered that the natural numbers cannot be assigned a uniform distribution!

One could say that the numbers _are_ random because the selection mechanism is not fully understood, and that the probability p is defined but not known (and can be made whatever you like, with different selection models).

Edit: on second thoughts, looks like the number theorists have a name for this - Natural density and you found a set that has different upper and lower densities.

Last edited: Oct 27, 2010
3. ### tom.stoer

5,489
So the conclusion is that the issue can't be resolved in a natural and unqiue way.

I am sure I am not the first one who struglled about that. Do you know a short paper discussig this topic?

4. ### chiro

Lets say you are restricting the numbers you have from say 1 to N. To find the uppermost digit all you have to do is calculate Floor(log_10(X_i))+1 to get the "size" (ie how many digits in base 10) of the number.

So lets say you want to find the probability of getting a number beginning with 1. So lets say your range is 1 - 999. The number of entries starting with a 1 is

1 + 10 + 100 = 111

So P(D(n)=1) = 111/999 = 1/9

Assuming that each trial is independent for a Discrete Random Variable [1,999] then
P(A and B) = P(A)P(B) which is just a straightforward calculation.

Note that if you're set is not finite, then you can not assign a proper distribution. Also note that if you use a range of which log_10(A+1) != An integer then you will have to modify your algorithm to calculate correct probabilities since the distribution will not be uniform as my above test case is.

5. ### bpet

523
An alternative approach: Natural density

### Staff: Mentor

Why not? That to me seems to be the natural resolution. Natural density doesn't help; it gives an upper bound of 1 and a lower bound of 1/9.

7. ### bpet

523
No the OP isn't talking about random numbers, and what they call "probability" is really "natural density". In essence, they found a neat example of a set without a natural density by showing that the limsup and liminf values disagree.

8. ### tom.stoer

5,489
One precondition for Benfords law is that the logarithms of the numbers are uniformly distributed. But this is not the case for the natural numbers; here the numbers themselves are uniformly distributed.

We can write each number as

n = 10kn = x 10k

with x < 10. One can restrict to the set n = {1, 2, ..., 9} as all other sets {10, ..., 99} etc. will be mapped to the first case in the following. One calculates the logarithm of each number and extracts its fractional part:

log10n = log10(10kx) = k + log10x

The fractional part is just log10x so all subsequent intervals will create a self-similar distribution of the fractional parts => the distribution d(x) of the fractional parts can be derived from

f(x) = log10x

Of course this is not a uniform distribution in [1, 10[ and therefore Benford's law does not apply.

I think this is rather natural as the natural numbers have nothing to do with growth processes - and this is were you find Benford's law quite frequently.

### Staff: Mentor

Not quite. The precondition is that logarithms are uniformly distributed over some large but finite range. There is no such thing as a uniform distribution over an infinite range. The concept doesn't make sense.

Benford's law throws out the characteristic part of the logarithm, leaving only the mantissa. While the exact range isn't know, it doesn't really matter because the extent of the range is being tossed out the window by ignoring the characteristic.

That doesn't make sense, and this is what is causing your problems. This is a "doctor it hurts when I do this" kind of problem. The solution is the same advice as that given by the doctor.

10. ### tom.stoer

5,489
nice - but I am a stubborn child I therefore I not stop :-)

Let's assume I restrict this to a certain subset of the natural numbers like {m, m+1, m+2, ..., m+L}; then the numbers are distributed uniformly. Now I map this set into [1, 10[ => the fractional part log10(n) is no longer uniformly distributed.

What is wrong with this idea?

### Staff: Mentor

That leads to the natural density interpretation -- and you still have a problem. The inferior and superior limits are not equal as you move your m and L about.

12. ### tom.stoer

5,489
Yes, I already saw that this is the case; the construction does not solve anything; it should only serve as a counter example to the uniformly distributed logarithm required for Benford's law.

You stated that "there is no such thing as a uniform distribution over an infinite range. The concept doesn't make sense." If this is all that can be said then we should close this thread - unfortunately.

13. ### CRGreathouse

3,682
It dos have logarithmic density, though -- right?

14. ### themaestro

14
is this problem not related to 'Benford’s Law' ?

15. ### tom.stoer

5,489
At first one could think so, but I think it is not. One reason is that Benford's law says something about the density of the logarithms which does not apply to the real numbers.

You can simply use the natural density interpretation mentioned above; it is really "natural" and you will see that there's no escape from the fact that this probability (a natural number to start with digit '1') is not defined.

Just do the following:
- define the sets ZN = {1,2,3,4, ...N}
- define the subsets X'1'N = {x in 1,2,3,4, ...N | x starts with '1' }
- define the size of these sets as |ZN| = N and |X'1'N| = C'1'(N)
- define the probability as P'1'(N) = C'1'(N) / N

You will see that this probability does not converge in the limit N to infinity; instead the lim sup will be 5/9 whereas the lim inf will be 1/9. That means that the sets X'1'N as defined above do not allow for a natural definition of a probability.

16. ### bpet

523
An interesting related discussion is here: https://www.physicsforums.com/showthread.php?t=442986 (from a measure-theoretic perspective). In particular they show that the natural density is not a probability measure because it isn't sigma-additive - for example the natural numbers (density 1) is a countable union of singletons (density 0).

### Staff: Mentor

To answer CR's question: Yes, it does, thanks for that! The logarithmic density is ln 2/ln 10 = log10 2 -- and that is exactly what Benford's law says.

18. ### tom.stoer

5,489
How do you show that? It explicitly contradicts what we have discussed.

### Staff: Mentor

The definition of logarithmic density of some subset $A\in\mathbb N$ is

$$\delta(A) \equiv \lim_{n\to\infty} \frac{\,\,\displaystyle {\sum_{\substack{r<=n,\\r\in A}}\frac 1 r}\,\,} {\displaystyle \sum_{r=1}^n\frac 1 r}$$

... if that limit exists. Alternatively, using lower and upper limits, the logarithmic density is the lower or upper limit if both of those limits exist and are equal. Those two limits are equal in this case.

Let f be some real number in [1, 10) (i.e., the mantissa of a real number in base 10). Denote

\begin{align*} C_{f\cdot 10^n} &= \sum_{\substack{r<=\lfloor f\cdot10^n \rfloor,\\r\in A}}\frac 1 r \\[4pt] H_{f\cdot 10^n} &=\sum_{r=1}^{\lfloor f\cdot 10^n \rfloor} \frac 1 r \end{align*}

Note that used H here because the denominator $H_{f\cdot 10^n}$ is the $\lfloor f\cdot 10^n \rfloor^{\text{th}}$ harmonic number. For large n,

\begin{align*} C_{f10^n} &\to (n+\mathcal O(1))\,\ln 2 \\[4pt] H_{f10^n} &\to (n+\mathcal O(1))\,\ln 10 \\[4pt] \end{align*}

The mantissa f is absorbed in that O(1) term. In the limit $\n\to\infty$, the ratio becomes $\ln 2/\ln 10$, or log10 2.

Last edited: Nov 4, 2010

### Staff: Mentor

I showed that just above. Just because the natural density of a set does not exist does not mean that the logarithmic density does not exist.