Re: Probability for the first digit of a natural number being equal to 1

In summary, the conversation discusses the probability of a natural number having a certain first digit and how it does not converge to a fixed value as the numbers become larger. The concept of natural density is mentioned but ultimately cannot fully explain the issue. The possibility of using Benford's law to resolve the problem is discussed, but ultimately deemed not applicable due to the nature of the natural numbers.
  • #1
tom.stoer
Science Advisor
5,779
172
Lets define D1(n) to be the first digit of a natural number n, e.g. D1(571) = 5, D1(119) = 1, ...

Naturally one would expect that all digits come with the same probability 1/9, but looking at it in more detail one finds rather counter-intuitive results.

If one counts these digits and calculates the probability using an interval [1,N] one finds that the limit for N to infinity does not exist! The probability can be defined using a function C'1'(n) which counts the numbers in [1,n] starting with digit '1'. This counting function applied to the numbers n = 1, 2, 3, ..., 9, 10, 11, ..., 19, 20, ..., 99, 100, 101, ... results in 1, 1, 1, ..., 1, 2, 3, ..., 11, 11, ..., 11, 12, 13, ... The probability finding '1' as the first digit can be defined as P'1'(n) = C'1'(n)/n. So one counts the numbers in the interval [1,n] starting with '1' and divides by n. One immediately sees that this probability has a minimum value of 1/9 whenever n=9, 99, 999, ... for which C has the values 1, 11, 111, ... and it has a maximum value (which approaches 5/9 from above) whenever n is of the form 19, 199, 1999, ...

That means that P'1'(n) = C'1'(n) / n does not converge to a fixed value p as n goes to infinity!

How can one resolve this issue?
Is there a unique way to define p differently - with the "correct" results 1/9?
Or is is this probability not defined at all?

Let me note that I think that Benford's law does not apply here. There is no reason for scale invariance; we are not talking about random numbers; we are not talking about measurement of sizes, areas or something like that, just the natural numbers.
 
Physics news on Phys.org
  • #2
tom.stoer said:
...That means that P'1'(n) = C'1'(n) / n does not converge to a fixed value p as n goes to infinity!

How can one resolve this issue?
Is there a unique way to define p differently - with the "correct" results 1/9?
Or is is this probability not defined at all?

Let me note that I think that Benford's law does not apply here. There is no reason for scale invariance; we are not talking about random numbers; we are not talking about measurement of sizes, areas or something like that, just the natural numbers.

Congratulations, you've discovered that the natural numbers cannot be assigned a uniform distribution!

One could say that the numbers _are_ random because the selection mechanism is not fully understood, and that the probability p is defined but not known (and can be made whatever you like, with different selection models).

Edit: on second thoughts, looks like the number theorists have a name for this - http://en.wikipedia.org/wiki/Natural_density" and you found a set that has different upper and lower densities.
 
Last edited by a moderator:
  • #3
So the conclusion is that the issue can't be resolved in a natural and unqiue way.

I am sure I am not the first one who struglled about that. Do you know a short paper discussig this topic?
 
  • #4
Lets say you are restricting the numbers you have from say 1 to N. To find the uppermost digit all you have to do is calculate Floor(log_10(X_i))+1 to get the "size" (ie how many digits in base 10) of the number.

So let's say you want to find the probability of getting a number beginning with 1. So let's say your range is 1 - 999. The number of entries starting with a 1 is

1 + 10 + 100 = 111

So P(D(n)=1) = 111/999 = 1/9

Assuming that each trial is independent for a Discrete Random Variable [1,999] then
P(A and B) = P(A)P(B) which is just a straightforward calculation.

Note that if you're set is not finite, then you can not assign a proper distribution. Also note that if you use a range of which log_10(A+1) != An integer then you will have to modify your algorithm to calculate correct probabilities since the distribution will not be uniform as my above test case is.
 
  • #5
tom.stoer said:
So the conclusion is that the issue can't be resolved in a natural and unqiue way.

I am sure I am not the first one who struglled about that. Do you know a short paper discussig this topic?

An alternative approach: http://en.wikipedia.org/wiki/Natural_density"
 
Last edited by a moderator:
  • #6
tom.stoer said:
Let me note that I think that Benford's law does not apply here.
Why not? That to me seems to be the natural resolution. Natural density doesn't help; it gives an upper bound of 1 and a lower bound of 1/9.
 
  • #7
D H said:
Why not? That to me seems to be the natural resolution. Natural density doesn't help; it gives an upper bound of 1 and a lower bound of 1/9.

No the OP isn't talking about random numbers, and what they call "probability" is really "natural density". In essence, they found a neat example of a set without a natural density by showing that the limsup and liminf values disagree.
 
  • #8
D H said:
Why not? That to me seems to be the natural resolution.

One precondition for Benfords law is that the logarithms of the numbers are uniformly distributed. But this is not the case for the natural numbers; here the numbers themselves are uniformly distributed.

We can write each number as

n = 10kn = x 10k

with x < 10. One can restrict to the set n = {1, 2, ..., 9} as all other sets {10, ..., 99} etc. will be mapped to the first case in the following. One calculates the logarithm of each number and extracts its fractional part:

log10n = log10(10kx) = k + log10x

The fractional part is just log10x so all subsequent intervals will create a self-similar distribution of the fractional parts => the distribution d(x) of the fractional parts can be derived from

f(x) = log10x

Of course this is not a uniform distribution in [1, 10[ and therefore Benford's law does not apply.

I think this is rather natural as the natural numbers have nothing to do with growth processes - and this is were you find Benford's law quite frequently.
 
  • #9
tom.stoer said:
One precondition for Benfords law is that the logarithms of the numbers are uniformly distributed.
Not quite. The precondition is that logarithms are uniformly distributed over some large but finite range. There is no such thing as a uniform distribution over an infinite range. The concept doesn't make sense.

Benford's law throws out the characteristic part of the logarithm, leaving only the mantissa. While the exact range isn't know, it doesn't really matter because the extent of the range is being tossed out the window by ignoring the characteristic.

But this is not the case for the natural numbers; here the numbers themselves are uniformly distributed.
That doesn't make sense, and this is what is causing your problems. This is a "doctor it hurts when I do this" kind of problem. The solution is the same advice as that given by the doctor.
 
  • #10
D H said:
That doesn't make sense, and this is what is causing your problems. This is a "doctor it hurts when I do this" kind of problem. The solution is the same advice as that given by the doctor.
nice - but I am a stubborn child I therefore I not stop :-)

Let's assume I restrict this to a certain subset of the natural numbers like {m, m+1, m+2, ..., m+L}; then the numbers are distributed uniformly. Now I map this set into [1, 10[ => the fractional part log10(n) is no longer uniformly distributed.

What is wrong with this idea?
 
  • #11
That leads to the natural density interpretation -- and you still have a problem. The inferior and superior limits are not equal as you move your m and L about.
 
  • #12
D H said:
That leads to the natural density interpretation -- and you still have a problem. The inferior and superior limits are not equal as you move your m and L about.
Yes, I already saw that this is the case; the construction does not solve anything; it should only serve as a counter example to the uniformly distributed logarithm required for Benford's law.

You stated that "there is no such thing as a uniform distribution over an infinite range. The concept doesn't make sense." If this is all that can be said then we should close this thread - unfortunately.
 
  • #13
D H said:
Why not? That to me seems to be the natural resolution. Natural density doesn't help; it gives an upper bound of 1 and a lower bound of 1/9.

It dos have logarithmic density, though -- right?
 
  • #14
is this problem not related to 'Benford’s Law' ?
 
  • #15
themaestro said:
is this problem not related to 'Benford’s Law' ?
At first one could think so, but I think it is not. One reason is that Benford's law says something about the density of the logarithms which does not apply to the real numbers.

You can simply use the natural density interpretation mentioned above; it is really "natural" and you will see that there's no escape from the fact that this probability (a natural number to start with digit '1') is not defined.

Just do the following:
- define the sets ZN = {1,2,3,4, ...N}
- define the subsets X'1'N = {x in 1,2,3,4, ...N | x starts with '1' }
- define the size of these sets as |ZN| = N and |X'1'N| = C'1'(N)
- define the probability as P'1'(N) = C'1'(N) / N

You will see that this probability does not converge in the limit N to infinity; instead the lim sup will be 5/9 whereas the lim inf will be 1/9. That means that the sets X'1'N as defined above do not allow for a natural definition of a probability.
 
  • #16
An interesting related discussion is here: https://www.physicsforums.com/showthread.php?t=442986 (from a measure-theoretic perspective). In particular they show that the natural density is not a probability measure because it isn't sigma-additive - for example the natural numbers (density 1) is a countable union of singletons (density 0).
 
  • #17
CRGreathouse said:
It does have logarithmic density, though -- right?
themaestro said:
is this problem not related to 'Benford’s Law' ?
To answer CR's question: Yes, it does, thanks for that! The logarithmic density is ln 2/ln 10 = log10 2 -- and that is exactly what Benford's law says.
 
  • #18
D H said:
To answer CR's question: Yes, it does, thanks for that! The logarithmic density is ln 2/ln 10 = log10 2 -- and that is exactly what Benford's law says.
How do you show that? It explicitly contradicts what we have discussed.
 
  • #19
The definition of logarithmic density of some subset [itex]A\in\mathbb N[/itex] is

[tex]
\delta(A) \equiv
\lim_{n\to\infty} \frac{\,\,\displaystyle {\sum_{\substack{r<=n,\\r\in A}}\frac 1 r}\,\,}
{\displaystyle \sum_{r=1}^n\frac 1 r}
[/tex]

... if that limit exists. Alternatively, using lower and upper limits, the logarithmic density is the lower or upper limit if both of those limits exist and are equal. Those two limits are equal in this case.

Let f be some real number in [1, 10) (i.e., the mantissa of a real number in base 10). Denote

[tex]\begin{align*}
C_{f\cdot 10^n} &= \sum_{\substack{r<=\lfloor f\cdot10^n \rfloor,\\r\in A}}\frac 1 r \\[4pt]
H_{f\cdot 10^n} &=\sum_{r=1}^{\lfloor f\cdot 10^n \rfloor} \frac 1 r
\end{align*}[/tex]

Note that used H here because the denominator [itex]H_{f\cdot 10^n}[/itex] is the [itex]\lfloor f\cdot 10^n \rfloor^{\text{th}}[/itex] harmonic number. For large n,

[tex]\begin{align*}
C_{f10^n} &\to (n+\mathcal O(1))\,\ln 2 \\[4pt]
H_{f10^n} &\to (n+\mathcal O(1))\,\ln 10 \\[4pt]
\end{align*}[/tex]

The mantissa f is absorbed in that O(1) term. In the limit [itex]\n\to\infty[/itex], the ratio becomes [itex]\ln 2/\ln 10[/itex], or log10 2.
 
Last edited:
  • #20
tom.stoer said:
How do you show that? It explicitly contradicts what we have discussed.
I showed that just above. Just because the natural density of a set does not exist does not mean that the logarithmic density does not exist.
 
  • #21
I see; thanks for the explanation.

Last question: how do you "guess" which density is reasonable?
 

1. What is the significance of the first digit in a natural number being equal to 1?

The first digit of a natural number being equal to 1 is significant because it is the most common occurrence in naturally occurring datasets. This phenomenon is known as Benford's Law and it has been observed in a wide range of fields, including finance, population statistics, and scientific data sets.

2. What is the probability of the first digit of a natural number being equal to 1?

The probability of the first digit of a natural number being equal to 1 is approximately 30.1%, according to Benford's Law. This means that in a large dataset, about 30.1% of the numbers will have a first digit of 1.

3. Why does Benford's Law apply to the first digit of natural numbers?

Benford's Law is a mathematical phenomenon that arises from the fact that naturally occurring numbers tend to follow a logarithmic distribution. This means that smaller numbers are more common than larger numbers, and the first digit of a number is more likely to be a 1 than any other digit.

4. How is Benford's Law used in real-world applications?

Benford's Law has been used in a variety of applications, including fraud detection, data analysis, and even forensic investigations. It can be used to identify anomalies or irregularities in datasets, such as numbers that have been manipulated or fabricated.

5. Are there any exceptions to Benford's Law?

While Benford's Law is generally true for large datasets, there are some exceptions. For example, if a dataset has been intentionally manipulated or if it follows a different distribution, the first digit may not follow Benford's Law. Additionally, certain numbers, such as zip codes or phone numbers, may not follow this pattern due to human intervention. However, in most cases, Benford's Law holds true and can be a useful tool for data analysis.

Similar threads

  • Set Theory, Logic, Probability, Statistics
Replies
3
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
8
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
18
Views
2K
  • Set Theory, Logic, Probability, Statistics
Replies
1
Views
832
  • Set Theory, Logic, Probability, Statistics
Replies
18
Views
4K
  • Set Theory, Logic, Probability, Statistics
Replies
2
Views
1K
  • General Math
Replies
3
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
7
Views
333
Replies
12
Views
734
  • Set Theory, Logic, Probability, Statistics
5
Replies
147
Views
7K
Back
Top