Understanding Benford's Law Proof: Scaling and Invariance

Thread starter: etotheipi
In summary, Benford's law is a statistical law that predicts the probability that the first digit of a given data point is a particular number, based on a universal probability distribution over numbers that is invariant under a change of scale. It applies to data that are not dimensionless, and can be used to analyse the distribution of numbers in various contexts, such as newspaper figures (though not codes such as phone numbers). It can also be observed in real-life situations, such as inflation and the number of Coronavirus cases, and has practical applications such as designing pseudo-random number generators and choosing black holes to investigate based on their masses.
  • #1
etotheipi
The significant digits of numbers in sets of numerical data supposedly follow "Benford's Law", which asserts that the probability that the first digit in a given data point is ##D## is about ##\log_{10}(1+ \frac{1}{D})##. An upshot is that we expect ~30% of first digits to be ##1##.
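For concreteness, the claimed distribution can be tabulated directly; a quick sketch in Python, using nothing beyond the formula itself:

```python
import math

# First-digit probabilities predicted by Benford's law: P(D) = log10(1 + 1/D)
probs = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d, p in probs.items():
    print(f"P({d}) = {p:.3f}")
# P(1) = 0.301, so roughly 30% of first digits should be 1

# The nine probabilities telescope to log10(10) = 1
print("total:", sum(probs.values()))
```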

The proof is outlined here and I can follow their reasoning but can't understand the very first step. They say
Benford's law applies to data that are not dimensionless, so the numerical values of the data depend on the units. If there exists a universal probability distribution ##P(x)## over such numbers, then it must be invariant under a change of scale, so ##P(kx) = f(k)P(x)##

If you take that to be true you can show ##f(k) = \frac{1}{k}##, though I wondered how you come up with the above assertion in the first place? What do we mean by scaling - I thought ##P(x)## was just supposed to model a PDF over the digits from 1 to 9?
 
  • #2
He is saying that Benford's law is true when the numbers being considered have a "universal probability distribution" which he then defines as being invariant to the units of measure.

So Benford's Law will work when you have the same distribution of numbers regardless of the units of measure - for example: feet, miles, cm, meters, etc.
 
  • #3
.Scott said:
He is saying that Benford's law is true when the numbers being considered have a "universal probability distribution" which he then defines as being invariant to the units of measure.

Though if the domain of ##P(x)## is digits from 1-9, then what does it mean to consider ##P(kx)##?

I can imagine that if you had a pdf ##f_X(x)## whose domain was ##x \in [0,5]##, if you then considered something like ##Y = 5X## (i.e. converting into a weird new unit), then the domain of this new ##f_Y(y)## is ##y \in [0,25]##, and to normalise it we would squish the curve down by a factor of 5.
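That squishing can be checked numerically; a minimal sketch, assuming ##X## uniform on ##[0,5]## (so ##f_X = 1/5##) and ##Y = 5X##:

```python
import random

random.seed(0)
N = 100_000

# X uniform on [0, 5] has density 1/5; Y = 5X lives on [0, 25],
# and normalisation squishes the density down by a factor of 5 to 1/25.
ys = [5 * random.uniform(0, 5) for _ in range(N)]

# Estimate the density of Y on [0, 25] using five bins of width 5
width = 5
counts = [0] * 5
for y in ys:
    counts[min(int(y // width), 4)] += 1
density = [c / (N * width) for c in counts]

print(density)  # each entry is close to 1/25 = 0.04
```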

I guess I wonder what do they mean by 'such numbers'; I had assumed this was only over the significant digits.
 
  • #4
Data that are not dimensionless come about by dividing some observation by a unit: what you report when you write e.g. length = 3.4 m is actually length / 1 m = 3.4, and there is your 1/k!
 
  • #5
Another approach: to get from a leading 1 to the next digit up requires a 50% (oops... :biggrin:) 100% increase, but to get to the next digit down requires only a 10% decrease.
 
  • #6
etotheipi said:
Though if the domain of ##P(x)## is digits from 1-9, then what does it mean to consider ##P(kx)##?

I can imagine that if you had a pdf ##f_X(x)## whose domain was ##x \in [0,5]##, if you then considered something like ##Y = 5X## (i.e. converting into a weird new unit), then the domain of this new ##f_Y(y)## is ##y \in [0,25]##, and to normalise it we would squish the curve down by a factor of 5.

I guess I wonder what do they mean by 'such numbers'; I had assumed this was only over the significant digits.

The ##x## in ##P(x)## ranges over all possible measurements. That ##x## is not 1,2,3,4,5,6,7,8,9.
Later in the proof, he uses ##P(D)## - where ##D## is one of the nine decimal digits.
 
  • #7
BvU said:
Data that are not dimensionless come about by dividing some observation by a unit: what you report when you write e.g. length = 3.4 m is actually length / 1 m = 3.4, and there is your 1/k!

I see; the dimensions are the key part. Thank you!
 
  • #8
.Scott said:
The ##x## in ##P(x)## ranges over all possible measurements. That ##x## is not 1,2,3,4,5,6,7,8,9.
Later in the proof, he uses ##P(D)## - where ##D## is one of the nine decimal digits.

That makes more sense. Thanks for clarifying!

My "homework" (well, I guess isn't all work "homework" now...?) is supposedly to trawl through a bunch of newspapers to see if the distribution fits... but the maths behind it is much more interesting!
 
  • #9
etotheipi said:
I see; the dimensions are the key part. Thank you!

I must admit I didn't know there was such a law. I thought it was just fairly obvious that numbers tend to start with ##1##. For example, if you take the price of everything in the supermarket. Then you apply inflation - it doesn't matter what percentage you use.

If something starts at £1, it takes a long time to get to £2, less time to get to £3 and so on. With inflation at 3%, say, the price spends 24 years at £1.something, and only 4 years at £9.something. And then the same cycle repeats for £10.something etc. By a rough calculation, therefore, the prices of 27% of all items should start with ##1##.
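That back-of-envelope calculation can be reproduced directly; a sketch using the 3% figure from the example (the exact fractions come out as Benford's ##\log_{10}(1+\frac{1}{d})##):

```python
import math

r = 0.03  # annual inflation rate, as in the example

# Years a price spends with leading digit d: time to grow from d to d+1
# at rate r is log(1 + 1/d) / log(1 + r).
years = [math.log1p(1 / d) / math.log1p(r) for d in range(1, 10)]
decade = math.log(10) / math.log1p(r)  # years for a full 10x cycle (~77.9)

for d, t in enumerate(years, start=1):
    print(f"leading digit {d}: {t:4.1f} years ({t / decade:.1%} of the cycle)")
# digit 1 gets about 23.4 years of the ~77.9-year cycle (30.1%);
# digit 9 gets only about 3.6 years (4.6%).
```

The ratio ##t/\text{decade}## equals ##\log_{10}(1+\frac{1}{d})## identically, whatever the inflation rate, which is why the inflation picture reproduces Benford's law exactly.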

It doesn't apply to phone numbers, for example, as these are codes and not actually numbers.
 
  • #10
PeroK said:
I must admit I didn't know there was such a law. I thought it was just fairly obvious that numbers tend to start with ##1##. For example, if you take the price of everything in the supermarket. Then you apply inflation - it doesn't matter what percentage you use.

At first I thought it was slightly outlandish, but when I thought about it for a little longer it didn't seem so far-fetched. I quite like your inflation example, since it's easy to see how in each subsequent fixed period you get larger absolute increases, which push you quickly through the £70s, £80s, etc. into the £100s; you then spend a while longer there whilst the rate of raw increase keeps growing through the other hundreds.

Some of the other things on that list are a bit harder to visualise (e.g. area of rivers, X-ray volts?) but I suppose the same principle applies.

Cool stuff!
 
  • #11
PS On a topical, if grim, note: in 37% of countries the number of Coronavirus cases begins with 1. And another 18% begin with 2. You can see the same "inflationary" pattern in the numbers.
 
  • #12
I encountered Benford's law, though probably not a proof, in several contexts including designing pseudo-random number generators to model actual data sets. This quote contains a useful distinction [added Latex formatting]:
The quantity ##P(d)## is proportional to the space between ##d## and ##d+1## on a logarithmic scale. Therefore, this is the distribution expected if the logarithms of the numbers (but not the numbers themselves) are uniformly and randomly distributed.

When slide rules were in general use and logarithmic numeric forms were therefore more pervasive, Benford's distribution in otherwise random data sets would have been more apparent.
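The quoted property is easy to demonstrate numerically; a sketch that draws numbers whose base-10 logarithms are uniform and tallies first digits (the range 1 to 1000 is an arbitrary choice for the demo):

```python
import math
import random

random.seed(0)
N = 200_000

counts = [0] * 10
for _ in range(N):
    # log10(x) uniform on [0, 3], so x spans 1 to 1000 "logarithmically"
    x = 10 ** random.uniform(0, 3)
    d = int(x)
    while d >= 10:      # peel off trailing digits to get the first one
        d //= 10
    counts[d] += 1

# Empirical frequencies land close to the Benford prediction
for d in range(1, 10):
    print(d, round(counts[d] / N, 3), "vs", round(math.log10(1 + 1 / d), 3))
```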
 
  • #13
Klystron said:
When slide rules were in general use and logarithmic numeric forms were therefore more pervasive, Benford's distribution in otherwise random data sets would have been more apparent.

That's a bit like the part in the Wolfram article where it mentions the first few pages in tables of logarithms were observed to be more worn than those later on :smile:

Klystron said:
I encountered Benford's law, though probably not a proof, in several contexts including designing pseudo-random number generators to model actual data sets.

I know very little about how random number generators work (I only really know of the iterative approach with the seed and the remainders... not sure if it has a name!); if you were designing a pseudo-random number generator whose aim was to generate numbers with a uniform distribution between 0 and 100,000, I wouldn't expect Benford's law to be obeyed, since you're imposing an arbitrary cut-off on the possible values. Is that right?

I wonder, how do you get the algorithm to spit out numbers that conform to the law? Was such an alteration necessary for what you were doing?
 
  • #14
etotheipi said:
That's a bit like the part in the Wolfram article where it mentions the first few pages in tables of logarithms were observed to be more worn than those later on :smile:
I know very little about how random number generators work (I only really know of the iterative approach with the seed and the remainders... not sure if it has a name!); if you were designing a pseudo-random number generator whose aim was to generate numbers with a uniform distribution between 0 and 100,000, I wouldn't expect Benford's law to be obeyed, since you're imposing an arbitrary cut-off on the possible values. Is that right?

I wonder, how do you get the algorithm to spit out numbers that conform to the law? Was such an alteration necessary for what you were doing?
My memory seems cloudy of late, but one application involved a seeded RNG that produced numbers between zero and one and met the requirements for randomness. The application programmers used simple arithmetic to turn those into digits from 1...10 for CFD test data, but the requestor was not satisfied with the output.

I applied Benford's distribution to weight the generated data (digits) to more accurately mimic actual fluids, similar to this table:

d      1      2      3      4     5     6     7     8     9
P(d)   30.1%  17.6%  12.5%  9.7%  7.9%  6.7%  5.8%  5.1%  4.6%

A colleague really improved what she called my "brutal force" methods by subtle reiteration of the weighting algorithm to simulate more natural data streams.

I later applied similar algorithms to a group artificial intelligence project at university that required simulated shoppers entering and leaving queues to initiate the simulation. Previous data sets produced unnatural predictable bunches of shoppers. Applying Benford's Law lent verisimilitude to the sim that led to a successful project.
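For anyone curious, the weighting step described above can be sketched with an inverse-CDF trick (this is my reconstruction for illustration, not the actual project code): a uniform draw ##u## mapped through ##\lfloor 10^u \rfloor## yields digits with exactly the Benford probabilities.

```python
import math
import random

def benford_digit(rng=random):
    """Return a digit 1-9 with probability log10(1 + 1/d), per Benford."""
    u = rng.random()       # uniform on [0, 1)
    return int(10 ** u)    # floor(10^u): P(d) = P(log10 d <= u < log10(d+1))

random.seed(1)
N = 100_000
counts = [0] * 10
for _ in range(N):
    counts[benford_digit()] += 1

for d in range(1, 10):
    print(d, round(counts[d] / N, 3), "target", round(math.log10(1 + 1 / d), 3))
```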
 
  • #15
Klystron said:
I later applied similar algorithms to a group artificial intelligence project at university that required simulated shoppers entering and leaving queues to initiate the simulation. Previous data sets produced unnatural predictable bunches of shoppers. Applying Benford's Law lent verisimilitude to the sim that led to a successful project.

That's very cool, how such a peculiar mathematical quirk can change the results so drastically. Thanks for sharing!
 
  • #16
On a separate note, I wonder if Benford's law relates to Zipf's law, on the distribution of proportions of traits in a population, where e.g. the 2nd, 3rd, etc. largest cities in a country will have proportions of the total population that remain constant across different populations. At any rate, maybe a good point to make is that Benford's is a Practical Statistical but not Mathematical law.
 
  • #17
WWGD said:
On a separate note, I wonder if Benford's law relates to Zipf's law, on the distribution of proportions of traits in a population, where e.g. the 2nd, 3rd, etc. largest cities in a country will have proportions of the total population that remain constant across different populations.
No, Zipf's law is an empirical law - it is derived from the observation of different data sets, some of which obey it and some of which don't. Benford's law, on the other hand, is deterministic - oversimplifying a little, if you have a data set with a distribution that is scale invariant then it can be shown (Hill 1998) that, for example, the first digits of the data set will follow Benford's law.

WWGD said:
At any rate, maybe a good point to make is that Benford's is a Practical Statistical but not Mathematical law.
Not sure what your distinction is there, but I would consider it a 'Mathematical Law' in a way that Zipf's law, the Pareto principle etc. are not, although even Theodore Hill in his original paper and in subsequent publications does not use the word 'proof', and nor would I. The distinction that makes it mathematical is that the explanation for Zipf and Pareto lies in socio-economic or other factors intrinsic to the data that is being studied whereas the explanation for Benford's law lies in the mathematical properties of the numbers used to measure the data.
 
  • #18
pbuk said:
No, Zipf's law is an empirical law - it is derived from the observation of different data sets, some of which obey it and some of which don't. Benford's law, on the other hand, is deterministic - oversimplifying a little, if you have a data set with a distribution that is scale invariant then it can be shown (Hill 1998) that, for example, the first digits of the data set will follow Benford's law.

Not sure what your distinction is there, but I would consider it a 'Mathematical Law' in a way that Zipf's law, the Pareto principle etc. are not, although even Theodore Hill in his original paper and in subsequent publications does not use the word 'proof', and nor would I. The distinction that makes it mathematical is that the explanation for Zipf and Pareto lies in socio-economic or other factors intrinsic to the data that is being studied, whereas the explanation for Benford's law lies in the mathematical properties of the numbers used to measure the data.
I am not an expert on it, but basic sources like Wikipedia and Wolfram describe it as something that tends to occur in some datasets. So I would not call it Mathematical unless that description is unsubstantiated.
 
  • #19
WWGD said:
I am not an expert on it, but basic sources like Wikipedia and Wolfram describe it as something that tends to occur in some datasets. So I would not call it Mathematical unless that description is unsubstantiated.
I don't consider Wikipedia authoritative. Be careful with the word 'tends': you seem to be using it to mean 'has a tendency to', whereas when Wolfram says 'Benford's law states that in listings, tables of statistics, etc., the digit 1 tends to occur with probability ~30%, much greater than the expected 11.1% (i.e., one digit out of 9)', they are using it to mean 'in the limit approaches'.

But in any case I am not really concerned with linguistic definitions :wink: if you don't want to use the term mathematical to describe this phenomenon then don't.
 

What is Benford's Law and why is it important to understand?

Benford's Law is a mathematical principle that states that in many real-life data sets, the first digit of numbers is more likely to be a small digit (1, 2, or 3) than a large digit (7, 8, or 9). This law is important because it has many applications in fields such as accounting, fraud detection, and data analysis.

What is the proof of Benford's Law and how does it relate to scaling and invariance?

The proof of Benford's Law is based on the fact that many naturally occurring data sets follow a logarithmic distribution: the logarithms of the values, rather than the values themselves, are uniformly distributed, so the density of values falls off as 1/x and smaller leading digits occupy more of the scale. This distribution is scale invariant, meaning it remains the same regardless of the scale at which the data are measured. The first digits of the numbers therefore follow the same distribution whether they are measured in metres, feet, or any other unit.
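The scaling argument can be written out compactly; a sketch following the outline quoted in post #1:

```latex
% Scale invariance: P(kx) = f(k)\,P(x). Integrating over x and using
% the normalisation \int_0^\infty P(x)\,dx = 1 fixes f:
\int_0^\infty P(kx)\,dx = \frac{1}{k}\int_0^\infty P(y)\,dy = \frac{1}{k}
\quad\Longrightarrow\quad f(k) = \frac{1}{k}.
% Differentiating P(kx) = P(x)/k with respect to k and setting k = 1
% gives x\,P'(x) = -P(x), so P(x) \propto 1/x. Restricting to one decade,
P(D) = \frac{\int_D^{D+1} \frac{dx}{x}}{\int_1^{10} \frac{dx}{x}}
     = \log_{10}\!\left(1 + \frac{1}{D}\right).
```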

How can Benford's Law be applied in real-world scenarios?

Benford's Law can be applied in various real-world scenarios, such as detecting financial fraud, identifying errors in accounting data, and predicting election results. It can also be used as a tool for data analysis and quality control, as any deviation from the expected distribution can indicate anomalies in the data.

Are there any limitations to Benford's Law?

While Benford's Law is a useful tool, it is not a universal law and may not apply to all data sets. It is most accurate when applied to large data sets that span several orders of magnitude. Additionally, the form of the law discussed here applies only to the first digit of numbers, not to subsequent digits.

What are some common misconceptions about Benford's Law?

One common misconception about Benford's Law is that it can be used to detect fraud with 100% accuracy. While it can be a useful indicator of potential fraud, it should not be relied upon as the sole method of detection. Another misconception is that the law only applies to naturally occurring data, when in fact it can also apply to artificially generated data sets.
