# What's the odds

1. Nov 14, 2007

### JustRandy

What are the odds of any one person having a street address where the digits sum to 11? I figure a couple of assumptions have to be made: the number of digits in an address follows a roughly normal distribution with a mean of around 3 or 4 digits, everyone is equally likely to have any one of the 10 digits in their address, and the number 0 can't begin the address. I used to be pretty sharp at math in school, but I have no idea how to deal with this one. I'd appreciate any help. Randy.

2. Nov 14, 2007

### robert Ihnot

This is a real can of worms, I am sure. One complicating factor is Benford's Law, which says that a leading digit D occurs with frequency log10(1 + 1/D). This means that 1 is the leading digit around 30% of the time.

I don't doubt that if you went up and down streets you would find some truth in this.

One reason for that: if we have 999 items numbered 1 to 999, the leading digit is equally distributed, but if we have 1999 items, believe it or not, more than half of the numbers start with a 1.
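Both claims are easy to check by brute force. The following is only a minimal sketch; the helper names `leading_digit` and `leading_one_share` are made up for illustration, and the items are assumed to be numbered consecutively from 1.

```python
import math


def leading_digit(n: int) -> int:
    """Return the first (most significant) decimal digit of a positive integer."""
    return int(str(n)[0])


def leading_one_share(limit: int) -> float:
    """Fraction of the integers 1..limit whose leading digit is 1."""
    ones = sum(1 for n in range(1, limit + 1) if leading_digit(n) == 1)
    return ones / limit


# Benford's predicted frequency log10(1 + 1/D) for each leading digit D.
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

if __name__ == "__main__":
    print(f"Benford P(leading digit = 1): {benford[1]:.3f}")
    for limit in (999, 1999):
        share = leading_one_share(limit)
        print(f"1..{limit}: share starting with 1 = {share:.3f}")
```

Consecutive numbering is not the same thing as Benford's Law, but it shows how easily a leading 1 can become over-represented.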

Last edited: Nov 14, 2007
3. Nov 14, 2007

### JustRandy

Oh yes, good point... I didn't see that. Now that I think about it, there are other complicating factors, like a/b addresses and odd and even numbers being on the north or south side of the street. What if we just assume the leading digit is equally distributed? If we have 9999 things, wouldn't the leading digit be equally distributed? How do we proceed then? Thanks.

4. Nov 14, 2007

### robert Ihnot

Well, if all digits were equally distributed over 9999 cases, the total sum of all the numbers would be 9999x10000/2 = 49,995,000, whose digits reduce to 9 (4+9+9+9+5 = 36, and 3+6 = 9). But since we are just talking about the leading digit: 1+2+...+9 = 9x10/2 = 45, which again reduces to 9. Now if we had 100 such cases of each digit, then 100x45 = 4500, which also reduces to 9.

Last edited: Nov 15, 2007
5. Nov 15, 2007

### uart

That's for sure. The problem with a question like this is that the answer you get will depend totally on the assumptions you make. For example, I doubt that the number of digits in the street number will be normally distributed, but even if it were, you'd still need accurate data on the mean and standard deviation, which you could only really get by sampling. Overall I think it would be best to simply take a random sample of about 1000 addresses from the phone book and just go with the relative frequency.

The answer will also depend upon how "random" the street selection is, as I assume that the distribution of "street lengths" (which I'll define as the number of houses in a particular street) will depend on what locality you're choosing from. Is it a random street in the US, or is it chosen randomly from any country/region/locality in the world? These considerations will affect the street length distribution.

BTW. Just for fun I made up a "guesstimate" of the street length probability distribution for my locality, and from that I estimated a chance of about 1 in 13 of a randomly chosen street number having digits adding to 11. I then picked up the phone book and chose 100 addresses at random, finding only 4 with street numbers adding to 11. This gives an estimate of 1 in 25, which hardly backs up my calculated value, though the sample size of 100 was undoubtedly too small for really meaningful results.
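One way to judge whether 4 hits in 100 is really at odds with a 1-in-13 chance is a binomial tail probability. This is only a rough sketch: it assumes the 100 addresses are independent draws, and `binomial_cdf` is a hypothetical helper written for this illustration.

```python
from math import comb


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n_sampled = 100   # addresses pulled from the phone book
hits = 4          # addresses whose digits summed to 11
p_model = 1 / 13  # the calculated estimate being tested

# Chance of seeing this few hits (or fewer) if the 1-in-13 figure were right.
tail = binomial_cdf(hits, n_sampled, p_model)

if __name__ == "__main__":
    print(f"P(at most {hits} hits in {n_sampled} | p = 1/13) = {tail:.3f}")
```

A tail probability that is not tiny would support the remark that 100 addresses are too few to separate 1 in 13 from 1 in 25.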

6. Nov 15, 2007

### uart

If you’re interested, here’s the data I used for my guesstimate of 1 in 13.

| Number of houses in street | % of streets in this category | Rel. freq. of sum = 11 |
|---|---|---|
| 001..010 | 5% | 0 |
| 011..020 | 8% | 0 |
| 021..050 | 15% | 3/30 |
| 051..100 | 22% | 5/50 |
| 101..200 | 15% | 9/100 |
| 201..300 | 12% | 10/100 |
| 301..400 | 8% | 9/100 |
| 401..500 | 5% | 8/100 |
| 501..1000 | 10% | 25/500 |

The percentage of street numbers with a digit sum of 11 (about 8%) is the sum, over all rows, of column 2 multiplied by column 3.

The problem with this, of course, is that the second column of the data was drawn out of thin air. :)
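For anyone who wants to redo the arithmetic, here is a small sketch of that weighted sum, using exact fractions. The variable names are made up, and the row values are copied from the table in this post.

```python
from fractions import Fraction

# (share of streets, relative frequency of digit sum 11), one pair per table row.
rows = [
    (Fraction(5, 100), Fraction(0)),
    (Fraction(8, 100), Fraction(0)),
    (Fraction(15, 100), Fraction(3, 30)),
    (Fraction(22, 100), Fraction(5, 50)),
    (Fraction(15, 100), Fraction(9, 100)),
    (Fraction(12, 100), Fraction(10, 100)),
    (Fraction(8, 100), Fraction(9, 100)),
    (Fraction(5, 100), Fraction(8, 100)),
    (Fraction(10, 100), Fraction(25, 500)),
]

# The street shares should cover every category exactly once.
total_share = sum(share for share, _ in rows)

# Weighted sum: column 2 times column 3, added over all rows.
estimate = sum(share * freq for share, freq in rows)

if __name__ == "__main__":
    print(f"shares sum to {float(total_share):.0%}")
    print(f"estimate = {float(estimate):.4f}, about 1 in {float(1 / estimate):.1f}")
```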

Last edited: Nov 15, 2007
7. Nov 15, 2007

### JustRandy

Boy... I'm really burning the neuro-glucose on this one just trying to understand what you guys are talking about. And coming from a background of being handed math award after award, 8 courses of calculus and an engineering degree... I'm feeling pretty stupid about now. However, being not so much smart as determined, I will get to the bottom of this even if I have to order mailing lists and paste addresses into Excel to find the answer ;)

Thank you very much for the help! I will be back!

8. Nov 16, 2007

### JustRandy

Ok, here's what I've done. I used Excel to count how many numbers have digits summing to 11 among all the 2-, 3-, 4- and 5-digit numbers. For 2 digits it's 8/90 or 8.9%. For 3 it's 61/900 or 6.8%. For 4 it's 279/9000 or 3.1%. For 5 it's 992/90,000 or 1.1%. I assume we can just add those numbers together and get 1340/99,990 or 1.3%.

In Excel I started, say, the 5-digit number sequence with 10,000 and filled down to 99,999. Then I split each digit into its own column and summed the rows. Then I sorted the sums and counted the 11's. Does all this sound right?
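The same fill-down, split, sum and count procedure can be reproduced in a few lines, which gives a way to cross-check the spreadsheet counts. This is just a sketch; `digit_sum` and `count_with_sum` are names invented here.

```python
def digit_sum(n: int) -> int:
    """Sum of the decimal digits of a non-negative integer."""
    return sum(int(ch) for ch in str(n))


def count_with_sum(num_digits: int, target: int = 11) -> tuple[int, int]:
    """Return (hits, total) over all numbers with exactly num_digits digits.

    Numbers run from 10**(num_digits - 1) up to 10**num_digits - 1,
    so a leading zero never occurs.
    """
    low, high = 10 ** (num_digits - 1), 10 ** num_digits
    hits = sum(1 for n in range(low, high) if digit_sum(n) == target)
    return hits, high - low


if __name__ == "__main__":
    for k in (2, 3, 4, 5):
        hits, total = count_with_sum(k)
        print(f"{k} digits: {hits}/{total} = {hits / total:.1%}")
```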

After a quick look in the phone book I've decided to just drop 2- and 5-digit addresses. We are so much more likely to get a 3- or 4-digit address that all the others would seem to have an insignificant impact. So, do we assume we are equally likely to get a 3- or 4-digit address (it appears so in the phone book)? And how do we account for the fact that we are more likely to get a starting digit that is a low number? A starting digit of 9 is rarely seen and could perhaps also be dropped. Should I just take a sample and develop an index for use back in Excel? Or is there an equation or statistical curve to follow?
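If the equal-likelihood idea is taken at face value, the combined figure is just a weighted average of the 3-digit and 4-digit frequencies. The sketch below assumes a 50/50 split, which is only the guess floated here and not measured data; the `weights` dictionary is hypothetical and can be swapped for sampled proportions.

```python
from fractions import Fraction

# Frequency of digit sum 11 among all 3-digit and all 4-digit numbers.
freq_by_length = {3: Fraction(61, 900), 4: Fraction(279, 9000)}

# ASSUMPTION: 3- and 4-digit addresses are equally likely, nothing else occurs.
weights = {3: Fraction(1, 2), 4: Fraction(1, 2)}

# Weighted average of the per-length frequencies.
combined = sum(weights[k] * freq_by_length[k] for k in freq_by_length)

if __name__ == "__main__":
    print(f"combined probability = {combined} = {float(combined):.2%}")
```

This still treats every leading digit as equally likely, so it does not address the low-leading-digit question.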

9. Nov 16, 2007

### uart

Well, that's why I was trying to point out the significance of the geographical location on the distribution of street lengths. I'm amazed that you have hardly any 2-digit street addresses in the geographic location under consideration. In my phone book the one- and two-digit street addresses way outnumber the longer ones. In my case it is a suburban location (in Australia), and in most suburbs there are loads of fairly short Streets and Closes and Laneways with about 30 street addresses or fewer in each.

Since this is essentially heading towards experimental probability, why don't you just pick up the phone book and start computing relative frequencies (of street addresses with a digit sum of 11)? If you've got a scanner with OCR (optical character recognition) or access to a phone book on CD, then it should be easy enough to get a few thousand data points.
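Once the addresses are in text form, the counting itself is simple. Here is a rough sketch; the sample addresses are invented purely to exercise the code (they are not real data), the helper names are hypothetical, and it assumes the street number is the run of digits at the start of each line.

```python
import re
from typing import Iterable, Optional, Tuple

# Digits at the very start of the address, e.g. "12" in "12A Elm Ct".
LEADING_NUMBER = re.compile(r"\s*(\d+)")


def street_number(address: str) -> Optional[int]:
    """Return the leading street number, or None if the line has none."""
    match = LEADING_NUMBER.match(address)
    return int(match.group(1)) if match else None


def relative_frequency(addresses: Iterable[str], target: int = 11) -> Tuple[int, int]:
    """Return (hits, usable): addresses whose digits sum to target, and addresses parsed."""
    hits = usable = 0
    for address in addresses:
        number = street_number(address)
        if number is None:
            continue  # skip lines with no leading number
        usable += 1
        if sum(int(ch) for ch in str(number)) == target:
            hits += 1
    return hits, usable


# Made-up lines for illustration only.
sample = ["29 Oak St", "407 Pine Rd", "12A Elm Ct", "1234 Main St", "Flat 3, Hill Ln"]

if __name__ == "__main__":
    hits, usable = relative_frequency(sample)
    print(f"{hits} of {usable} usable addresses have digits summing to 11")
```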

10. Nov 16, 2007

### CRGreathouse

I agree that geography matters a lot -- my address is 5 digits, so 3 seems unusually short to me.

11. Nov 16, 2007

### robert Ihnot

It is not that hard to go at this by simple counting. With one-digit numbers, there are none that add to 11. With two-digit numbers, simple counting finds the pairs 9,2; 8,3; 7,4; 6,5. Each of these can be ordered in two ways, which gives for the first 99 numbers (00 does not count) $$\frac{8}{99} = 8.1\%$$

To take numbers in the one hundreds, all we do is repeat the two-digit reasoning, but this time we count the pairs of digits that sum to 10, since the leading 1 supplies the rest. There are 9 such pairs, so the ratio now becomes:
(8+9)/199 = 8.5%.

Continuing on this way to the 200s, we get this set:

(8+9+10)/299 = 9.03%

Then it begins to slowly level off, the next set being: (8+9+10+9)/399=9.02%.

By the time we reach 999, the ratio has gone to, I figure, 69/999 = 6.9%.

If we now put a leading 1 in the thousands place (the numbers 1000 to 1999), the figure goes to (69+63)/1999 = 6.6%. (But please remember that these larger figures have not been carefully gone over a second time.)

Not much of a drop, but a slight one, and it shows that the leading digit being a one is not as important as Benford's law might suggest. Through the 2000s, I arrived at: (69+63+55)/2999 = 6.2%

If we go back and look at the last two digits, assumed to be uniformly distributed, you can see that, since the average digit has a value of 4.5, 9 is the most likely sum. So in the 200s, where the last two digits must sum to 9, we are working with the most favorable situation for the ratio.
Obviously, then, as numbers get larger and larger, the fraction whose digits add to 11 keeps shrinking toward zero.
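The claim about the most likely sum of two digits can be tabulated directly. A minimal sketch, assuming each digit is independent and equally likely to be 0 through 9 (the variable names are made up):

```python
from collections import Counter

# Number of ordered digit pairs (a, b) giving each possible sum 0..18.
ways = Counter(a + b for a in range(10) for b in range(10))

# The sum reached by the largest number of pairs.
best_sum, best_ways = ways.most_common(1)[0]

if __name__ == "__main__":
    print(f"most likely sum: {best_sum} ({best_ways} of 100 pairs)")
    for target in (10, 9, 8, 7):
        print(f"pairs summing to {target}: {ways[target]}")
```

The counts for 10, 9, 8 and 7 are the terms added for the 100s, 200s, 300s and 400s respectively.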

So, assuming the digits are uniformly distributed, which is not really true, especially on business streets where numbers are frequently skipped, we have a pretty good "guesstimate" of what to expect.
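Since the larger figures were flagged as not double-checked, a brute-force tally of the running ratio is an easy way to go over them again. This is a sketch only; `running_ratio` is a name invented here, and it assumes house numbers run consecutively from 1 up to the limit.

```python
def digit_sum(n: int) -> int:
    """Sum of the decimal digits of a non-negative integer."""
    return sum(int(ch) for ch in str(n))


def running_ratio(limit: int, target: int = 11) -> tuple[int, int]:
    """Return (hits, limit): how many of 1..limit have digits summing to target."""
    hits = sum(1 for n in range(1, limit + 1) if digit_sum(n) == target)
    return hits, limit


if __name__ == "__main__":
    for limit in (99, 199, 299, 399, 999, 1999, 2999):
        hits, total = running_ratio(limit)
        print(f"1..{limit}: {hits}/{total} = {hits / total:.2%}")
```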

Last edited: Nov 17, 2007