# Average Number of tracks

1. Jan 3, 2006

### Zurtex

Knowing I did mathematics my room mate posed me a question yesterday, he wanted to know given a number of tracks randomly played what is the average of a track being first repeated. He'd already written a program in Perl to solve this problem by brute calculation but was coming up with some very odd answers, he later discovered the random function wasn't very good.

Now, I was some what rusty on my probability skills and had to take this away to think about it. This did and still does sound like a very doable problem. This is how I started out:

Let X be the track number where the first repeat is, where the initial track played is track 0. Let x1 be the event where X=1, x2 be the event where X=2 etc.. To start off easy, let there be 3 tracks, then:

$$P(x_1) = \frac{1}{3}$$

It follows then that:

$$P(x_2 | {x_1}') = \frac{2}{3}$$

(In words, the probability of x2 occurring given that x1 has not occurred is 2/3). It therefore follows from some basic rules of probability:

$$\frac{P({x_1}' \cap x_2)}{P({x_1}')} = \frac{2}{3}$$

I then drew a little diagram to show that x2 was a subset of x1'. And therefore:

$$P(x_2) = \frac{4}{9}$$

I then worked out the probability of x3 the long way to make sure I was doing everything right, by saying that:

$$P(x_3|({x_2}' \cap {x_1}')) = 1$$

After expanding and reducing this nicely led to:

$$P(x_3) = \frac{2}{9}$$.

This gave a nice expectation of x as:

$$E(x) = \frac{17}{9}$$

I then set out looking at n tracks, which was pretty much the same:

$$P(x_1) = \frac{1}{n}$$

$$P(x_2 | {x_1}') = \frac{2}{n}$$

Which led to:

$$P(x_2) = \frac{2}{n} - \frac{2}{n^2}$$

I also worked out:

$$P(x_i | ( {x_{i-1}}' \cap \ldots \cap {x_1}')) = \frac{P(x_i)}{P({x_{i-1}}' \cap \ldots \cap {x_1}')}$$

$$P(x_i) = \frac{i}{n} \left(1 - \sum_{r=1}^{i-1} (P(x_r)) \right) \quad \text{with} \quad P(x_1) = \frac{1}{n}$$

This is far too complicated for me to deal with, so I attempted to put in to mathematica and it doesn't seem to be able to make anymore out of it that I can either. I feel I am going wrong somewhere because this doesn't seem like too difficult of a problem.

2. Jan 3, 2006

### shmoe

It looks like

$$P(x_i)=\frac{i}{n}\prod_{j=0}^{i-1}\frac{n-j}{n}=i\frac{n!}{n^{i+1}(n-i)!}$$

offhand I can't see a nice expression for the expected value, but you can find it easily enough for a given n.

3. Jan 3, 2006

### Hurkyl

Staff Emeritus
This sounds like the Birthday Problem.

The general rule of thumb is that out of N choices, it takes on the order of $\sqrt{N}$ picks to get a repeat. (Actually, one ought to multiply that by something to get a better estimate... probably something like $\sqrt{2}$, but I can't remember for sure)

The analysis on the Wikipedia page says the probability of getting no repeats after K picks is about:

$$\bar{P}(K) = e^{-\frac{K^2}{2 N}}$$

You're asking for the expected number of picks before getting a repeat. In other words:

$$\sum_{k = 1}^{N} k \bar{P}(k-1) \frac{k}{N}$$

because the odds that the first repeat happens at pick k is the product of the odds of no repeats by the (k-1)-th pick and the odds that the k-th pick is one of those previously chosen.

Well, I can do another approximation by pretending this is an integral and not a sum, and goes from 0 to infinity:

$$E \approx \int_0^{+\infty} \frac{1}{N} k^2 e^{-\frac{K^2}{2 N}} \, dx$$

if I've done my integration by parts and substitutions correctly, this yields:

$$E \approx \sqrt{2N} \int_0^{+\infty} e^{-x^2} \, dx = \sqrt{\frac{N \pi}{2}}$$

Last edited: Jan 3, 2006
4. Jan 3, 2006

### Zurtex

Cool, that's very nice, I did a little work from what shmoe gave me and got an exact expectation, which is given n tracks then:

$$E(x) = -1 + n e^n \text{ExpIntegralE}[-n,n]$$

Which if you look at definitions and stuff isn't too far out from your approximation.

5. Jan 4, 2006

### shmoe

As n goes to infinity, that's asymptotic to what Hurkyl provided.

That's quite alot of approximations in Hurkyl's post! the sums to integrals are fine, the integrand is increasing up to sqrt(2N) and decreasing thereafter with a max of 2/e at sqrt(2N), so this will be off by a constant at most.

What seems harder to justify is replacing the exact probability with $$\bar{P}(K)$$, which is a terrible approximation when K is close to N. However, it's an upper bound for the true probability (1+x<e^x) and it drops of very quickly, so the main contribution for the expected value sum will come from something like the first $$N^{1/2+\epsilon}$$ terms. This should handily show that the expected value sum in this range is asymptotic to the approximation.

6. Jan 4, 2006

### Hurkyl

Staff Emeritus
Ack, you don't want me to do some real work, do you? :uhh:

7. Jan 6, 2006

### balakrishnan_v

Let us find the fraction of the number line(from 0 to 1) that will be filled:
Now we will find E(1) which is nothing but the expected fraction of the line(x=0 to x=1) given that each filling width is a small h
(h=1/N)
E(1)=h+E(1-h)
E(1-h)=(1-h){h+E(1-2h)}
and so on
hence we get
E(1)=h{1+(1-h)+(1-h)(1-2h)+.,,}
Now since h is very small we can make (1-kh)->exp(-kh)
So the nth term->(exp(-(n^2+n)/2)->exp(-hn^2/2)
so the sum can be approxuimated to the integral
Note that kh<<1
Hence
$$\int_{0}^{\infty} e^{-\frac{hx^2}{2}}dx=\sqrt{\frac{\pi h}{2}}$$
The above is wrt to 0 to 1
So if we scale it up to N and noting that h=1/N since
$$\sqrt{\frac{\pi N}{2}}$$