# Probability of getting a seat in the train car

Tags:
1. Jan 13, 2017

### bznm

1. The problem statement, all variables and given/known data
A train has got five train cars, each one with N seats. There are 150 passengers who randomly choose one of the cars. What is the probability that everyone will get a seat?

I think that what is asking me is "what is the probability that each wagon is chosen by no more than N passengers"?

The probability of having $$n<N$$ people in *a* wagon is given by
$$p_N=\sum\limits_{n=0}^N \binom{150}{n}0.2^n 0.8^{150-n}$$.

I thought of the answer being $$p_N^5$$.. but I think that the events

- wagon a is chosen by no more than N passengers
- wagon b is chosen by no more than N passengers
- wagon c is chosen by no more than N passengers
- wagon d is chosen by no more than N passengers
- wagon e is chosen by no more than N passengers

are far from being independent... So what could I do?
Thanks a lot

2. Jan 13, 2017

### haruspex

I suggest that you are not expected to get an answer in closed form. It should suffice to write it as a sum. If n1 choose the first wagon, how many are left for the next?

3. Jan 13, 2017

### bznm

Thanks a lot for the reply!
I thought of writing the number of interesting combinations as

$$\binom{150}{n_1} \binom{150-n_1}{n_2} \binom{150-n_1-n_2}{n_3} \binom{150-n_1-n_2-n_3}{n_4} \binom{150-n_1-n_2-n_3 -n_4}{n_5}$$
where n_1, n_2, n_3, n_4, n_5 < N and n_1 + n_2 +n_3 + n_4 + n_5=150
However I believe that the exercise does want me to find a closed form. In fact, it also asks which N implies probability > 0.9...

4. Jan 13, 2017

### haruspex

Consider a simple example. Two wagons of capacity 100. We have $\Sigma_{50}^{100}$ $^{150}C_r2^{-150}$, i.e. a truncated sum of the binomial expansion. Isn't that known not to have closed form?
For the numeric part of the question, maybe you can use an approximation.

5. Jan 13, 2017

### bznm

Mhh how could I approximate? I mean, I know that binomial can be approximated with normals.. but here I have a sort of hypergeometric multivariate distribution, isn't it?
I think this exercise exceed my course level.. I don't know how to proceed :(

6. Jan 13, 2017

### Ray Vickson

Be careful to distinguish between $n < N$ and $n \leq N$. Which one do you want in this problem?

And no, you do not have a multivariate hypergeometric distribution, because you are not "sampling without replacement". In this problem each of the 150 passengers randomly chooses a car, independently of what any of the other passengers do. So, you have a multivariate distribution alright, but it is a so-called "multinomial distribution" (which is an extension of the binomial to more than two categories). So, in your problem the probability of $k_i$ passengers choosing cars $i = 1,2,3$ (and thus $k_4 \equiv 150-k_1-k_2-k_3$ choosing car 4) is
$$P(k_1,k_2,k_3) = \frac{150!}{k_1! k_2! k_3! (150-k_1-k_2-k_3)!} \left(\frac{1}{4}\right)^{150}.$$
Assuming that $N < 150$, we want $\sum_{(k_1,k_2,k_3) \in S} P(k_1,k_2,k_3)$, where $S$ is the set
$$S = \{ (k_1,k_2,k_3): 0 \leq k_i \leq N, \; i = 1,2,3, \; \text{and} \;150-k_1-k_2-k_3 \leq N \}.$$

One can approximate a multinomial distribution by a multinormal distribution with the same mean vector and variance-covariance matrix.

7. Jan 14, 2017

### bznm

Thanks for the correction! The fact that if passenger chooses the first wagon, then it doesn't choose the others made me think about "sampling without replacement"...

8. Jan 15, 2017

### haruspex

For what it's worth, I slogged through an approximation using Xi as the actual number in the ith car minus the average number (i.e. 30). I got the probability that all are seated as $K\Sigma _{\max\left[x_i\right]<N-30}e^{-\Sigma x_i^2/60}$, where K is the probability of exactly 30 in each. Cross-checked with a spreadsheet and it seems fairly accurate.

This suggests a geometric approach involving n-dimensional spherical shells.

9. Jan 15, 2017

### Ray Vickson

Sorry: my response took the number of cars to be 4, not 5. For 5 cars the expression is similar.

10. Jan 16, 2017

### haruspex

I looked further into the geometric approach. It works in principle, but gets tangled up with integrals over spherical end caps (and even intersections of such) in five dimensions.

If we pretend naïvely that the occupancies are independent, we can get a lower bound. It's a lower bound because having chosen those distributions in which everyone in the first carriage gets a seat, the expected occupancy of that carriage is below 30. That reduces the probability that other carriages will seat all their customers. Anyway, I think you'll find this bound is about 40.
To remove the bias, we can put a lower limit on the occupancy of the first carriage, i.e. require its occupancy to be from A to R. If that keeps its average occupancy at or above 30, we can claim not to have put upward pressure on the remaining carriages. (I have a sort of argument that says this should produce an upper bound on R.)
So what we need to find is a pair of numbers, A and R, such that
P(occupancy of first carriage is in [A, R]) > 0.90.2, and
E[occupancy of first carriage given that occupancy range] > 30, and
R is minimised.

With the aid of an online calculator for cumulative probabilities of a binomial distribution, that is not too hard. Or we could use a Gaussian approximation.

Last edited: Jan 16, 2017