Estimating n for 90% Confidence Interval of Dice Roll Summing to 117

Perrault

Homework Statement



(I translated this from French)

Some n dice are thrown and the sum of all their face values is 117.
Estimate n through a confidence interval of 90%.
In other words, of all the possible rolls of n dice that sum up to 117, find a range in which n should be located 90% of the time.
For example, 20 \leq n \leq 117 holds 100% of the time, because a 117-dice roll can sum to 117 but a 118-dice roll cannot, and 19 dice give a maximum sum of 114.

Homework Equations



To estimate the population average (here, the population is every possible roll of n dice that sums to 117) while the variance is unknown, and X is normally distributed (which it probably is), we use the following formula
\frac{\bar{X} - \mu}{\sqrt{\frac{S^{2}_{n-1}}{n}}} \sim T_{n-1}

Where :

\bar{X} is the sample average, if that is even relevant here.
S^{2}_{n-1} is equal to \frac{n}{n-1} S^{2}, where S^{2} is the sample variance and n is the number of items in the sample.
T_{n-1} is the symbol for Student's t-distribution.

There are five other, closely related, formulae, but this is the one that seems the most reasonable to use.

But I'm not even sure how that can be used to estimate n.


The Attempt at a Solution



The above probably shows just how lost I am here. We have to use some distribution, but I'm not even sure my choice is right.

Thanks!
 


I don't think Student's t-distribution has anything to do with this problem. We can regard N (the number of dice) as a random variable with some prior distribution, say uniform over some large interval. We toss N dice and observe a total of 117. We could use a computer algebra system to work out the probability that the sum of n dice is 117 for each n between 20 and 117, but it is easier to work with a normal approximation.

Given n, let f(x|n) be the normal density at x with mean m = (7/2) n and variance (35/12) n; these are the exact mean and variance of Sn, the sum of n dice values. We can regard our observation as verifying {116.5 <= Sn <= 117.5}, and
P\{116.5 \leq S_n \leq 117.5 \} = \int_{116.5}^{117.5} f(x|n) \, dx \doteq f(117|n),
to good approximation. So, given the observation, we can regard N as having a posterior distribution
P(n) = \frac{ f(117|n)}{\sum_{k=20}^{117} f(117|k) }.
In this formula we use
f(117|n) = \frac{1}{\sqrt{2 \pi (35/12) n}} \exp\left(-\frac{1}{2} \frac{(117 - (7/2) n)^2}{(35/12)n} \right).
The constant factor 1/\sqrt{35/12} cancels in P(n), so it can be dropped in practice.
You want to determine an interval I=\{n_1, n_1 + 1, \ldots, n_2 \} such that \sum_{n \in I} P(n) \geq 0.90, and presumably you would like I to be as short as possible, or nearly so.
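The recipe above can be sketched in a few lines of Python. This is only an illustration under the stated assumptions (uniform prior on n from 20 to 117, normal density evaluated at 117); the function names are made up for this sketch, and the search simply tries every run of consecutive n and keeps the shortest one reaching the requested mass.

```python
import math

TOTAL = 117  # observed sum of the dice


def normal_density(x, n):
    """Normal density at x with the mean and variance of the sum of n dice."""
    mean = 3.5 * n
    var = 35.0 / 12.0 * n
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posterior(total=TOTAL):
    """Posterior P(N = n | sum = total) under a uniform prior on the feasible n."""
    n_min = math.ceil(total / 6)  # fewest dice that can reach the total
    n_max = total                 # every die shows at least 1
    weights = {n: normal_density(total, n) for n in range(n_min, n_max + 1)}
    norm = sum(weights.values())
    return {n: w / norm for n, w in weights.items()}


def shortest_interval(post, level=0.90):
    """Shortest run of consecutive n whose posterior mass is at least `level`."""
    ns = sorted(post)
    best = None  # (start index, end index, mass)
    for i in range(len(ns)):
        mass = 0.0
        for j in range(i, len(ns)):
            mass += post[ns[j]]
            if mass >= level:
                if best is None or j - i < best[1] - best[0]:
                    best = (i, j, mass)
                break
    return ns[best[0]], ns[best[1]], best[2]


if __name__ == "__main__":
    lo, hi, mass = shortest_interval(posterior())
    print(f"n in [{lo}, {hi}] carries posterior mass {mass:.4f}")
```

If several intervals of the same length qualify, this sketch keeps the first one it finds; picking the one with the largest mass instead would be an equally reasonable choice.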

RGV
 


Ray Vickson said:
P\{116.5 \leq S_n \leq 117.5 \} = \int_{116.5}^{117.5} f(x|n) \, dx \doteq f(117|n),
This probability isn't too hard to calculate exactly with a scripting language such as perl or python. Simply use the generating polynomial (x+x^2+x^3+x^4+x^5+x^6)/6. The probability of getting a sum of 117 with N rolls is the coefficient of x^{117} of the polynomial (x+x^2+x^3+x^4+x^5+x^6)^N/6^N.
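A minimal Python sketch of this generating-polynomial calculation follows. The function names are illustrative only, and the polynomial is stored as a plain list of integer coefficients, so the result is exact.

```python
from fractions import Fraction


def dice_sum_counts(n):
    """Coefficients of (x + x^2 + ... + x^6)^n; counts[s] = ways n dice sum to s."""
    counts = [1]  # the polynomial 1, i.e. zero dice summing to 0
    for _ in range(n):
        new = [0] * (len(counts) + 6)
        for s, ways in enumerate(counts):
            if ways:
                for face in range(1, 7):
                    new[s + face] += ways  # multiply by x^face
        counts = new
    return counts


def prob_sum(total, n):
    """Exact P(S_n = total) as a Fraction: coefficient of x^total divided by 6^n."""
    counts = dice_sum_counts(n)
    ways = counts[total] if 0 <= total < len(counts) else 0
    return Fraction(ways, 6 ** n)
```

For instance, `float(prob_sum(117, 33))` gives the exact probability of a total of 117 with 33 dice, converted to a float for display.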
 


D H said:
This probability isn't too hard to calculate exactly with a scripting language such as perl or python. Simply use the generating polynomial (x+x^2+x^3+x^4+x^5+x^6)/6. The probability of getting a sum of 117 with N rolls is the coefficient of x^{117} of the polynomial (x+x^2+x^3+x^4+x^5+x^6)^N/6^N.

Right. It is also straightforward in Maple, so one can find q(n) = P{Sn=117} for n from 20 to 117 and work with that array. However, the results are not very different from those obtained from the normal distribution. In particular, one gets the same confidence interval, but with very slightly different probabilities.

RGV
 


Hello, and thanks for your replies,

I have trouble understanding your work. I don't follow it from here on:
Ray Vickson said:
We can regard our observation as verifying {116.5 <= Sn <= 117.5}, and
P\{116.5 \leq S_n \leq 117.5 \} = \int_{116.5}^{117.5} f(x|n) \, dx \doteq f(117|n),
to good approximation. So, given the observation, we can regard N as having a posterior distribution

If it's easier to use Maple, how would I do that?

Thanks again!
 


For a given large n, say n = 80, we want to compute the probability that S80 = 117, which is the probability that the sum of 80 dice values equals 117. In principle we could use a computer algebra system to evaluate the probability exactly, but it is almost as good, and much easier, to use a normal approximation. So, S80 is a DISCRETE random variable taking values in the set {80, 81, 82, ... ,480}, with mean ES80 = 80*3.5 and variance VS80 = (35/12)*80. We want to use the normal distribution with this same mean and variance, but the normal describes a CONTINUOUS random variable X, with P{X = 117} = 0.

How can we make a continuous distribution approximate a discrete one? Well, would you not agree that for the discrete random variable S80, the two events {S80 = 117} and {116.5 < S80 < 117.5} are exactly the same? (After all, S80 can only take integer values!) We cannot replace P{S80 = 117} by P{X = 117}, but we *can* replace P{116.5 < S80 < 117.5} by P{116.5 < X < 117.5}. In principle, we ought to use the exact interval probability (obtained by integrating the normal density f(x) from x = 116.5 to x = 117.5), but we can make a further approximation and replace the integral of f(x) over the interval by f(x) at the center of the interval (116.5,117.5), times the length (=1) of the interval; that is, the probability is approximately f(117)*1 = f(117). That is the form used in later calculations.

Note: there is nothing mysterious here: that is what we *always* do when we approximate a discrete probability by a continuous one.
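To see how little is lost by the midpoint shortcut, here is a small Python check. It is a sketch only: the helper names are invented for illustration, and the normal CDF is built from the standard library's `math.erf`.

```python
import math


def normal_cdf(x, mean, var):
    """CDF of a normal distribution with the given mean and variance."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


def interval_probability(total, n):
    """P{total - 0.5 < X < total + 0.5} for the normal matched to the sum of n dice."""
    mean, var = 3.5 * n, 35.0 / 12.0 * n
    return normal_cdf(total + 0.5, mean, var) - normal_cdf(total - 0.5, mean, var)


def midpoint_density(total, n):
    """Normal density at the center of the interval, times its length (= 1)."""
    mean, var = 3.5 * n, 35.0 / 12.0 * n
    return math.exp(-0.5 * (total - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


if __name__ == "__main__":
    for n in (25, 33, 45):
        print(n, interval_probability(117, n), midpoint_density(117, n))
```

The two columns printed for each n should agree closely, which is why the later formulas use the density at 117 directly.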

I just used the formulas in my first response and evaluated them all in Maple. I had also done an exact analysis (first, before using the approximation), obtained by getting the exact discrete probabilities P{Sn = 117} for n from 20 to 117. Basically, the Maple commands were:
f := 1/6*(x+x^2+x^3+x^4+x^5+x^6):   # the generating function for 1 die
# The generating function for n dice is f^n, and the coefficient of x^117 is P{Sn=117}.
for n from 20 to 117 do
  q[n] := evalf(coeff(expand(f^n), x, 117)):
end do: n := 'n':
Tot := add(q[n], n=20..117):
for n from 20 to 117 do
  P[n] := q[n]/Tot:
end do: n := 'n':
Note: it is probably possible to speed this up considerably because we don't really need coefficients for x^k with k > 117, so at each n we can truncate at x^117, then just multiply by f and truncate again, etc. Also, one can keep expression swell down by using evalf at each stage, so the coefficients of x^k are all floats. However, the direct method was fast enough, so I did not bother.
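The truncation idea in the note above can be sketched as follows, in Python rather than Maple purely for convenience. The names are hypothetical; coefficients are kept as floats and everything above x^117 is discarded at each multiplication.

```python
def truncated_probs(total, n_max):
    """Return {n: P(S_n = total)} for n = 1..n_max.

    The polynomial for n dice is kept as float coefficients of x^0..x^total only;
    higher powers can never contribute to the coefficient of x^total.
    """
    coeffs = [0.0] * (total + 1)
    coeffs[0] = 1.0  # zero dice: sum is 0 with probability 1
    out = {}
    for n in range(1, n_max + 1):
        new = [0.0] * (total + 1)
        for s, p in enumerate(coeffs):
            if p:
                for face in range(1, 7):
                    if s + face <= total:  # truncate at x^total
                        new[s + face] += p / 6.0
        coeffs = new
        out[n] = coeffs[total]
    return out


def exact_posterior(total=117):
    """Posterior P(N = n | sum = total) under equal priors on the feasible n."""
    q = truncated_probs(total, total)
    feasible = {n: p for n, p in q.items() if p > 0.0}
    norm = sum(feasible.values())
    return {n: p / norm for n, p in feasible.items()}
```

Each multiplication then costs at most 6*118 additions, regardless of n, so the whole table for n up to 117 is cheap to build.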

RGV
 


Perrault said:
I have trouble understanding your work, I don't understand from here on

What Ray did (and I would have approached this the same way) is to use Bayes' theorem,
P(A_i|E) = \frac{P(E|A_i)P(A_i)}{\sum_k P(E|A_k)P(A_k)}
where
  • {A_i} is a set of mutually exclusive events that collectively span the probability space. In this problem, the events are the number of dice rolled: N=1, N=2, ..., up to some rather large but finite number.
  • E is some observed event, or evidence. In this problem, the event E is the given fact that the sum of the N dice rolls was 117.
  • P(A_i|E) is the probability of event A_i given the observed event E. For example, what is the probability that the die was rolled 20 times to yield that total of 117? 21 times? These posterior probabilities are the desired quantities.
  • P(E|A_i) is the probability of the observed event E given the event A_i.
  • P(A_i) is some estimate of the probability of event A_i without that supporting evidence.
Without any prior supporting evidence, the principle of insufficient reason is about all one can go with: The priors are equiprobable. With this assumption of equal priors, Bayes' law reduces to
P(A_i|E) = \frac{P(E|A_i)}{\sum_k P(E|A_k)}
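The reduction needs only one step. Writing the common prior as a constant c (a symbol introduced here just for illustration), it cancels between numerator and denominator:

```latex
P(A_i|E) = \frac{P(E|A_i)\, c}{\sum_k P(E|A_k)\, c}
         = \frac{c\, P(E|A_i)}{c \sum_k P(E|A_k)}
         = \frac{P(E|A_i)}{\sum_k P(E|A_k)}
```

So the actual value of c, and hence the exact upper limit chosen for N, does not matter as long as every feasible N is included.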

To illustrate, suppose you were told that the sum of the dice was seven. There are six possible values for N here, N=2 through 7. The probability of rolling seven with two dice, P(S=7|N=2), is 6/36. Continuing in the same way,
P(S=7|N=3)=15/216
P(S=7|N=4)=20/1296
P(S=7|N=5)=15/7776
P(S=7|N=6)=6/46656
P(S=7|N=7)=1/279936

These probabilities of course don't sum to one, and there's no reason to expect them to. They instead sum to 70993/279936. This is in effect the normalization factor that lets us scale the posterior probabilities so they sum to one. With this scaling,

P(N=2|S=7)=0.6571915540968828
P(N=3|S=7)=0.2738298142070345
P(N=4|S=7)=0.06085106982378544
P(N=5|S=7)=0.00760638372797318
P(N=6|S=7)=0.0005070922485315454
P(N=7|S=7)=1.408589579254293e-05

So in this case N=2 and N=3 alone give that 90% confidence interval (P=93.1%). This obviously is not the answer you want, as there is no way to roll a sum of 117 with just 2 or 3 rolls of the dice.
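The sum-of-seven numbers can be reproduced with exact fractions. This is a small sketch with made-up function names; it counts rolls with a cached recursion and then normalizes under equal priors.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def ways(total, n):
    """Number of ordered rolls of n dice whose faces sum to total."""
    if n == 0:
        return 1 if total == 0 else 0
    return sum(ways(total - face, n - 1) for face in range(1, 7) if total - face >= 0)


def posterior_given_sum(total):
    """Return (normalization factor, {n: P(N = n | S = total)}) with equal priors."""
    n_min = -(-total // 6)  # ceiling of total / 6
    likelihood = {n: Fraction(ways(total, n), 6 ** n) for n in range(n_min, total + 1)}
    norm = sum(likelihood.values())
    return norm, {n: p / norm for n, p in likelihood.items()}


if __name__ == "__main__":
    norm, post = posterior_given_sum(7)
    print("normalization:", norm)
    for n, p in post.items():
        print(f"P(N={n}|S=7) = {float(p)}")
```

Calling `posterior_given_sum(117)` instead applies the same calculation to the original problem.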
 
