# Proof that an interval is a confidence interval for Geom(q)

1. Jan 28, 2016

### Alex_Doge

Hello Physicsforum
1. The problem statement, all variables and given/known data
I have a problem proving this:
Given $C(x)=[0, 3/x]$ for all $x\in\chi$, with $\chi=\Omega$ being the sample space and $P_q=Geom(q)$ being the geometric distribution.

I have to show that $C(x)$ is a confidence interval for $q$, but I don't know how to get started.

I've been given the tip $P_q([0,3/q])=P_q(x\in[0,3/q])=P_q(\{1,2,\ldots,\lfloor3/q\rfloor\})$ and then to use the geometric series. It also says that the function won't be continuous and that I should nest it between two continuous ones.

2. Relevant equations
The definition of a confidence interval: $P_q(u(X)<q<v(X))=\gamma$ for all $q\in(0,1]$, with $\gamma$ close to 1.
Geometric summation formula and sigma additivity for disjoint sets.

3. The attempt at a solution
I tried using the definition but don't know how to continue. I think I have to prove the equalities:
$P_q(u(X)<q)=\gamma$ and $P_q(q<v(X))=\gamma$ but I don't know what I'm supposed to use for X. And I don't know what they mean with functions. I can't seem to see any dependency of a variable anywhere.
Any tips are very welcome!

Kind regards
Alex

2. Jan 28, 2016

### Ray Vickson

This is most definitely a calculus problem, so does not belong in the precalculus forum.

Here is how I would approach it. I would operate as a Bayesian, and suppose the Geometric parameter $q$ is governed by a prior distribution $f_0(q)$ for $0 <q<1$. Let $X = 1,2,3, \ldots$ be the Geometric random variable under observation. The probability of seeing $X = k$, given a value of $q$, is
$$P(X = k\,| \, q) = q\,(1-q)^{k-1}, \: k = 1,2,3, \ldots$$
The posterior probability density of $q$, given the observation $X = k$, is
$$f(q \,| \, k) = \frac{f_0(q) P(X=k\, | \, q)}{P(k)},$$
where
$$P(k) = \int_0^1 f_0(q)P(X = k\,| \, q) \, dq = \int_0^1 f_0(q) \, q\,(1-q)^{k-1} \, dq$$
Note that $P(k)$ is the prior probability of observing $X=k$.

Things become much easier if we use the so-called uninformative prior, which in this case means that $f_0(q) = 1$ is the uniform distribution on $(0,1)$; that is, we assume initially that $q$ is equally likely to take any value between 0 and 1. Basically, we know nothing at all about $q$, except that it must be between 0 and 1.

In this case we can do the integrals:
$$P(k) = \int_0^1 q (1-q)^{k-1} dq = \frac{1}{k(k+1)} ,$$
so the posterior probability density of $q$ is
$f(q|k) = k(k+1) q (1-q)^{k-1}, \; 0 < q < 1$.

You can now look at the interval $(0,3/k)$. Clearly, the probability that the (random quantity) $q$ lies in $(0,3/k)$ is 1 for $k = 1, 2, 3$. For $k \geq 4$ the probability that $q$ lies in $(0,3/k)$ is
$$P(0 < q < 3/k) = \int_0^{3/k} k(k+1) q (1-q)^{k-1} \, dq$$
You can evaluate this as a function of $k$ and plot it out for $k = 3, 4, 5, 6, \ldots$ to see if it is near 1 or not.
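
As a rough check of that last integral, here is a minimal Python sketch. The function names are my own, not from any library, and it assumes the uniform prior used above. The closed form $1 - 4(1-3/k)^k$ for $k \geq 4$ comes from integrating by parts; the midpoint sum is only a numerical sanity check.

```python
def posterior_prob(k: int) -> float:
    """P(0 < q < 3/k | X = k) under the uniform prior (assumed), integer k >= 1."""
    if k <= 3:
        return 1.0  # 3/k >= 1, so the interval already covers all of (0, 1)
    # Closed form from integration by parts, valid for k >= 4.
    return 1.0 - 4.0 * (1.0 - 3.0 / k) ** k


def posterior_prob_numeric(k: int, steps: int = 20_000) -> float:
    """Midpoint-rule approximation of the same integral, as a cross-check."""
    upper = min(1.0, 3.0 / k)  # the density lives on (0, 1) only
    h = upper / steps
    total = 0.0
    for i in range(steps):
        q = (i + 0.5) * h
        total += k * (k + 1) * q * (1.0 - q) ** (k - 1)
    return total * h


for k in (3, 4, 5, 10, 100, 1000):
    print(k, round(posterior_prob(k), 4), round(posterior_prob_numeric(k), 4))
```

For large $k$ the values level off near $1 - 4e^{-3} \approx 0.80$.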

3. Jan 28, 2016

### Alex_Doge

First of all thanks for the detailed answer. Sorry I didn't know it would turn out to be a calculus problem.
The integral gives $P(0 < q < 3/k) = \int_0^{3/k} k(k+1) q (1-q)^{k-1} \, dq=\frac{12\cdot3^k k^{-k} (-1)^k\left(1-\tfrac{k}{3}\right)^{k+1}+k-3}{k-3}$. It converges towards 0.8, if I'm not mistaken.

What does this mean?

4. Jan 28, 2016

### Ray Vickson

Why are you plotting it for negative values of $k$? We need $k = 1,2,3,4, \ldots$, so plotting it for $k \geq 4$ has meaning. Negative values of $k$ have no meaning at all in this problem.

5. Jan 28, 2016

### Alex_Doge

Oh sorry, I'm getting tired, been stuck on this problem all day now.

That's a strange plot. What does this mean then? :)
Is it something similar to a delta function, or have I made a mistake plotting it?

6. Jan 28, 2016

### Ray Vickson

Part of your problem is that you have a result for your integral that seems to work for all $k$, but when you specify that $k$ is a positive integer, it simplifies a lot; in particular, the pesky factors $(-1)^k$ disappear, giving you a formula that works well for all positive values of $k \geq 3$ (no division by 0 anymore). Then it plots out nicely.
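
For reference, a sketch of the simplification (my own working, by integration by parts, for $0 < a \leq 1$):
$$\int_0^{a} k(k+1)\, q\,(1-q)^{k-1}\, dq = 1 - (1-a)^k\,(1 + k a).$$
Putting $a = 3/k$ with integer $k \geq 3$ gives
$$P(0 < q < 3/k) = 1 - 4\left(1 - \frac{3}{k}\right)^{k} \longrightarrow 1 - 4e^{-3} \approx 0.80 \quad (k \to \infty),$$
which equals 1 at $k = 3$ and has no $(-1)^k$ factor and no division by $k-3$.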

7. Jan 28, 2016

### Alex_Doge

Yea it looks better now:

But how do I continue from here?

8. Jan 28, 2016

### Alex_Doge

So because q lies in C with the probability 1, C is a confidence interval? For $k \geq 4$ the probability is not 1 for large k. Is that a problem?

9. Jan 29, 2016

### Ray Vickson

A confidence interval (with confidence $p \in (0,1)$) is an interval for which the probability is at least $p$ that it contains the unknown parameter of interest. So, if the parameter we want to estimate is $q$, we want an interval that has a probability of at least $p$ to overlap the unknown $q$.

In most problems there is not much difference between the Bayesian approach (with non-informative prior) that I outlined above, and the classical (non-Bayesian) confidence-interval method; the interpretations are different, but usually the computations are almost the same. However, that is not the case in your problem (because the alleged confidence interval is a bit unusual). So: the confidence-interval method will deliver different results in your problem.

In your case (without yet specifying $p$) the claim is that for observation $\{X=k\}$ the interval $(0,3/k)$ overlaps $q$ with a probability of $p$ or more. Note that the interval overlaps $q$ if and only if $q < 3/k$, so the probability is $P(3/k > q) = P(k < 3/q)$. For a geometric random variable $X$ with parameter $q$ this probability is
$$P(X < 3/q) = \sum_{k=1}^{\lfloor 3/q \rfloor} q (1-q)^{k-1}$$
where $\lfloor u \rfloor$ is the greatest integer $\leq u$.

The problem is asking you to figure out a value of $p$ (hopefully, near 1.0) that is a lower bound on that probability (so that you can be at least $100 p\%$ sure the interval contains the true parameter value).

10. Jan 29, 2016

### Alex_Doge

I calculated that probability: $P(X < 3/q) = \sum_{k=1}^{\lfloor 3/q \rfloor} q (1-q)^{k-1} =q\sum_{k=1}^{\lfloor3/q\rfloor}(1-q)^{k-1}=(1-q)^{\lfloor3/q\rfloor}$. How do I calculate this lower bound? Like this: $(1-q)^{\lfloor3/q\rfloor}=1$ and then solve for $q$?
Or do I minimize and maximize to find lower and upper bound?

Last edited: Jan 29, 2016
11. Jan 29, 2016

### Ray Vickson

Actually, $\sum_{k=1}^n q (1-q)^{k-1} = 1 - (1-q)^n$, NOT $(1-q)^n$.

You are not "solving for $q$"; you do not know the value of $q$, but want to know a value $\alpha$ (called $p$ before), such that
$$\sum_{k=1}^{\lfloor3/q\rfloor} q (1-q)^{k-1} \geq \alpha$$
for all $q \in (0,1)$. If it happens that $\alpha$ is "large" (near 1) then you have a useful $100 \alpha \%$ confidence interval.
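
To get a feel for what $\alpha$ could be, one option is to scan the left-hand side over a grid of $q$ values. The following is a minimal Python sketch; the function name and grid size are my own choices, and a grid scan is only numerical evidence, not a proof.

```python
import math


def coverage(q: float) -> float:
    """Sum_{k=1}^{floor(3/q)} q (1-q)^(k-1) = 1 - (1-q)^floor(3/q), for 0 < q <= 1."""
    # Caveat: when 3/q is an integer, floating-point rounding may shift the
    # floor by one; that only moves the value slightly at the jump points.
    n = math.floor(3.0 / q)
    return 1.0 - (1.0 - q) ** n


# Hypothetical grid q = 0.00001, 0.00002, ..., 1.0
grid = [i / 100_000 for i in range(1, 100_001)]
worst_q = min(grid, key=coverage)

print("smallest coverage on grid:", coverage(worst_q), "at q =", worst_q)
print("1 - exp(-3)              :", 1.0 - math.exp(-3.0))
```

The smallest value found sits just above $1 - e^{-3} \approx 0.950$, which suggests trying $\alpha = 0.95$.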

12. Jan 29, 2016

### Alex_Doge

That means I have this:
$\sum_{k=1}^{\lfloor3/q\rfloor} q (1-q)^{k-1} =1-(1-q)^{\lfloor3/q\rfloor}\geq \alpha$ and need to find alpha.
Can I look at $1-(1-q)^{\lfloor3/q\rfloor}$ and see where its minimum is, for $q \in (0,1)$, and then define alpha to be just lower? What do you mean by "large"?
Thanks for the help so far

13. Jan 29, 2016

### Alex_Doge

Thanks a lot for the help. I solved it now.

The red graph is the probability. The other two are the bounds that come from the floor function. The level is then 0.05.
The plots helped me understand it and made sense of the hint.
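
For anyone reading later, a sketch of the nesting argument, assuming the two bounding curves come from $3/q - 1 < \lfloor 3/q \rfloor \leq 3/q$ (this is my reading of the hint, not something stated in the problem):
$$1-(1-q)^{3/q-1} \;<\; 1-(1-q)^{\lfloor 3/q \rfloor} \;\leq\; 1-(1-q)^{3/q}, \qquad 0<q<1.$$
Both outer functions are continuous and tend to $1-e^{-3}\approx 0.9502$ as $q \to 0$. The lower one stays above that value, because expanding $\ln(1-q)$ as a power series gives
$$\left(\frac{3}{q}-1\right)\ln(1-q) = -3 - \sum_{m\geq 1}\frac{2m-1}{m(m+1)}\,q^m < -3,$$
so $(1-q)^{3/q-1} < e^{-3} < 0.05$ and the probability is above $0.95$ for every $q \in (0,1)$.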