# Probability Distribution and Confidence Interval

## Homework Statement

Let $X_1, X_2, \ldots, X_n$ be a random sample from the distribution with probability density function

$$f_X(x;\theta) = (\theta+1)(1-x)^\theta, \quad 0<x<1, \quad \theta>-1$$

a) What is the probability distribution of $Y = -\sum_{i=1}^{n} \ln(1-X_i)$?
b) Suggest a $(1-\alpha)100\%$ confidence interval for $\theta$ based on $Y = -\sum_{i=1}^{n} \ln(1-X_i)$.

## The Attempt at a Solution

a) I began by transforming the equation.
$$Y = -\sum_{i=1}^{n} \ln(1-X_i) = -\ln\left((1-X_i)^n\right)$$
$$e^Y = \frac{1}{(1-X_i)^n}$$
$$X_i = 1 - e^{-Y/n}$$

$$f_Y(y) = f_X(g^{-1}(y)) \frac{dx}{dy} = \frac{\theta+1}{n} e^{-\frac{y}{n}(\theta+1)}$$

I don't think this is the correct answer. I'm not even sure whether my math or my method for solving this is correct.

b) I'm not even sure what they're asking for in this part of the question. Could someone please clarify what I'm supposed to do after I find the probability distribution?

Stephen Tashi
$e^Y = 1/(1-X_i)^n$
What is that notation supposed to mean? The $X_i$ are different sample values. You can't treat them as if they were all the same unknown quantity.

$$e^Y = \left(\frac{1}{1-X_1}\right)\left(\frac{1}{1-X_2}\right)\cdots\left(\frac{1}{1-X_n}\right)$$

If we take a routine approach to this problem we would find the distribution of the random variable $h = -\ln ( 1 - X)$ and then compute the distribution of Y as the n-fold convolution of the distribution of $h$. But perhaps you are studying some topic that makes doing this problem easier. What definitions and theorems are explained in the chapter where this problem occurs?

Part b) suggests that you can use Y to find an estimator of $\theta$. Have you studied what estimators are? After you find the estimator, you can worry about the confidence interval.

jambaugh
If n is sufficiently large to invoke the Central Limit Theorem, then Y, as a sum of many independent random variables, can be assumed to be approximately normally distributed. If this "shortcut" is valid, then it is simply a matter of calculating the mean and standard deviation of Y from the mean and standard deviation of $-\ln(1-X)$.

The mean and variance should be calculable as expectation values of functions of X.

I don't know if that's what is wanted in the problem.

We're basically doing the Central Limit Theorem and confidence intervals. I think we just have to do it the long way.

So I first solved for $H = -\ln(1-X)$ and got the CDF $F_H(h) = 1 - e^{-(\theta+1)h}$. I think that's an exponential with mean $1/(\theta+1)$.

I'm assuming the next step would be to find the probability distribution of $\sum_{i=1}^{n} H_i$.

How would I go about doing that? Can it look something like $Y = \bar{W}$?
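
The claim that $H = -\ln(1-X)$ is exponential with mean $1/(\theta+1)$ can be sanity-checked by simulation. The sketch below is only an illustration: the value `theta = 2.0`, the sample size, and the helper names `sample_x` and `simulate_h` are assumptions made here, not part of the problem. It draws $X$ by inverting the CDF $F_X(x) = 1 - (1-x)^{\theta+1}$ and compares the sample mean and variance of $H$ with the exponential values $1/(\theta+1)$ and $1/(\theta+1)^2$.

```python
import math
import random


def sample_x(theta, rng):
    """Draw one X with density (theta+1)(1-x)^theta on (0, 1) by CDF inversion."""
    # F(x) = 1 - (1-x)^(theta+1), so X = 1 - (1-U)^(1/(theta+1)) with U ~ Uniform[0, 1).
    u = rng.random()
    return 1.0 - (1.0 - u) ** (1.0 / (theta + 1.0))


def simulate_h(theta, size, seed=0):
    """Return `size` simulated values of H = -ln(1 - X)."""
    rng = random.Random(seed)
    return [-math.log(1.0 - sample_x(theta, rng)) for _ in range(size)]


theta = 2.0  # hypothetical parameter value, chosen only for the check
h = simulate_h(theta, 200_000)

sample_mean = sum(h) / len(h)
sample_var = sum((v - sample_mean) ** 2 for v in h) / (len(h) - 1)

# If H is exponential with rate theta+1, these pairs should be close.
print("mean:", sample_mean, "expected:", 1.0 / (theta + 1.0))
print("var: ", sample_var, "expected:", 1.0 / (theta + 1.0) ** 2)
```

Agreement of both moments does not prove the distribution is exponential, but a mismatch would show that the derivation of $F_H$ went wrong.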

Stephen Tashi
Since you are studying the Central Limit Theorem, I think jambaugh's approach is correct.
So you need to find the variance of H. Then approximate Y as a normal distribution.

jambaugh
If you use CLT...

Firstly I suggest you transform from X to T=1-X, (t = 1-x) and since |dt| = |dx| the pdf is unchanged except for the change of variable.

Call $W=-\ln(1-X) = -\ln(T)$
$$\mu_W = E[W] = -E[\ln(T)] =-\int \ln(t)f_T(t) dt$$

$\sigma^2_W = E[W^2] - \mu_W^2$

$$E[W^2] = \int_0^1 \ln(t)\ln(t)f_T(t)dt$$

Ugly integrals but integration by parts $dv = t^\theta dt$ will do it I believe.

Get mean and stdev of W.

$Y = W_1 + W_2 + \cdots + W_n$.

$\mu_Y = n\mu_W$, $\sigma_Y = \sqrt{n}\,\sigma_W$ (Y is the sum of the $W_i$, not their average)

You have Y's distribution assumed to be normal via C.L.T. with now known mean and st.dev.
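
For reference, here is a sketch of how the two integrals work out. It assumes $f_T(t) = (\theta+1)t^\theta$ on $0<t<1$, which follows from the stated density of $X$, and uses the results $\int_0^1 t^\theta \ln t\,dt = -\frac{1}{(\theta+1)^2}$ and $\int_0^1 t^\theta (\ln t)^2\,dt = \frac{2}{(\theta+1)^3}$, each obtainable by integration by parts for $\theta > -1$.

$$\mu_W = -(\theta+1)\int_0^1 t^\theta \ln t\,dt = \frac{1}{\theta+1}$$

$$E[W^2] = (\theta+1)\int_0^1 t^\theta (\ln t)^2\,dt = \frac{2}{(\theta+1)^2}$$

$$\sigma_W^2 = E[W^2] - \mu_W^2 = \frac{2}{(\theta+1)^2} - \frac{1}{(\theta+1)^2} = \frac{1}{(\theta+1)^2}$$

Because the $W_i$ are independent, means and variances add across the sum, so the normal approximation for $Y$ would use mean $n/(\theta+1)$ and standard deviation $\sqrt{n}/(\theta+1)$.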

Ray Vickson

I posted this once but it did not appear. Here it is again.

The random variables $Z_i = -\ln(1-X_i)$ have a simple, well-known distribution (although that fact may not, itself, be well known). The easiest way to get the distribution is to compute the Laplace transform $E[e^{-sZ_i}]$. Once you have the distribution of $-\ln(1-X_i)$, the rest is easy, or at least familiar and easily found in books, notes, etc.

RGV
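
A sketch of the Laplace transform computation suggested above, using the stated density of $X$ and assuming $s > -(\theta+1)$ so that the integral converges:

$$E\left[e^{-sZ_i}\right] = \int_0^1 e^{s\ln(1-x)}(\theta+1)(1-x)^\theta\,dx = (\theta+1)\int_0^1 (1-x)^{\theta+s}\,dx = \frac{\theta+1}{\theta+1+s}$$

This has the form $\frac{r}{r+s}$ with $r = \theta+1$, which is the transform of an exponential distribution with rate $r$. Since the $Z_i$ are independent, the transform of the sum is the product of the individual transforms:

$$E\left[e^{-sY}\right] = \left(\frac{\theta+1}{\theta+1+s}\right)^n$$

which is the transform of a Gamma distribution with shape $n$ and rate $\theta+1$.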

I think we can do it some shorter way. I looked at some previous examples, and $Y = -\sum \ln(1-X_i)$ has a Gamma distribution with $\alpha = 2n$ and $\beta = 1/(\theta+1)$.

However, I'm stumped on how to get there. I know how to find the probability distribution of $W = -\ln(1-X)$, but I don't know what to do from there.

For part b, would I be able to do something like:
$\mu = 2n/(\theta+1)$, $\mathrm{var} = 2n/(\theta+1)^2$

$Z_n = (\bar{x} - \mu)/\sqrt{\mathrm{var}}$, which converges in distribution to $N(0,1)$

$P[-z_{\alpha/2} < Z_n < z_{\alpha/2}]$
$P[\bar{x} - z_{\alpha/2}\sqrt{\mathrm{var}} < \mu < \bar{x} + z_{\alpha/2}\sqrt{\mathrm{var}}]$

Ray Vickson
That is not how I would do it. First of all, I would get the distribution of a single term $-\ln(1-X)$ and find it to be exponential with some easily computed rate parameter $r$ related to $\theta$. Then I would note that because $Y$ is a sum of $n$ exponential RVs, it is an $n$-Erlang random variable with parameters $r$ and $n$ (mean $= n/r$, variance $= n/r^2$); of course, this is a special case of a Gamma distribution, but it is more convenient to work with.

Finally, I would note that $-\ln(1-X) = V/r$, where $V$ is exponentially distributed with rate 1 (mean $= 1$), so $Y = (1/r)W$, where $W$ is $n$-Erlang with rate 1; call it $E(n,1)$. I would work out a $100a\%$ probability interval for $E(n,1)$, then use the fact that $Y = (1/r)E(n,1)$ to get a confidence interval for $r$, and hence for $\theta$. That would work even if $n$ is *not* large. Doing this is almost like converting to a standard $N(0,1)$ distribution in problems involving normal distributions.

RGV
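
The pivot argument above can be sketched numerically. This is an illustration under stated assumptions rather than a definitive implementation: it takes $r = \theta + 1$ (consistent with the exponential mean $1/(\theta+1)$ found earlier in the thread), uses the closed-form CDF of an Erlang variable with rate 1, and finds quantiles by bisection. The function names `erlang_cdf`, `erlang_quantile`, and `theta_interval`, and the sample values of `n` and `y`, are hypothetical.

```python
import math


def erlang_cdf(x, n):
    """P(E <= x) for E ~ Erlang(n, rate 1): 1 - exp(-x) * sum_{k=0}^{n-1} x^k / k!."""
    if x <= 0.0:
        return 0.0
    term = 1.0   # current term x^k / k!, starting at k = 0
    total = 1.0  # running sum of the terms
    for k in range(1, n):
        term *= x / k
        total += term
    return 1.0 - math.exp(-x) * total


def erlang_quantile(p, n, tol=1e-10):
    """Invert erlang_cdf by bisection (the CDF is increasing in x)."""
    lo, hi = 0.0, float(n)
    while erlang_cdf(hi, n) < p:  # widen the bracket until it contains the quantile
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if erlang_cdf(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def theta_interval(y, n, alpha=0.05):
    """(1 - alpha) confidence interval for theta from the observed sum y."""
    q_lo = erlang_quantile(alpha / 2.0, n)
    q_hi = erlang_quantile(1.0 - alpha / 2.0, n)
    # r * Y ~ Erlang(n, 1), so q_lo <= r * Y <= q_hi with probability 1 - alpha.
    # Dividing by y bounds r, and theta = r - 1 (assumption: r = theta + 1).
    return q_lo / y - 1.0, q_hi / y - 1.0


n, y = 10, 10.0 / 3.0  # hypothetical sample size and observed value of Y
lower, upper = theta_interval(y, n)
print(lower, upper)
```

Splitting $\alpha$ equally between the two tails is one convenient choice; because the Erlang distribution is skewed, it does not give the shortest possible interval.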