EM Method for censored data - Statistical Inference

AI Thread Summary
The discussion focuses on applying the EM method for statistical inference with censored data, specifically using an exponential distribution. The likelihood function is established, and the participant expresses confusion about setting up the E and M steps for the EM algorithm. They derive the expected value of the censored data as E(X|X≥T) = T + 1/θ, but struggle with formulating the complete data log-likelihood. Clarifications are made regarding the treatment of observed and unobserved data, emphasizing the need for a normalizing factor due to the memoryless property of the exponential distribution. The conversation highlights the challenges of transitioning from maximum likelihood estimation to the EM framework.
DKOli
For censored data.

Random sample X_1, ..., X_n.

Censored such that x_1, ..., x_m are observed but x_{m+1}, ..., x_n are not - we just know they exceed T.

f(x; θ) = θ exp(-θx)  (exponential)



L(θ) = ∏_{i=1}^{m} f(x_i; θ) · ∏_{i=m+1}^{n} [1 - F(T; θ)]

Using F(t; θ) = ∫_0^t f(x; θ) dx, so that 1 - F(T; θ) = exp(-θT), I get

L(θ) = ∏_{i=1}^{m} θ exp(-θ x_i) · ∏_{i=m+1}^{n} exp(-θT) = θ^m exp(-θ Σ_{i=1}^{m} x_i) exp(-(n-m)θT)

I can now work out the MLE, but I want to use the EM method.
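For reference, taking logs of the simplified likelihood gives the direct MLE:

$$
\ell(\theta) = m\log\theta - \theta\sum_{i=1}^{m} x_i - (n-m)\theta T,
\qquad
\frac{d\ell}{d\theta} = \frac{m}{\theta} - \sum_{i=1}^{m} x_i - (n-m)T = 0
\;\Rightarrow\;
\hat\theta = \frac{m}{\sum_{i=1}^{m} x_i + (n-m)T}.
$$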

Reading online I get that this censoring (right censoring) would give E(X | X ≥ T) = T + 1/θ, and I get it but don't really know how to show it. I am not sure how to write the complete-data likelihood or log-likelihood for this EM (I'm more used to mixture distributions, or I'd just solve the MLE).

I just don't really know how to set up the E step or the M step. It should be quite trivial given what I know already, but I keep confusing myself with the whole

Q(θ, θ^(i)) = E[ ℓ(θ; x_1, ..., x_n) | θ^(i); x_1, ..., x_m ].

I have some initial data, and iterating using the M step should also be trivial; I am just falling down at one of the first hurdles.

Thanks in advance.
 
DKOli said:
...Reading online I get that this censoring (right censoring) would give E(X | X ≥ T) = T + 1/θ, and I get it but don't really know how to show it. I am not sure how to write the complete-data likelihood or log-likelihood for this EM (I'm more used to mixture distributions, or I'd just solve the MLE)...

Hint: write X_{m+1} = T + Y_{m+1}, etc., where the Y_i are iid to X_1.
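Spelling the hint out a little, the key fact is the memoryless property of the exponential: conditional on exceeding T, the overshoot is again Exponential(θ),

$$
P(X - T > y \mid X \ge T) = \frac{P(X > T + y)}{P(X \ge T)} = \frac{e^{-\theta(T+y)}}{e^{-\theta T}} = e^{-\theta y}, \qquad y \ge 0,
$$

so Y = X - T given X ≥ T is Exponential(θ), and hence E[X | X ≥ T] = T + E[Y] = T + 1/θ.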
 
So I can just say E(X) = 1/θ (from 1 - m, as its distribution is exponential) and write X_{m+1} - X_n as T + Y_{m+1} - T + Y_n, where the Y_i are iid to X_i (or was X_1 right? I assumed it was a typo), and thus the expectation of the censored data is simply T + 1/θ.

If I solve by MLE I would have ℓ = m log θ - θ Σ_{i=1}^{m} x_i - (n-m)θT,
but in terms of EM how would I write down the complete-data log-likelihood (in this case I would treat all of x_1, ..., x_n as observed)?
 
DKOli said:
So I can just say E(X) = 1/θ (from 1 - m, as its distribution is exponential)
No (also, it's better to use "to" or ".." instead of "-" to indicate ranges).

DKOli said:
and write X_{m+1} - X_n as T + Y_{m+1} - T + Y_n, where the Y_i are iid to X_i (or was X_1 right? I assumed it was a typo)
Oops, actually the Y_i are iid to none of the X_i (since X_1, ..., X_m are restricted to the range [0, T), you'll need to include a normalizing factor in their distribution). The Y_i are exponential because of the memoryless property.

DKOli said:
and thus the expectation of the censored data is simply T + 1/θ.
Yes, but you don't need this fact yet.

DKOli said:
If I solve by MLE I would have ℓ = m log θ - θ Σ_{i=1}^{m} x_i - (n-m)θT,
but in terms of EM how would I write down the complete-data log-likelihood (in this case I would treat all of x_1, ..., x_n as observed)?
No - the log-likelihood includes random variables because of the unobserved data; this is why the E step is done.
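Concretely, for this model the complete-data log-likelihood still contains the unobserved values as random variables,

$$
\ell_c(\theta) = n\log\theta - \theta\left(\sum_{i=1}^{m} x_i + \sum_{i=m+1}^{n} X_i\right),
$$

and the E step takes its conditional expectation given the observed data and the current estimate θ^(i), so each unobserved X_i is replaced by E[X_i | X_i ≥ T, θ^(i)].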
 
Right, well, I'll just call my complete data Z = (x_1, ..., x_m, x_{m+1}, ..., x_n),
where x_{m+1}, ..., x_n are censored/unobserved.

Then my complete-data log-likelihood will just be:

ℓ(θ) = n log θ - θ Σ_{i=1}^{n} x_i

Then, given the memoryless property, we have E[X | X ≥ T] = T + 1/θ (which I am still unsure how to show).

I get my E step to be:

Q(θ, θ^(i)) = n log θ - θ( Σ T + (n-m)θ^(i) )

So my M step becomes:

θ^(i+1) = { Σ T + (n-m)θ^(i) } / n
 
^^^ This is wrong - it should still be the sum of the x's, but it should also involve T. Initial guess:

θ^(i+1) = { Σ x + T + (n-m)θ^(i) } / n ?
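For a sanity check, here is a minimal numerical sketch of the iteration, assuming the E step fills in each censored value with E[X | X ≥ T, θ^(i)] = T + 1/θ^(i); the function name and the simulated data are only illustrative, and the fixed point can be compared against the direct MLE m / (Σ_{i=1}^{m} x_i + (n-m)T).

```python
import numpy as np

def em_censored_exponential(x_obs, n_censored, T, theta0=1.0, tol=1e-10, max_iter=1000):
    """EM iteration for exponential data right-censored at T.

    x_obs      : the fully observed values x_1, ..., x_m
    n_censored : the number of observations only known to exceed T
    T          : the censoring threshold
    """
    x_obs = np.asarray(x_obs, dtype=float)
    m = x_obs.size
    n = m + n_censored
    theta = theta0
    for _ in range(max_iter):
        # E step: each censored X_j is replaced by E[X | X >= T, theta] = T + 1/theta
        expected_censored_total = n_censored * (T + 1.0 / theta)
        # M step: maximise Q(theta, theta_old) = n*log(theta) - theta*(sum(x_obs) + expected_censored_total)
        theta_new = n / (x_obs.sum() + expected_censored_total)
        if abs(theta_new - theta) < tol:
            theta = theta_new
            break
        theta = theta_new
    return theta

# Simulated example: the EM fixed point agrees with the closed-form MLE
rng = np.random.default_rng(0)
true_theta, n, T = 2.0, 500, 0.8
x = rng.exponential(scale=1.0 / true_theta, size=n)
x_obs = x[x < T]
em_est = em_censored_exponential(x_obs, n_censored=n - x_obs.size, T=T)
mle = x_obs.size / (x_obs.sum() + (n - x_obs.size) * T)
print(em_est, mle)
```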
 