# EM Method for censored data - Statistical Inference

Suppose we have a random sample X_1, ..., X_n, censored such that x_1, ..., x_m are observed but x_{m+1}, ..., x_n are not; we just know they exceed T.

The density is exponential: f(x; θ) = θ exp(−θx).

L(θ) = ∏_{i=1}^{m} f(x_i; θ) · ∏_{i=m+1}^{n} [1 − F(T; θ)]

Using F(x; θ) = ∫_0^x f(t; θ) dt = 1 − exp(−θx), I get

L(θ) = ∏_{i=1}^{m} θ exp(−θx_i) · ∏_{i=m+1}^{n} exp(−θT) = θ^m exp(−θ Σ_{i=1}^{m} x_i) exp(−(n−m)θT)

I can now work out the MLE directly, but I want to use the EM method.

Reading online, I see that this censoring (right censoring) gives E(X | X ≥ T) = T + 1/θ, and I get it intuitively but don't really know how to show it. I'm not sure how to write the complete-data likelihood or log-likelihood for this EM (I'm more used to mixture distributions, or I'd just solve for the MLE directly).

I just don't really know how to set up the E step or M step. It should be quite trivial given what I know already, but I keep confusing myself with the whole

Q(θ, θ_i) = E[l(θ; x_1, ..., x_n) | θ_i; x_1, ..., x_m].

I have some initial data, and then iterating using the M step should also be trivial; I'm just falling at one of the first hurdles.

> ...Reading online, I see that this censoring (right censoring) gives E(X | X ≥ T) = T + 1/θ, and I get it but don't really know how to show it. I'm not sure how to write the complete-data likelihood or log-likelihood for this EM (I'm more used to mixture distributions, or I'd just solve for the MLE directly)...
Hint: write X_{m+1} = T + Y_{m+1}, etc., where the Y_i are iid to X_1.

So I can just say E(X) = 1/θ (for i = 1 to m, as the distribution is exponential), and write X_{m+1}, ..., X_n as T + Y_{m+1}, ..., T + Y_n where the Y_i are iid to X_i (or was X_1 right? I assumed it was a typo), and thus the expectation of each censored observation is simply T + 1/θ.

If I solve for the MLE I would have l(θ) = m log(θ) − θ Σ_{i=1}^{m} x_i − (n−m)θT,
but in terms of EM, how would I write down the complete-data log-likelihood (in this case treating all of x_1, ..., x_n as observed)?
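For reference, maximizing this censored-data log-likelihood directly gives a closed-form MLE, which is a useful target to check that the EM iteration converges to the right place:

```latex
\frac{\partial \ell}{\partial \theta}
= \frac{m}{\theta} - \sum_{i=1}^{m} x_i - (n-m)T = 0
\quad\Longrightarrow\quad
\hat{\theta} = \frac{m}{\sum_{i=1}^{m} x_i + (n-m)T}
```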

> So I can just say E(X) = 1/θ (from 1 to m, as the distribution is exponential)

No (also, it's better to use "to" or ".." instead of "-" to indicate ranges).

> and write X_{m+1}, ..., X_n as T + Y_{m+1}, ..., T + Y_n where the Y_i are iid to X_i (or was X_1 right? I assumed it was a typo)

Oops, actually the Y_i are iid to none of the X_i (since X_1, ..., X_m are restricted to the range [0, T), you'd need to include a normalizing factor in their distribution). The Y_i are exponential because of the memoryless property.

> and thus the expectation of the censored data is simply T + 1/θ.

Yes, but you don't need this fact yet.

> If I solve for the MLE I would have l(θ) = m log(θ) − θ Σ_{i=1}^{m} x_i − (n−m)θT, but in terms of EM, how would I write down the complete-data log-likelihood (treating all of x_1, ..., x_n as observed)?

No, the log-likelihood includes random variables because of the unobserved data; this is why the E step is done.

Right, well, I'll just call my complete data Z = (x_1, ..., x_m, x_{m+1}, ..., x_n),
where x_{m+1}, ..., x_n are the censored/unobserved values.

Then my complete-data log-likelihood will just be:

l(θ) = n log(θ) − θ Σ_{i=1}^{n} x_i

Then, given the memoryless property, we have E[X | X ≥ T] = T + 1/θ (which I'm still unsure how to show).
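One way to show it directly: divide the tail integral by P(X ≥ T) = e^{−θT} and substitute x = T + y, which makes the memoryless property explicit:

```latex
E[X \mid X \ge T]
= \frac{\int_T^\infty x\,\theta e^{-\theta x}\,dx}{e^{-\theta T}}
= \int_0^\infty (T + y)\,\theta e^{-\theta y}\,dy
= T + \frac{1}{\theta}
```

After the substitution, the e^{−θT} factors cancel and what remains is the mean of T plus a fresh Exponential(θ) variable.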

I get my E step to be:

Q(θ, θ_i) = n log(θ) − θ(Σ T + (n−m)θ_i)

So my M step becomes:

θ_{i+1} = { Σ T + (n−m)θ_i } / n

^^^ This is wrong: it should still involve the sum of the x's, but should also involve T. Initial guess:

θ_{i+1} = { Σ x + T + (n−m)θ_i } / n ???
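For what it's worth, here is a minimal sketch of the full iteration in Python (the function name and test values are my own, not from the thread). The E step replaces each censored value by its conditional mean E[X | X ≥ T] = T + 1/θ_i, and the M step maximizes the resulting expected complete-data log-likelihood:

```python
def em_censored_exponential(x_obs, n, T, theta0=1.0, tol=1e-10, max_iter=1000):
    """EM for right-censored exponential data, rate parameterization
    f(x; theta) = theta * exp(-theta * x).

    x_obs  : the m fully observed values
    n      : total sample size; n - len(x_obs) values are censored at T
    theta0 : starting value for the iteration
    """
    m = len(x_obs)
    s = sum(x_obs)           # sum of the observed x's
    theta = theta0
    for _ in range(max_iter):
        # E step: each censored value is replaced by its conditional mean
        # E[X | X >= T] = T + 1/theta (memoryless property)
        expected_total = s + (n - m) * (T + 1.0 / theta)
        # M step: maximize Q(theta) = n*log(theta) - theta*expected_total
        theta_new = n / expected_total
        if abs(theta_new - theta) < tol:
            break
        theta = theta_new
    return theta_new

# The fixed point theta = n / (sum(x) + (n-m)*(T + 1/theta)) rearranges to
# exactly the direct censored-data MLE: theta_hat = m / (sum(x) + (n-m)*T).
```

At convergence the EM estimate agrees with the closed-form MLE, which makes the direct solution a handy sanity check for the iteration.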