EM Method for censored data - Statistical Inference

DKOli
For censored data.

Random sample X_1, ..., X_n.

Censored such that x_1, ..., x_m are observed but x_{m+1}, ..., x_n are not - we just know they exceed T.

f(x; θ) = θ exp(-θx) (exponential)



L(θ) = ∏_{i=1}^m f(x_i; θ) · ∏_{i=m+1}^n [1 - F(T; θ)]

Using F(x; θ) = ∫_0^x f(t; θ) dt = 1 - exp(-θx), I get

L(θ) = ∏_{i=1}^m θ exp(-θx_i) · ∏_{i=m+1}^n exp(-θT)
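
Collecting terms (just a quick sketch for reference, so the direct MLE is available to compare with the EM answer later):

L(θ) = θ^m exp( -θ [ Σ_{i=1}^m x_i + (n-m)T ] )

ℓ(θ) = m log θ - θ [ Σ_{i=1}^m x_i + (n-m)T ], so dℓ/dθ = 0 gives θ̂ = m / [ Σ_{i=1}^m x_i + (n-m)T ].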

I can now work out the MLE, but I want to use the EM method.

Reading online I get that this censoring (right censoring) would give E(X | X ≥ T) = T + 1/θ, and I get it but don't really know how to show it. I am not sure how to write the complete-data likelihood or log-likelihood for this EM (I'm more used to mixture distributions, or I'd just solve the MLE).

I just don't really know how to set up the E step or M step. It should be quite trivial given what I know already but I just keep confusing myself with the whole

Q(θ, θ_i) = E[ ℓ(θ; x_1, ..., x_n) | θ_i; x_1, ..., x_m ].

I have some initial data, and iterating using the M step should also be trivial; I am just falling down at one of the first hurdles.

Thanks in advance.
 
DKOli said:
...Reading online I get that this censoring (right censoring) would give E(X | X ≥ T) = T + 1/θ, and I get it but don't really know how to show it. I am not sure how to write the complete-data likelihood or log-likelihood for this EM (I'm more used to mixture distributions, or I'd just solve the MLE)...

Hint: write X_{m+1} = T + Y_{m+1} etc, where the Y_i are iid to X_1.
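
Spelling the hint out (a sketch using only the memoryless property, with Y denoting the overshoot X - T):

P(X - T > y | X ≥ T) = P(X > T + y) / P(X ≥ T) = exp(-θ(T + y)) / exp(-θT) = exp(-θy),

so conditional on X ≥ T the overshoot Y = X - T is again Exp(θ), and E[X | X ≥ T] = T + E[Y] = T + 1/θ.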
 
So I can just say E(X) = 1/θ (from 1 - m, as its distribution is exponential) and write X_{m+1} - X_n as T + Y_{m+1} - T + Y_n, where the Y_i are iid to X_i (or was X_1 right? I assumed it was a typo), and thus the expectation of the censored data is simply T + 1/θ.

If I solve as MLE I would have ℓ = m log θ - θ Σ_{i=1}^m x_i - (n-m)θT,
but in terms of EM how would I write down the data log-likelihood (in this case I would treat all x_1 - x_n as observed)?
 
DKOli said:
So I can just say E(X) = 1/θ (from 1 - m, as its distribution is exponential)
no (also it's better to use "to" or ".." instead of "-" to indicate ranges)
and write X_{m+1} - X_n as T + Y_{m+1} - T + Y_n, where the Y_i are iid to X_i (or was X_1 right? I assumed it was a typo)
oops, actually the Y_i are iid to none of the X_i (since X_1, ..., X_m are restricted to the range [0,T), you'll need to include a normalizing factor in the distribution). The Y_i are exponential because of the memoryless property
and thus the expectation of the censored data is simply T + 1/θ.
yes but you don't need this fact yet
If I solve as MLE I would have ℓ = m log θ - θ Σ_{i=1}^m x_i - (n-m)θT,
but in terms of EM how would I write down the data log-likelihood (in this case I would treat all x_1 - x_n as observed)?
no the log-likelihood includes random variables because of the unobserved data - this is why the E step is done.
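
To make that concrete, a sketch (writing X_{m+1}, ..., X_n for the unobserved values, which stay random variables):

ℓ_c(θ) = n log θ - θ ( Σ_{j=1}^m x_j + Σ_{j=m+1}^n X_j ),

and the E step takes E[ ℓ_c(θ) | x_1, ..., x_m, X_j ≥ T for j > m; θ_i ].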
 
Right, well, I'll just call my complete data Z = (x_1, ..., x_m, T)
where T = (x_{m+1}, ..., x_n) are censored/unobserved.

Then my complete-data log-likelihood will just be:

ℓ(θ) = n log θ - θ Σ_{i=1}^n x_i

Then given the memoryless property we have E[X | X ≥ T] = T + 1/θ (which I am still unsure how to show).

I get my E step to be:

Q(θ, θ_i) = n log θ - θ( ΣT + (n-m)θ_i )

So my M step becomes:

θ_{i+1} = { ΣT + (n-m)θ_i } / n
 
^^^ This is wrong - it should still be the sum of x, but should also involve T. Initial guess:

θ_{i+1} = { Σx + T + (n-m)θ_i } / n ?
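
For comparison, here is a minimal numerical sketch (simulated data, illustrative names) of the iteration as it is usually written for right-censored exponential data: the E step replaces each censored value by E[X | X ≥ T; θ_i] = T + 1/θ_i, and the M step maximizes the completed log-likelihood n log θ - θ( Σ_{j=1}^m x_j + (n-m)(T + 1/θ_i) ), giving θ_{i+1} = n / ( Σ_{j=1}^m x_j + (n-m)(T + 1/θ_i) ).

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate right-censored exponential data (illustrative values).
theta_true = 2.0              # true rate
n, T = 200, 0.8               # sample size and censoring threshold
x_full = rng.exponential(scale=1.0 / theta_true, size=n)
x_obs = x_full[x_full < T]    # x_1, ..., x_m (observed)
m = x_obs.size                # number of uncensored observations

# EM iteration for the rate theta of f(x; theta) = theta * exp(-theta * x).
theta = 1.0                   # initial guess theta_0
for _ in range(200):
    # E step: each censored X is replaced by E[X | X >= T; theta] = T + 1/theta.
    e_censored = T + 1.0 / theta
    # M step: maximize n*log(theta) - theta*(sum of the completed data).
    theta_next = n / (x_obs.sum() + (n - m) * e_censored)
    if abs(theta_next - theta) < 1e-12:
        theta = theta_next
        break
    theta = theta_next

# Closed-form MLE from the observed-data likelihood, for comparison.
theta_mle = m / (x_obs.sum() + (n - m) * T)
print("EM estimate:     ", theta)
print("Closed-form MLE: ", theta_mle)
```

The fixed point of the update satisfies θ( Σx_j + (n-m)T ) + (n-m) = n, i.e. θ = m / ( Σx_j + (n-m)T ), so the EM iteration converges to the same closed-form MLE as the direct approach.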
 