
EM Method for censored data - Statistical Inference

  1. Dec 16, 2011 #1
    For censored data.

    Random sample X_1, ..., X_n.

    Censored such that x_1, ..., x_m are observed but x_{m+1}, ..., x_n are not; we just know they exceed T.

    f(x) = exponential = θ exp(−θx)

    L = ∏_{i=1}^{m} f(x_i; θ) · ∏_{i=m+1}^{n} [1 − F(T; θ)]

    Using F = int f I get

    L = ∏_{i=1}^{m} θ exp(−θx_i) · ∏_{i=m+1}^{n} exp(−θT)

    I can now work out the MLE, but I want to use the EM method.
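    (For reference, maximizing this likelihood directly gives the censored-data MLE; this is a standard computation sketched here, not part of the original post.)

```latex
\log L(\theta) = m\log\theta - \theta\sum_{i=1}^{m} x_i - (n-m)\theta T,
\qquad
\frac{d\log L}{d\theta} = \frac{m}{\theta} - \sum_{i=1}^{m} x_i - (n-m)T = 0
\;\Longrightarrow\;
\hat\theta = \frac{m}{\sum_{i=1}^{m} x_i + (n-m)T}.
```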

    Reading online, I get that this censoring (right censoring) would give E(X | X ≥ T) = T + 1/θ, and I get it, but I don't really know how to show it. I'm not sure how to write the complete-data likelihood or log-likelihood for this EM (I'm more used to mixture distributions, or I'd just solve the MLE directly).
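    (Since the question asks how to show E(X | X ≥ T) = T + 1/θ, here is a standard derivation, added for clarity; it is not from the original post. Direct integration against the exponential density gives)

```latex
E[X \mid X \ge T]
= \frac{\int_T^{\infty} x\,\theta e^{-\theta x}\,dx}{\int_T^{\infty} \theta e^{-\theta x}\,dx}
= \frac{(T + 1/\theta)\,e^{-\theta T}}{e^{-\theta T}}
= T + \frac{1}{\theta}.
```

    Equivalently, by the memoryless property, (X − T) | X ≥ T ~ Exp(θ), so E[X | X ≥ T] = T + 1/θ.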

    I just don't really know how to set up the E step or the M step. It should be quite trivial given what I know already, but I just keep confusing myself with the whole

    Q(θ, θ_i) = E[l(θ; x_1, ..., x_n) | θ_i; x_1, ..., x_m].

    I have some initial data, and iterating using the M step should also be trivial; I'm just falling down at one of the first hurdles.

    Thanks in advance.
  3. Dec 17, 2011 #2
    Hint: write X_{m+1} = T + Y_{m+1} etc, where the Y_i are iid to X_1.
  4. Dec 17, 2011 #3
    So I can just say E(X) = 1/θ (for i = 1 to m, since the distribution is exponential), and write X_{m+1}, ..., X_n as T + Y_{m+1}, ..., T + Y_n, where the Y_i are iid to X_i (or was X_1 right? I assumed it was a typo), and thus the expectation of the censored data is simply T + 1/θ.

    If I solve as MLE I would have l = m log(θ) − θ Σ_{i=1}^{m} x_i − (n−m)θT,
    but in terms of EM, how would I write down the complete-data log-likelihood? (In this case I would treat all of x_1, ..., x_n as observed.)
  5. Dec 17, 2011 #4
    No. (Also, it's better to use "to" or ".." instead of "-" to indicate ranges.)
    Oops: actually the Y_i are iid to none of the X_i (since X_1, ..., X_m are restricted to the range [0, T), you'd need to include a normalizing factor in their distribution). The Y_i are exponential because of the memoryless property.
    Yes, but you don't need this fact yet.
    No: the log-likelihood includes random variables because of the unobserved data; this is why the E step is done.
  6. Dec 18, 2011 #5
    Right, well, I'll just call my complete data Z = (x_1, ..., x_m, T),
    where T = (x_{m+1}, ..., x_n) are the censored/unobserved values.

    Then my complete data log likelihood will just be:

    l(x) = n log(θ) − θ Σ x_i, where all sums run from i = 1 to n.

    Then, given the memoryless property, we have E[X | X ≥ T] = T + 1/θ (which I'm still unsure how to show).

    I get my E step to be:

    Q(θ, θ_i) = n log(θ) − θ(ΣT + (n−m)θ_i)

    So my M step becomes:

    θ_{i+1} = { ΣT + (n−m)θ_i } / n
    Last edited: Dec 18, 2011
  7. Dec 18, 2011 #6
    ^^^ This is wrong: it should still contain the sum of the x, but it should also involve T. Initial guess:

    θ_{i+1} = { Σx + T + (n−m)θ_i } / n ???
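    Putting the pieces of this thread together, here is a minimal numerical sketch of the resulting EM iteration, using the conditional expectation E[X | X ≥ T] = T + 1/θ in the E step. The simulated data and the values of theta_true, T, and n are illustrative choices, not from the thread.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated right-censored exponential data; theta_true, T and n are
# illustrative choices, not values from the thread.
theta_true, T, n = 2.0, 1.0, 5000
x = rng.exponential(scale=1.0 / theta_true, size=n)
observed = x[x < T]   # the uncensored x_1, ..., x_m
m = observed.size

theta = 1.0           # initial guess theta_0
for _ in range(200):
    # E step: each censored value is replaced by its conditional
    # expectation E[X | X >= T] = T + 1/theta (memoryless property).
    censored_total = (n - m) * (T + 1.0 / theta)
    # M step: exponential-rate MLE on the completed data.
    theta = n / (observed.sum() + censored_total)

print(theta)  # converges to the censored-data MLE m / (sum(x_obs) + (n-m)T)
```

    Note that the fixed point of this iteration, θ = n / (Σx_obs + (n−m)(T + 1/θ)), rearranges to exactly the censored-data MLE m / (Σx_obs + (n−m)T), so EM and the direct MLE agree here.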