EM Method for censored data - Statistical Inference


Discussion Overview

The discussion revolves around the application of the EM (Expectation-Maximization) method for estimating parameters in a statistical model involving censored data, specifically focusing on exponential distributions. Participants explore the setup of the likelihood functions and the steps required for the EM algorithm, including the E and M steps.

Discussion Character

  • Technical explanation
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • One participant describes a scenario with a random sample of censored data and expresses uncertainty about setting up the complete data likelihood and log-likelihood for the EM method.
  • Another participant suggests a transformation of the censored data to facilitate the calculation of expectations, hinting at the use of iid random variables.
  • A participant calculates the expected value of the censored data as E(X|X≥T) = T + 1/θ, but questions how to derive this formally.
  • There is a discussion about the correct formulation of the complete data log likelihood, with one participant proposing a specific expression that includes the censored observations.
  • Another participant corrects a previous claim about the independence of certain variables and emphasizes the need for a normalizing factor due to the memoryless property of the exponential distribution.
  • One participant attempts to derive the E step and M step of the EM algorithm, but acknowledges confusion regarding the inclusion of the censored data in the calculations.
  • There is an acknowledgment of an error in a proposed formula for the M step, with a suggestion to revise it to include the sum of observed data and the censored threshold.

Areas of Agreement / Disagreement

Participants express various viewpoints on the setup and execution of the EM method for censored data, with no consensus reached on the correct formulation of the likelihood functions or the steps of the algorithm. Disagreements arise regarding the treatment of the censored data and the assumptions made about the distributions involved.

Contextual Notes

Participants highlight limitations in their understanding of the EM method as applied to censored data, particularly in defining the complete data likelihood and the implications of the memoryless property of the exponential distribution. There are unresolved mathematical steps and assumptions that affect the formulation of the E and M steps.

DKOli
For censored data.

Random sample X_1, …, X_n

Censored such that x_1, …, x_m are observed but x_{m+1}, …, x_n are not; we just know they exceed T.

f(x; θ) = θ exp(−θx) (exponential)



L = ∏ (i = 1 to m) f(x_i; θ) × ∏ (i = m+1 to n) [1 − F(T; θ)]

Using F = ∫ f I get

L = ∏ (i = 1 to m) θ exp(−θx_i) × ∏ (i = m+1 to n) exp(−θT)

I can now work out the MLE, but I want to use the EM method.
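As a numerical sanity check: taking the log of this likelihood and setting the derivative to zero gives the closed-form censored MLE θ̂ = m / (Σx_i + (n−m)T). A minimal simulation sketch; the values of θ, T, and the sample size are illustrative, not from the thread:

```python
import random

def censored_mle(obs, n_censored, T):
    """Closed-form MLE from the censored likelihood above:
    maximizing m*log(theta) - theta*(sum(obs) + (n - m)*T)
    gives theta_hat = m / (sum(obs) + (n - m)*T)."""
    m = len(obs)
    return m / (sum(obs) + n_censored * T)

# Illustrative simulation: true theta = 2.0, right-censoring at T = 1.0.
random.seed(0)
theta, T, n = 2.0, 1.0, 100_000
sample = [random.expovariate(theta) for _ in range(n)]
obs = [x for x in sample if x < T]   # fully observed values
n_cens = n - len(obs)                # only known to exceed T

theta_hat = censored_mle(obs, n_cens, T)
print(theta_hat)  # close to the true theta = 2.0 for large n
```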

Reading online I get that this censoring (or right censoring) would give E(X | X ≥ T) = T + 1/θ, and I get it but don't really know how to show it. I am not sure how to write the complete data likelihood or log-likelihood for this EM (I'm more used to mixture distributions, or I'd just solve the MLE).

I just don't really know how to set up the E step or M step. It should be quite trivial given what I know already, but I just keep confusing myself with the whole

Q(θ, θ_i) = E[l(θ; x_1, …, x_n) | θ_i; x_1, …, x_m].

I have some initial data, and then iterating using the M step should also be trivial; I am just falling down at one of the first hurdles.

Thanks in advance.
 
DKOli said:
...Reading online I get that this censoring (or right censoring) would give E(X | X ≥ T) = T + 1/θ and I get it but don't really know how to show it. I am not sure how to write the complete data likelihood or log-likelihood for this EM (I'm more used to mixture distributions or I'd just solve the MLE)...

Hint: write X_{m+1} = T + Y_{m+1} etc, where the Y_i are iid to X_1.
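The hint can be checked numerically: for X ~ Exp(θ), conditional on X ≥ T the overshoot X − T is again Exp(θ) by the memoryless property, hence E[X | X ≥ T] = T + 1/θ. A small Monte Carlo sketch with illustrative parameters (not from the thread):

```python
import random
import statistics

# Monte Carlo check: conditional on X >= T, the overshoot X - T of an
# Exp(theta) variable is again Exp(theta), so E[X | X >= T] = T + 1/theta.
random.seed(1)
theta, T = 2.0, 1.0
tail = [x for x in (random.expovariate(theta) for _ in range(500_000)) if x >= T]

cond_mean = statistics.mean(tail)                      # approx T + 1/theta = 1.5
overshoot_mean = statistics.mean(x - T for x in tail)  # approx 1/theta = 0.5
print(cond_mean, overshoot_mean)
```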
 
So I can just say E(X) = 1/θ (from 1 - m, as its distribution is exponential) and write X_{m+1} - X_n as T + Y_{m+1} - T + Y_n, where the Y_i are iid to X_i (or was X_1 right? I assumed it was a typo), and thus the expectation of the censored data is simply T + 1/θ.

If I solve as the MLE I would have l = m log(θ) − θ Σx_i − (n−m)θT,
but in terms of EM how would I write down the complete data log-likelihood (in this case I would treat all x_1, …, x_n as observed)?
 
DKOli said:
So I can just say E(X) = 1/θ (from 1 - m, as its distribution is exponential)
No (also, it's better to use "to" or ".." instead of "-" to indicate ranges).
and write X_{m+1} - X_n as T + Y_{m+1} - T + Y_n, where the Y_i are iid to X_i (or was X_1 right, I assumed it was a typo)
Oops, actually the Y_i are iid to none of the X_i (since X_1, …, X_m are restricted to the range [0, T), you'll need to include a normalizing factor in the distribution). The Y_i are exponential because of the memoryless property.
and thus the expectation of the censored data is simply T + 1/θ.
Yes, but you don't need this fact yet.
If I solve as the MLE I would have l = m log(θ) − θ Σx_i − (n−m)θT, but in terms of EM how would I write down the complete data log-likelihood (in this case I would treat all x_1, …, x_n as observed)?
No, the log-likelihood includes random variables because of the unobserved data; this is why the E step is done.
 
Right, well, I'll just call my complete data Z = (x_1, …, x_m, T),
where T = (x_{m+1}, …, x_n) are censored/unobserved.

Then my complete data log likelihood will just be:

l(θ) = n log(θ) − θ Σx_i, with the sum running from i = 1 to n

Then, given the memoryless property, we have E[X | X ≥ T] = T + 1/θ (which I am still unsure how to show).
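For completeness, the conditional expectation can be shown by a direct integral, using the substitution x = T + y that the earlier hint suggests:

```latex
\begin{aligned}
E[X \mid X \ge T]
  &= \frac{1}{P(X \ge T)} \int_T^\infty x\,\theta e^{-\theta x}\,dx
   && \text{with } P(X \ge T) = e^{-\theta T} \\
  &= \frac{1}{e^{-\theta T}} \int_0^\infty (T + y)\,\theta e^{-\theta (T + y)}\,dy
   && \text{substituting } x = T + y \\
  &= \int_0^\infty (T + y)\,\theta e^{-\theta y}\,dy
   = T + \frac{1}{\theta}.
\end{aligned}
```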

I get my E step to be:

Q(θ, θ_i) = n log(θ) − θ(ΣT + (n−m)θ_i)

So my M step becomes:

θ_{i+1} = { ΣT + (n−m)θ_i } / n
 
^^^ This is wrong; it should still be the sum of x, but should also involve T. Initial guess:

θ_{i+1} = { Σx + T + (n−m)θ_i } / n ?
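For comparison, one standard way to set up this E step is to replace each censored value by its conditional mean under the current estimate, E[X | X ≥ T; θ_i] = T + 1/θ_i, giving Q(θ, θ_i) = n log(θ) − θ(Σx_i + (n−m)(T + 1/θ_i)) with the sum over the m observed values; maximizing in θ gives the M step θ_{i+1} = n / (Σx_i + (n−m)(T + 1/θ_i)). A minimal sketch with made-up data (the iteration count and starting value are arbitrary):

```python
def em_censored_exp(obs, n_censored, T, theta0=1.0, iters=50):
    """EM for Exp(theta) with n_censored observations right-censored at T.
    E step: each censored value is replaced by E[X | X >= T] = T + 1/theta_i.
    M step: maximize n*log(theta) - theta*(sum(obs) + n_censored*(T + 1/theta_i)).
    """
    n = len(obs) + n_censored
    theta = theta0
    for _ in range(iters):
        theta = n / (sum(obs) + n_censored * (T + 1.0 / theta))
    return theta

# Made-up data: three observed values, two values censored at T = 2.0.
obs, n_cens, T = [0.3, 0.8, 1.5], 2, 2.0
theta_em = em_censored_exp(obs, n_cens, T)
mle = len(obs) / (sum(obs) + n_cens * T)   # closed-form censored MLE
print(theta_em, mle)
```

A useful check on this formulation: the fixed point of the iteration, θ = n / (Σx_i + (n−m)(T + 1/θ)), rearranges to exactly the closed-form censored MLE m / (Σx_i + (n−m)T), so the EM sequence converges to the same answer as maximizing the censored likelihood directly.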
 
