Fitting a mixture model when component priors are known

Discussion Overview

The discussion revolves around the application of the EM algorithm for fitting a mixture model to scores generated by an information retrieval system. Participants explore the implications of having known component priors, specifically an exponential distribution for non-relevant items and a normal distribution for relevant items, and whether this knowledge can simplify the EM process or suggest alternative methods.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant describes using the EM algorithm to fit a mixture model to scores, noting that they have prior knowledge of the proportion of relevant items, which they consider using in the algorithm.
  • Another participant questions the clarity of the problem description and the use of the term "component prior," suggesting that more detail is needed regarding what is being maximized in the EM algorithm.
  • A different participant explains that the mixture model typically consists of an exponential distribution for non-relevant items and a Gaussian distribution for relevant items, clarifying the terminology around "components" in the context of the EM algorithm.
  • One participant mentions that while their approach seems to be working, they have not encountered similar cases in the literature where known priors are utilized in this manner.
  • Another participant expresses the need for more clarity on the parameters being estimated and suggests that the original poster might be looking to modify the EM algorithm to incorporate known values instead of estimating them.

Areas of Agreement / Disagreement

Participants express varying levels of understanding and clarity regarding the problem, with some seeking more information while others provide insights based on their interpretations. No consensus is reached on the appropriateness of using known priors in the EM algorithm.

Contextual Notes

Some limitations include the lack of clarity on the specific parameters being estimated and the potential impact of known priors on the EM algorithm's iterative process. The discussion also highlights the absence of literature addressing similar scenarios.

TheOldHag
I have a list of scores between 0 and 1 generated by an information retrieval system, with 1 being very relevant and 0 being completely non-relevant. I do not know which scores correspond to relevant items and which to non-relevant ones, but I do know that the distribution of scores is generated by a mixture model consisting of an exponential distribution that generates the non-relevant scores and a normal distribution that generates the relevant scores. This appears to be a perfect fit for the EM algorithm. ... and I have completed the EM algorithm and obtained decent results.

However, during the EM calculations I have to calculate the prior probability of each component - one exponential and one normal. Conveniently, I do know the proportion of relevant items in the collection that generated the scores, so it stands to reason that I should just plug this value into the spots in the EM algorithm where a component prior is called for. But this leaves me scratching my head. Is this justified? Also, is there a further simplification I can make with this extra bit of information - perhaps even an entirely different method other than EM for solving this problem?

Also, to avoid confusion, these relevance scores do have meaning via their relative magnitudes. It is always useful to traverse the items in order of relevance. What is not known, however, is whether a user will find an item with a given score relevant. It is that distribution this is attempting to find.
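For concreteness, here is a minimal numerical sketch of the setup described above - an exponential/Gaussian mixture fit by EM, with the mixing weight held fixed at the known relevant proportion instead of being re-estimated each iteration. The data, parameter values, and variable names are all illustrative, not from the thread.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic scores: non-relevant ~ Exponential(rate=8), relevant ~ Normal(0.7, 0.1),
# with a KNOWN relevant proportion pi_rel (all values are made up for illustration).
pi_rel = 0.2
n = 5000
n_rel = rng.binomial(n, pi_rel)
scores = np.concatenate([
    rng.normal(0.7, 0.1, n_rel),
    rng.exponential(1.0 / 8.0, n - n_rel),
])

# EM with the mixing weights fixed at the known values
lam, mu, sd = 1.0, 0.5, 0.2   # rough initial guesses
for _ in range(200):
    # E-step: responsibility of the Gaussian (relevant) component,
    # using the known prior pi_rel rather than a re-estimated one.
    p_rel = pi_rel * np.exp(-0.5 * ((scores - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    p_non = (1 - pi_rel) * lam * np.exp(-lam * np.clip(scores, 0, None))
    r = p_rel / (p_rel + p_non)

    # M-step: responsibility-weighted MLEs for each component; the usual
    # mixing-weight update (pi = r.mean()) is simply skipped.
    mu = np.sum(r * scores) / np.sum(r)
    sd = np.sqrt(np.sum(r * (scores - mu) ** 2) / np.sum(r))
    lam = np.sum(1 - r) / np.sum((1 - r) * np.clip(scores, 0, None))

print(mu, sd, lam)
```

On this well-separated synthetic data the fixed-prior EM recovers parameters close to the generating values; fixing the weight is equivalent to maximizing the likelihood over the remaining parameters only.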
 
I suggest you make a more careful attempt to describe the problem. The EM algorithm maximizes something, but you didn't say what you were trying to maximize. I don't recognize the term "component prior" as standard terminology. What does it mean?
 
It is generally found that the output scores of a reasonable information retrieval system will have a distribution that can be approximated by a mixture model consisting of one exponential distribution and one Gaussian distribution, with the exponential distribution representing the scores of information items deemed not relevant by an interested user and the Gaussian distribution representing the scores of the relevant items.

So perhaps "component" is not standard terminology for the EM algorithm, but in the context of determining the most likely parameters for a mixture model the individual distributions are often referred to as components, and it is the prior probability of a given component that I already have. In the EM algorithm these show up as the priors used in the expectation step. From random sampling on test data I already have a good idea of the percentage of items relevant to the query, so essentially I already have these priors and don't have to recalculate them at each iteration of EM.
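To make the expectation step referred to above explicit (symbols are my own choice: $\pi$ the known relevant proportion, $\lambda$ the exponential rate, $\mu, \sigma$ the Gaussian parameters), the responsibility of the relevant component for score $x_i$ would be

$$r_i = \frac{\pi\, \mathcal{N}(x_i;\, \mu, \sigma^2)}{\pi\, \mathcal{N}(x_i;\, \mu, \sigma^2) + (1-\pi)\, \lambda e^{-\lambda x_i}},$$

after which the M-step updates $\mu$, $\sigma$, and $\lambda$ via responsibility-weighted maximum likelihood, while the usual mixing-weight update $\pi \leftarrow \frac{1}{n}\sum_i r_i$ is skipped because $\pi$ is known.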

This does appear to be working currently, but so far I have not found an intermediate situation like this in the literature. For the most part, either you have enough labeled data to just calculate MLEs for the two components directly and do the math after that, without any reliance on an iterative algorithm, or you have nothing other than the scores and have to run the canonical EM. In my case, I have the percentage of relevant documents from training data, but not enough relevant documents for an MLE estimate of the Gaussian to be at all meaningful.
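The fully labeled alternative mentioned above can be sketched as follows: with labels in hand, the component estimates are closed-form (sample mean and standard deviation for the Gaussian, reciprocal of the sample mean for the exponential rate), so no iteration is needed. The data here is synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical labeled training scores (stand-ins for real judged data)
rel = rng.normal(0.7, 0.1, 1000)           # relevant scores: Gaussian
non = rng.exponential(1.0 / 8.0, 4000)     # non-relevant scores: Exponential(rate=8)

# Closed-form MLEs for each component
mu_hat = rel.mean()
sd_hat = rel.std()                         # MLE of sigma uses ddof=0
lam_hat = 1.0 / non.mean()                 # exponential rate MLE
pi_hat = len(rel) / (len(rel) + len(non))  # mixing-proportion MLE
```

The poster's situation sits between this case and canonical EM: only `pi_hat` is reliably estimable from the training sample, so the remaining parameters still have to come from an EM-style fit to the unlabeled scores.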
 
Right now I don't have time to read a paper in order to understand your question.

My guess is that you are estimating a set of parameters for a distribution by using the EM algorithm to find values of estimators for the parameters that maximize some measure of fit between the distribution they define and some data. You have found examples where this is done, but in your problem, some of the parameters estimated in those examples are known constants instead of values that need to be estimated. You are asking something about how to modify the method in the examples - perhaps you want to substitute the known values at certain steps of the process rather than use the algorithm that estimates them.

You could start by explaining the set of parameters that you are trying to estimate.
 
