Fitting a mixture model when component priors are known

TheOldHag
I have a list of scores between 0 and 1 generated by an information retrieval system, with 1 being very relevant and 0 being completely non-relevant. I do not know which scores correspond to relevant items and which to non-relevant ones, but I do know that the distribution of scores is generated by a mixture model: an exponential distribution generates the non-relevant scores and a normal distribution generates the relevant scores. This appears to be a perfect fit for the EM algorithm, and indeed I have implemented EM and obtained decent results.

However, during the EM calculations I have to compute the prior probability of each component, one exponential and one normal. Conveniently, I do know the proportion of relevant items in the collection that generated the scores, so it stands to reason that I should just plug this value into the spots in the EM algorithm where a component prior is called for. But this leaves me scratching my head. Is this justified? And is there a further simplification I can make with this extra bit of information, perhaps even an entirely different method than EM for solving this problem?

Also, to avoid confusion, these relevance scores do have meaning via their relative magnitudes: it is always useful to traverse the items in order of relevance. What is not known is whether, given a score, the user will find the item relevant. It is that distribution I am attempting to find.
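For reference, here is a minimal sketch of what I am doing, with the mixing weights held fixed at the known proportion of relevant items. The function name, the initialization, and the use of scipy are just my illustration, not anything prescribed:

```python
import numpy as np
from scipy.stats import norm, expon

def em_fixed_priors(scores, p_rel, n_iter=100):
    """EM for an exponential + Gaussian mixture with FIXED mixing weights.

    scores: array of values in (0, 1]; p_rel: known P(relevant).
    """
    lam = 1.0 / np.mean(scores)                   # initial exponential rate
    mu, sigma = np.mean(scores), np.std(scores)   # initial Gaussian params
    for _ in range(n_iter):
        # E-step: responsibilities, using the fixed priors p_rel / (1 - p_rel)
        p_r = p_rel * norm.pdf(scores, mu, sigma)
        p_n = (1.0 - p_rel) * expon.pdf(scores, scale=1.0 / lam)
        gamma = p_r / (p_r + p_n)                 # P(relevant | score)
        # M-step: weighted MLEs for the component parameters only;
        # the mixing weights are NOT re-estimated.
        mu = np.sum(gamma * scores) / np.sum(gamma)
        sigma = np.sqrt(np.sum(gamma * (scores - mu) ** 2) / np.sum(gamma))
        lam = np.sum(1.0 - gamma) / np.sum((1.0 - gamma) * scores)
    return lam, mu, sigma
```

With p_rel supplied from the training data, only the rate ##\lambda## and the Gaussian ##\mu, \sigma## are updated; the weights never change across iterations.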
 
I suggest you make a more careful attempt to describe the problem. The EM algorithm maximizes something, but you didn't say what you are trying to maximize. I don't recognize the term "component prior" as standard terminology. What does it mean?
 
It is generally found that the output scores of a reasonable information retrieval system have a distribution that can be approximated by a mixture model consisting of one exponential distribution and one Gaussian distribution: the exponential distribution represents the scores of items deemed not relevant by an interested user, and the Gaussian distribution represents the scores of the relevant items.

So perhaps "component" is not standard terminology for the EM algorithm, but in the context of determining the most likely parameters for a mixture model the individual distributions are commonly referred to as components, and it is the prior probability of a given component that I already have. In the EM algorithm these priors show up in the expectation step. From random sampling on test data I already have a good estimate of the percentage of relevant items with respect to the query, so essentially I already have these priors and should not have to re-estimate them at each stage of EM.
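Concretely, if ##\pi## denotes the known proportion of relevant items and ##s_i## is a score, the expectation step I am describing computes the responsibility (with ##\mu##, ##\sigma##, ##\lambda## my notation for the component parameters):

$$P(\text{relevant} \mid s_i) = \frac{\pi\,\mathcal{N}(s_i;\mu,\sigma^2)}{\pi\,\mathcal{N}(s_i;\mu,\sigma^2) + (1-\pi)\,\lambda e^{-\lambda s_i}}$$

where ##\pi## stays fixed across iterations instead of being re-estimated.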

This does appear to be working, but so far I have not found this intermediate situation treated in the literature. For the most part, either you have enough labeled data and simply compute MLEs for the two components directly, with no reliance on an iterative algorithm, or you have nothing other than the scores and must run the canonical EM. In my case, I have the percentage of relevant documents from training data, but not enough relevant documents for a direct MLE estimate of the Gaussian to be at all meaningful.
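To spell out what the simplification looks like under my assumption: with the priors fixed, the maximization step reduces to weighted MLEs for the two component parameter sets, using the responsibilities ##\gamma_i## from the expectation step:

$$\hat\mu = \frac{\sum_i \gamma_i s_i}{\sum_i \gamma_i},\qquad \hat\sigma^2 = \frac{\sum_i \gamma_i (s_i - \hat\mu)^2}{\sum_i \gamma_i},\qquad \hat\lambda = \frac{\sum_i (1-\gamma_i)}{\sum_i (1-\gamma_i)\,s_i}$$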
 
Right now I don't have time to read a paper in order to understand your question.

My guess is that you are estimating a set of parameters for a distribution by using the EM algorithm to find estimator values that maximize some measure of fit between the distribution they define and some data. You have found examples where this is done, but in your problem some of the parameters that are estimated in those examples are known constants rather than values that need to be estimated. You are asking how to modify the method in the examples; perhaps you want to substitute the known values at certain steps of the process rather than use the algorithm that estimates them.

You could start by explaining the set of parameters that you are trying to estimate.
 