- #1
TheOldHag
- 44
- 3
I have a list of scores between 0 and 1 generated by an information retrieval system - 1 being very relevant and 0 being completely non-relevant. I do not know whether the scores correspond to relevant or non-relevant items or not but I do know that the distribution of scores is generated by a mixture model consisting of an exponential distribution that generates the non-relevant scores and a normal distribution that generates the relevant scores. This then appears to be a perfect fit for the EM algorithm. ... and I have completed the EM algorithm and have obtained decent results. However, during the EM calculations I have to calculate the prior probability of each component - 1 exponential and 1 normal. Conveniently, I do know the proportion of relevant items in the collection that generated the score. So it stands to reason that I should just plug this value into the spots in the EM algorithm where a component prior is called for. But this leaves me scratching my head. Is this justified? Also, is there a further simplification that I can attend to with this extra bit of information - perhaps even use an entirely different method other than EM to solve this problem?
Also, to avoid confusion, these relevance score do have meaning via there relative magnitudes. It is always useful to traverse the items in order of relevance. However, what is not known is whether or not given a score the user will find the item relevant. It is that distribution this is attempting to find.
Also, to avoid confusion, these relevance score do have meaning via there relative magnitudes. It is always useful to traverse the items in order of relevance. However, what is not known is whether or not given a score the user will find the item relevant. It is that distribution this is attempting to find.