Fitting a mixture model when component priors are known

Discussion Overview

The discussion revolves around the application of the EM algorithm for fitting a mixture model to scores generated by an information retrieval system. Participants explore the implications of having known component priors, specifically an exponential distribution for non-relevant items and a normal distribution for relevant items, and whether this knowledge can simplify the EM process or suggest alternative methods.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant describes using the EM algorithm to fit a mixture model to scores, noting that they have prior knowledge of the proportion of relevant items, which they consider using in the algorithm.
  • Another participant questions the clarity of the problem description and the use of the term "component prior," suggesting that more detail is needed regarding what is being maximized in the EM algorithm.
  • A different participant explains that the mixture model typically consists of an exponential distribution for non-relevant items and a Gaussian distribution for relevant items, clarifying the terminology around "components" in the context of the EM algorithm.
  • One participant mentions that while their approach seems to be working, they have not encountered similar cases in the literature where known priors are utilized in this manner.
  • Another participant expresses the need for more clarity on the parameters being estimated and suggests that the original poster might be looking to modify the EM algorithm to incorporate known values instead of estimating them.

Areas of Agreement / Disagreement

Participants express varying levels of understanding and clarity regarding the problem, with some seeking more information while others provide insights based on their interpretations. No consensus is reached on the appropriateness of using known priors in the EM algorithm.

Contextual Notes

Some limitations include the lack of clarity on the specific parameters being estimated and the potential impact of known priors on the EM algorithm's iterative process. The discussion also highlights the absence of literature addressing similar scenarios.

TheOldHag
I have a list of scores between 0 and 1 generated by an information retrieval system, with 1 being very relevant and 0 being completely non-relevant. I do not know which scores correspond to relevant items and which to non-relevant ones, but I do know that the distribution of scores is generated by a mixture model consisting of an exponential distribution that generates the non-relevant scores and a normal distribution that generates the relevant scores. This appears to be a perfect fit for the EM algorithm. ... and I have completed the EM algorithm and obtained decent results.

However, during the EM calculations I have to calculate the prior probability of each component - one exponential and one normal. Conveniently, I do know the proportion of relevant items in the collection that generated the scores, so it stands to reason that I should just plug this value into the spots in the EM algorithm where a component prior is called for. But this leaves me scratching my head. Is this justified? Also, is there a further simplification I can make with this extra bit of information - perhaps even an entirely different method other than EM for solving this problem?

Also, to avoid confusion, these relevance scores do have meaning via their relative magnitudes. It is always useful to traverse the items in order of relevance. What is not known, however, is whether a user will find an item with a given score relevant. It is that distribution this is attempting to find.
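For concreteness, here is a minimal numerical sketch of the setup described above - an exponential/Gaussian mixture fit by EM, with the mixing weight held fixed at the known relevant proportion instead of being re-estimated each iteration. The data, parameter values, and variable names are all illustrative, not from the thread.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic scores: non-relevant ~ Exponential(rate=8), relevant ~ Normal(0.7, 0.1),
# with a KNOWN relevant proportion pi_rel (all values are made up for illustration).
pi_rel = 0.2
n = 5000
n_rel = rng.binomial(n, pi_rel)
scores = np.concatenate([
    rng.normal(0.7, 0.1, n_rel),
    rng.exponential(1.0 / 8.0, n - n_rel),
])

# EM with the mixing weights fixed at the known values
lam, mu, sd = 1.0, 0.5, 0.2   # rough initial guesses
for _ in range(200):
    # E-step: responsibility of the Gaussian (relevant) component,
    # using the known prior pi_rel rather than a re-estimated one.
    p_rel = pi_rel * np.exp(-0.5 * ((scores - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    p_non = (1 - pi_rel) * lam * np.exp(-lam * np.clip(scores, 0, None))
    r = p_rel / (p_rel + p_non)

    # M-step: responsibility-weighted MLEs for each component; the usual
    # mixing-weight update (pi = r.mean()) is simply skipped.
    mu = np.sum(r * scores) / np.sum(r)
    sd = np.sqrt(np.sum(r * (scores - mu) ** 2) / np.sum(r))
    lam = np.sum(1 - r) / np.sum((1 - r) * np.clip(scores, 0, None))

print(mu, sd, lam)
```

On this well-separated synthetic data the fixed-prior EM recovers parameters close to the generating values; fixing the weight is equivalent to maximizing the likelihood over the remaining parameters only.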
 
I suggest you make a more careful attempt to describe the problem. The EM algorithm maximizes something, but you didn't say what you were trying to maximize. I don't recognize the term "component prior" as standard terminology. What does it mean?
 
It is generally found that the output scores of a reasonable information retrieval system will have a distribution that can be approximated by a mixture model consisting of one exponential distribution and one Gaussian distribution, with the exponential distribution representing the scores of information items deemed not relevant by an interested user and the Gaussian distribution representing the scores of the relevant items.

So perhaps "component" is not standard terminology for the EM algorithm, but in the context of determining the most likely parameters for a mixture model the individual distributions are often referred to as components, and it is the prior probability of a given component that I already have. In the EM algorithm these show up as the priors used in the expectation step. From random sampling on test data I already have a good idea of the percentage of items relevant to the query, so essentially I already have these priors and don't have to recalculate them at each iteration of EM.
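To make the expectation step referred to above explicit (symbols are my own choice: $\pi$ the known relevant proportion, $\lambda$ the exponential rate, $\mu, \sigma$ the Gaussian parameters), the responsibility of the relevant component for score $x_i$ would be

$$r_i = \frac{\pi\, \mathcal{N}(x_i;\, \mu, \sigma^2)}{\pi\, \mathcal{N}(x_i;\, \mu, \sigma^2) + (1-\pi)\, \lambda e^{-\lambda x_i}},$$

after which the M-step updates $\mu$, $\sigma$, and $\lambda$ via responsibility-weighted maximum likelihood, while the usual mixing-weight update $\pi \leftarrow \frac{1}{n}\sum_i r_i$ is skipped because $\pi$ is known.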

This does appear to be working currently, but so far I have not found an intermediate situation like this in the literature. For the most part, either you have enough labeled data to just calculate MLEs for the two components directly and do the math after that, without any reliance on an iterative algorithm, or you have nothing other than the scores and have to run the canonical EM. In my case, I have the percentage of relevant documents from training data, but not enough relevant documents for an MLE estimate of the Gaussian to be at all meaningful.
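The fully labeled alternative mentioned above can be sketched as follows: with labels in hand, the component estimates are closed-form (sample mean and standard deviation for the Gaussian, reciprocal of the sample mean for the exponential rate), so no iteration is needed. The data here is synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical labeled training scores (stand-ins for real judged data)
rel = rng.normal(0.7, 0.1, 1000)           # relevant scores: Gaussian
non = rng.exponential(1.0 / 8.0, 4000)     # non-relevant scores: Exponential(rate=8)

# Closed-form MLEs for each component
mu_hat = rel.mean()
sd_hat = rel.std()                         # MLE of sigma uses ddof=0
lam_hat = 1.0 / non.mean()                 # exponential rate MLE
pi_hat = len(rel) / (len(rel) + len(non))  # mixing-proportion MLE
```

The poster's situation sits between this case and canonical EM: only `pi_hat` is reliably estimable from the training sample, so the remaining parameters still have to come from an EM-style fit to the unlabeled scores.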
 
Right now I don't have time to read a paper in order to understand your question.

My guess is that you are estimating a set of parameters for a distribution by using the EM algorithm to find values of estimators for the parameters that maximize some measure of fit between the distribution they define and some data. You have found examples where this is done, but in your problem, some of the parameters estimated in those examples are known constants instead of values that need to be estimated. You are asking something about how to modify the method in the examples - perhaps you want to substitute the known values at certain steps of the process rather than use the algorithm that estimates them.

You could start by explaining the set of parameters that you are trying to estimate.
 
