Can a Beta Distribution Model Scores in the Interval [0,1] for Ranked Retrieval?

TheOldHag
I have a set of scored items with scores in the interval [0,1]. Roughly speaking, about 50% of the scores equal 0, and the frequency of the remaining scores slopes steeply downward toward 1 or close to it. I want to fit a distribution to this data and use it later in some calculations, but I'm not sure how to proceed.

My guess is that since the data lie in the interval [0,1], they can be modeled with a beta distribution. So now I need to find the parameters alpha and beta. Is it as easy as calculating the sample mean and sample variance and working backwards from the equations for the mean and variance of a beta distribution, or does that only work for normal distributions? Or, since these are samples, do they approximate a normal distribution, so that I should fit a normal distribution to the data despite the interval [0,1] (it would have very thin tails)? Comments appreciated.
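For reference, working backwards from the mean and variance is the method of moments, and it does apply to the beta distribution. With sample mean m and sample variance v, the estimates are alpha = m * (m * (1 - m) / v - 1) and beta = (1 - m) * (m * (1 - m) / v - 1), valid only when v < m * (1 - m). Below is a minimal sketch in Python. The function name `beta_method_of_moments` and the simulated sample are assumptions for illustration, not something from the thread. Note also that a beta distribution is continuous and has no point mass at 0, so a large share of scores that are exactly 0 may need to be handled separately from the fit.

```python
import random
import statistics


def beta_method_of_moments(scores):
    """Estimate (alpha, beta) of a beta distribution by matching the
    sample mean and sample variance (method of moments)."""
    m = statistics.fmean(scores)
    v = statistics.variance(scores)  # unbiased sample variance
    if not 0.0 < m < 1.0:
        raise ValueError("sample mean must lie strictly between 0 and 1")
    if not 0.0 < v < m * (1.0 - m):
        raise ValueError("sample variance must lie in (0, m * (1 - m))")
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common


if __name__ == "__main__":
    # Hypothetical data: simulated scores from a known beta distribution,
    # used only to check that the estimator roughly recovers the parameters.
    rng = random.Random(0)
    sample = [rng.betavariate(2.0, 5.0) for _ in range(20000)]
    alpha_hat, beta_hat = beta_method_of_moments(sample)
    print(alpha_hat, beta_hat)
```

Maximum likelihood is the usual alternative. The moment estimates are often used as a starting point for it.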
 
I'm sorry you are not generating any responses at the moment. Is there any additional information you can share with us? Any new findings?
 
I think this paper contains what I'm looking for, but I have not dug into it yet since this problem has been set aside temporarily.

http://dare.uva.nl/document/125861

The general issue here concerns ranked retrieval. I have rankings that work, and I can present items in descending order of score so that the user sees the more relevant items first. But for a variety of other applications it is useful to know the probability of relevance (or non-relevance) given a score. The paper seems to construct two distributions, one for the score given relevant and one for the score given non-relevant, and goes from there. Another thing I can do with this is project a possible precision and recall curve as the user proceeds through the items in ranked order.
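The two-distribution idea can be sketched with Bayes' rule: if f_R(s) and f_N(s) are the score densities for relevant and non-relevant items and p is the prior fraction of relevant items, then P(relevant | s) = p * f_R(s) / (p * f_R(s) + (1 - p) * f_N(s)). The Python sketch below assumes both densities are beta distributions, which is my assumption for illustration and not necessarily what the linked paper uses. The function names, parameters, and prior are hypothetical.

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(
        log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x)
    )


def prob_relevant(score, prior_rel, rel_params, nonrel_params):
    """P(relevant | score) by Bayes' rule from two class-conditional
    score densities, both assumed here to be beta distributions."""
    p_rel = prior_rel * beta_pdf(score, *rel_params)
    p_non = (1.0 - prior_rel) * beta_pdf(score, *nonrel_params)
    return p_rel / (p_rel + p_non)


if __name__ == "__main__":
    # Hypothetical parameters: relevant scores skew high, non-relevant low.
    for s in (0.1, 0.5, 0.9):
        print(s, prob_relevant(s, 0.2, (5.0, 2.0), (1.0, 5.0)))
```

With a posterior like this, the expected number of relevant items down to any rank is the sum of the posteriors of the items above it, which is one way to project a precision curve.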
 