Can a Beta Distribution Model Scores in the Interval [0,1] for Ranked Retrieval?

SUMMARY

The discussion centers on modeling a set of item scores in the interval [0,1] with a beta distribution. The scores have an unusual shape: approximately 50% are equal to 0, and the remainder decline steeply toward 1. The user wants to determine the parameters alpha and beta of the beta distribution and asks whether they can be derived from the sample mean and variance. The conversation also touches on ranked retrieval, on constructing distributions for the probability of relevance given a score, and on projecting precision and recall curves.

PREREQUISITES
  • Understanding of beta distribution parameters (alpha and beta)
  • Knowledge of sample mean and variance calculations
  • Familiarity with ranked retrieval concepts
  • Basic statistics related to probability distributions
NEXT STEPS
  • Research methods for estimating beta distribution parameters from sample data
  • Explore the relationship between sample distributions and normal distributions
  • Investigate techniques for constructing precision and recall curves in ranked retrieval
  • Study the implications of relevance probability distributions in information retrieval systems
USEFUL FOR

Data scientists, statisticians, and machine learning practitioners focused on ranked retrieval systems and probability modeling in score-based applications.

TheOldHag
I have a set of scored items with scores in the interval [0,1]. Roughly speaking, about 50% of the scores are equal to 0, and the rest of the distribution slopes steeply downward all the way to one or near one. I want to fit this data to a distribution and use it down the road in some calculation, but I'm not sure how to proceed.

My guess is that since the data lie in the interval [0,1], they can be modeled as a beta distribution. So now I need to find the parameters alpha and beta. Is it as easy as calculating the sample mean and the sample variance and working backwards from the equations for the mean and variance of a beta distribution, or does that only work for normal distributions? Since this is a sample, does it approximate a normal distribution, so that I should be fitting a normal distribution to the data despite the interval [0,1] (it would have very thin tails)? Comments appreciated.
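
For reference, working backwards from the mean and variance is the method of moments, and it is not specific to normal distributions. A beta distribution has mean alpha/(alpha+beta) and variance alpha*beta/((alpha+beta)^2 (alpha+beta+1)), and both equations can be inverted whenever the sample variance is smaller than mean*(1-mean). Below is a minimal sketch of that inversion; the function name and the example scores are made up for illustration. One caveat: a beta distribution is continuous and puts no point mass at exactly 0, so a sample with about 50% exact zeros may need the zeros handled separately. The sketch simply treats every score as part of one sample.

```python
def beta_method_of_moments(scores):
    """Estimate (alpha, beta) of a beta distribution from sample moments.

    Hypothetical helper for illustration; assumes all scores lie in [0, 1].
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    mean = sum(scores) / n
    # Unbiased sample variance (divides by n - 1).
    var = sum((x - mean) ** 2 for x in scores) / (n - 1)
    # The inversion is only valid when 0 < var < mean * (1 - mean).
    if not 0.0 < mean < 1.0 or var <= 0.0 or var >= mean * (1.0 - mean):
        raise ValueError("moments are not compatible with a beta distribution")
    common = mean * (1.0 - mean) / var - 1.0  # equals alpha + beta
    return mean * common, (1.0 - mean) * common


# Made-up scores: many zeros, then a steep decline toward 1.
example_scores = [0.0, 0.0, 0.0, 0.05, 0.1, 0.2, 0.4, 0.9]
alpha_hat, beta_hat = beta_method_of_moments(example_scores)
print(alpha_hat, beta_hat)
```

For a sample shaped like this, the estimated alpha comes out below 1, which corresponds to a beta density that grows without bound as the score approaches 0.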
 
I'm sorry you are not generating any responses at the moment. Is there any additional information you can share with us? Any new findings?
 
I think this paper contains what I'm looking for, but I have not dug into it yet since this problem has been set aside temporarily.

http://dare.uva.nl/document/125861

The general issue I'm having here concerns ranked retrieval. I have rankings that work, and I can present items in descending order so that the user sees the more relevant items first. But for a variety of other applications it is useful to know the probability of relevance (or non-relevance) given a score. This paper seems to construct two distributions, one for the score given relevant and one for the score given non-relevant, and then goes from there. Another thing I can do with this is project a possible curve of precision and recall as the user proceeds through the items in ranked order.
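
The two-distribution idea described above can be sketched with Bayes' rule: given a density for scores of relevant items, a density for scores of non-relevant items, and a prior probability of relevance, the probability of relevance given a score follows directly. The sketch below is only an illustration under stated assumptions. It uses beta densities for both classes, which is a choice made here and not necessarily what the linked paper does, and all function names and parameter values are hypothetical. Projected precision and recall at rank k are taken as the expected number of relevant items seen so far, divided by k and by the expected total respectively.

```python
import math


def beta_pdf(x, a, b):
    """Beta(a, b) density at x, restricted to the open interval (0, 1)."""
    if not 0.0 < x < 1.0:
        # Scores of exactly 0 or 1 would need clamping or separate handling.
        raise ValueError("x must lie strictly between 0 and 1")
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def prob_relevant(score, prior_rel, rel_params, nonrel_params):
    """P(relevant | score) by Bayes' rule with two class-conditional betas."""
    p_rel = prior_rel * beta_pdf(score, *rel_params)
    p_non = (1.0 - prior_rel) * beta_pdf(score, *nonrel_params)
    return p_rel / (p_rel + p_non)


def projected_precision_recall(ranked_scores, prior_rel, rel_params, nonrel_params):
    """Expected (precision, recall) after each rank, scores sorted descending."""
    probs = [prob_relevant(s, prior_rel, rel_params, nonrel_params)
             for s in ranked_scores]
    total = sum(probs)  # expected number of relevant items in the whole list
    curve, seen = [], 0.0
    for k, p in enumerate(probs, start=1):
        seen += p  # expected number of relevant items in the top k
        curve.append((seen / k, seen / total))
    return curve


# Hypothetical parameters: relevant ~ Beta(2, 1), non-relevant ~ Beta(1, 2).
for precision, recall in projected_precision_recall(
        [0.9, 0.5, 0.1], 0.5, (2.0, 1.0), (1.0, 2.0)):
    print(round(precision, 3), round(recall, 3))
```

In practice the class-conditional parameters and the prior would have to be estimated from the scores themselves or from judged items; the values above are placeholders.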
 
