Statistics: Comparing values, greater, less. Anybody do this before?
by LiteHacker
Tags: comparing, greater, statistics, values

#1
Dec 27, 2012, 01:44 PM

P: 18

This is difficult for me to describe. If anyone can get the gist of what I am talking about and point me to the right keyword, that would be really helpful.

I'll explain through an example. I have many perfumes, and I survey people to see which perfumes they like more. For each person, I pick out two perfumes, let the person try both, and ask which one they like more. Now I have a big database of comparisons between pairs of perfumes, and I would like to aggregate this information somehow.

For example, suppose 80% like perfume A more than perfume B, and 90% like perfume B more than perfume C. Is there some formula I can use to put all of this information together and come up with a "liking value" for each product? Instead of a relation between two items, I want a single scalar value for each perfume. That way I can compute the value of each perfume on its own and sort by it to find the best and worst perfumes.

Does this make sense to anyone? Does anybody know how to get such a scalar value from comparative statistics? Sorry if this is a stupid question. Let me know if you need clarification of what I am trying to achieve.
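A minimal Python sketch of how a comparison database like this might be stored and reduced to pairwise fractions, i.e. the share of A-versus-B comparisons in which A was preferred. The record layout `(first, second, winner)`, the helper name `pairwise_fractions`, and the sample data are all hypothetical.

```python
from collections import defaultdict

# Hypothetical survey records: (first_perfume, second_perfume, winner).
# Each person tried exactly two perfumes and had to pick one of them.
comparisons = [
    ("A", "B", "A"),
    ("A", "B", "A"),
    ("A", "B", "B"),
    ("B", "C", "B"),
    ("B", "C", "B"),
    ("A", "C", "A"),
]

def pairwise_fractions(records):
    """Return f[(X, Y)] = fraction of X-vs-Y comparisons in which X was preferred."""
    wins = defaultdict(int)    # wins[(X, Y)]: number of times X beat Y
    totals = defaultdict(int)  # totals[frozenset({X, Y})]: number of X-vs-Y comparisons
    for first, second, winner in records:
        loser = second if winner == first else first
        wins[(winner, loser)] += 1
        totals[frozenset((first, second))] += 1
    fractions = {}
    for pair, n in totals.items():
        x, y = sorted(pair)
        fractions[(x, y)] = wins[(x, y)] / n
        fractions[(y, x)] = wins[(y, x)] / n
    return fractions

f = pairwise_fractions(comparisons)
print(f[("A", "B")])  # A won 2 of its 3 comparisons against B
```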



#2
Dec 27, 2012, 01:53 PM

P: 181

Just offering people two choices to compare and then trying to extrapolate the results into a kind of "global" value for each may lead you straight to Condorcet's paradox:
http://en.wikipedia.org/wiki/Voting_paradox 
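To see how the paradox can show up in this kind of data, here is a small sketch that looks for majority cycles among three perfumes. The 80% and 90% figures are from the opening post; the 60% preference for C over A is a hypothetical number added to produce a cycle, and `majority_cycles` is an illustrative helper name.

```python
from itertools import permutations

# f[(X, Y)] = share of X-vs-Y comparisons won by X (C-over-A is hypothetical).
f = {
    ("A", "B"): 0.8, ("B", "A"): 0.2,
    ("B", "C"): 0.9, ("C", "B"): 0.1,
    ("C", "A"): 0.6, ("A", "C"): 0.4,
}

def majority_cycles(fractions):
    """Return every 3-item cycle X > Y > Z > X in which each step is a strict majority."""
    items = sorted({x for x, _ in fractions})
    cycles = []
    for x, y, z in permutations(items, 3):
        if x != min(x, y, z):
            continue  # report each cycle once, starting from its smallest item
        # Pairs that were never compared count as "no majority".
        if (fractions.get((x, y), 0) > 0.5 and fractions.get((y, z), 0) > 0.5
                and fractions.get((z, x), 0) > 0.5):
            cycles.append((x, y, z))
    return cycles

print(majority_cycles(f))
```

Here A beats B, B beats C, and C beats A, so no ordering of the three perfumes agrees with every head-to-head majority.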



#3
Dec 27, 2012, 02:45 PM

P: 18

Thanks Michael,
That voting paradox led me to a number of other "voting" articles, which are pretty interesting. But there are so many of them that I don't know which one I need, if any.
Leaving aside the question of which perfume is better or worse, I just want to find out how I can, as you put it, "extrapolate the results into a kind of 'global' value". I understand I may run into Condorcet's paradox, or some other circular paradox, along the way. What system should I use if I have a large database of two-way choices and want a scalar, non-relational value for each perfume?
Assume I have the following: for each comparison, whether the person liked the first perfume or the second. They can't say "both" or "neither". I have many of these comparisons for each pair of perfumes.



#4
Dec 27, 2012, 02:46 PM

Mentor
P: 39,627

Except that, to avoid the "voter paradox", there are tie-breaker rules: things such as the scores that teams won by, and how they did against the team they are tied with for a particular seed (how they did head-to-head). So you may want to modify your survey to include more information to help you break ties. Something like "rate each perfume on a scale of 1-10, and tell me which one you like best, even if you give them the same score". See the "tie breaker" rules at the end of this page, for example: http://www.cabrillo.edu/~pkaplan/tournament_rules.html
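A sketch of one tournament-style rule adapted to perfumes, assuming you rank by total comparisons won and break a tie between two perfumes by their head-to-head record. The results list and the helper name `rank_with_tiebreak` are hypothetical. If three or more perfumes tie and their head-to-head results form a cycle, this comparison is no longer transitive and the resulting order is not meaningful, which is why real tournament rules add further criteria (the analogue here would be the 1-10 ratings suggested above).

```python
from collections import Counter
from functools import cmp_to_key

# Hypothetical results: one (winner, loser) tuple per comparison.
results = [
    ("A", "B"), ("B", "A"), ("B", "A"),
    ("A", "C"), ("A", "C"),
    ("B", "C"), ("C", "B"), ("C", "B"),
]

def rank_with_tiebreak(results):
    """Rank items by total wins, breaking a tie between two items by their head-to-head record."""
    wins = Counter(winner for winner, _ in results)
    head_to_head = Counter(results)  # head_to_head[(X, Y)] = times X beat Y
    items = sorted({name for pair in results for name in pair})  # fixed starting order

    def compare(x, y):
        if wins[x] != wins[y]:
            return wins[y] - wins[x]  # negative puts x first, so more wins ranks higher
        # Tie-breaker: the item that won more of its direct comparisons ranks higher.
        return head_to_head[(y, x)] - head_to_head[(x, y)]

    return sorted(items, key=cmp_to_key(compare))

print(rank_with_tiebreak(results))
```

A and B both have 3 wins, but B beat A in 2 of their 3 meetings, so B ranks first.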



#5
Dec 28, 2012, 12:39 AM

Sci Advisor
P: 3,173

If you assume your data consist of independent trials and assume a definite model in which the scalars play a role, then you have defined a definite problem of statistical estimation. For example, you could assume that each perfume X has a positive scalar S_X and that the probability a person prefers perfume A to perfume B is [itex] \frac{S_A}{S_A + S_B} [/itex]. This model is simplistic, and it might not fit your data. An example of a model that is actually used to predict performance in 1-on-1 contests is the Elo system of rating chess players. I suspect there are models to predict the outcomes of matches in other sports, and perhaps some of the references you have been given are based on such models. The important thing to understand is that there is no mathematical answer to your question until you define what the scalars represent.
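A scalar model in which the probability that A is preferred to B is S_A/(S_A + S_B), with every S positive, is commonly known as the Bradley-Terry model. The Elo expected score 1/(1 + 10^((R_B - R_A)/400)) has the same form: writing each rating R as S = 10^(R/400) turns it into S_A/(S_A + S_B). The sketch below checks this numerically; the ratings 1600 and 1400 are hypothetical and the function names are illustrative only.

```python
def bt_probability(s_a, s_b):
    """Model probability that A is preferred to B, given positive scalars S_A and S_B."""
    return s_a / (s_a + s_b)

def elo_expected(r_a, r_b):
    """Elo expected score of A against B (logistic form with the usual 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

# The two formulas agree when each rating R is turned into a scalar S = 10 ** (R / 400).
r_a, r_b = 1600, 1400
s_a, s_b = 10 ** (r_a / 400), 10 ** (r_b / 400)
print(bt_probability(s_a, s_b), elo_expected(r_a, r_b))
```

So an Elo-style rating is one way of giving the scalars a concrete meaning: a difference of 400 rating points corresponds to a 10-to-1 ratio of scalars.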



#6
Dec 28, 2012, 06:36 AM

P: 18

Interesting calculation. My intention is to be able to build a graph with the perfumes on the x axis and 'likability' (this scalar value) on the y axis. I am not sure what formula to use.
I am confused, however, by this calculation: [itex] \frac{S_A}{S_A + S_B} [/itex]. If I have perfume A compared to perfume B, and perfume A compared to perfume C, how do I use the formula to come up with only one value for A?



#7
Dec 29, 2012, 10:08 PM

Sci Advisor
P: 3,173

Suppose we have a particular set of scalar values for the perfumes; these can be just guesses or randomly chosen values. Then, using the formula above, we have a "model" that gives the probability for the outcome of every pairwise comparison of perfumes. We need to define a way to measure how well this model fits the observed data.

Picking the measure of the "goodness" or "badness" of a model's fit to data is usually a subjective matter. In a few cases, the model will be used to make decisions that have definite financial consequences, and the discrepancy between the data and the model can be assigned a definite cost or reward. In most cases things aren't that clear-cut; people pick some measure of fit that is easy to compute.

For example, let f(A,B) be the observed fraction of times that perfume A was preferred to perfume B, and let P(A,B) be the probability that perfume A is preferred to perfume B according to the model. We could define the "badness" of fit of the model to the data for a pair of perfumes to be [itex] |f(A,B) - P(A,B)| [/itex] or [itex] (f(A,B) - P(A,B))^2 [/itex], and define the total measure of "badness" to be the sum of all the pairwise measures of badness. Because S_A enters every comparison involving A, the total badness ties all of A's comparisons to a single value S_A.

The problem of finding the best set of scalars then becomes an optimization problem: we want the set of scalars that minimizes the badness of fit subject to certain constraints. (For example, it's simplest to constrain the scalars to be positive numbers so that [itex] \frac{S_A}{S_A + S_B} [/itex] always gives a number that can be interpreted as a probability.) There are various ways of minimizing a function of many variables where the variables are subject to constraints. They range from more-or-less systematic methods such as "conjugate gradient" to more-or-less trial-and-error methods such as "simulated annealing".
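A minimal sketch of the least-squares version of this optimization, assuming the squared badness (f(A,B) - P(A,B))^2 and the model P(A,B) = S_A/(S_A + S_B). Instead of conjugate gradient or simulated annealing it uses plain gradient descent, the simplest option, and it enforces positivity by writing S = exp(theta) rather than by an explicit constraint. The function names, learning rate, and step count are illustrative assumptions; the data are the 80% and 90% figures from the opening post.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fit_scalars(fractions, items, learning_rate=2.0, steps=5000):
    """Least-squares fit of positive scalars S to observed fractions f(A, B).

    fractions maps an ordered pair (A, B) to the observed share of A-vs-B
    comparisons won by A; list each unordered pair once. Positivity is enforced
    by writing S = exp(theta), so P(A, B) = sigmoid(theta_A - theta_B).
    Assumption: plain gradient descent with a fixed learning rate.
    """
    theta = {name: 0.0 for name in items}  # start with every scalar equal to 1
    for _ in range(steps):
        grad = {name: 0.0 for name in items}
        for (a, b), f_ab in fractions.items():
            p = sigmoid(theta[a] - theta[b])
            # d/dtheta_a of (f - p)^2 is -2 (f - p) p (1 - p); theta_b gets the opposite sign.
            g = -2.0 * (f_ab - p) * p * (1.0 - p)
            grad[a] += g
            grad[b] -= g
        for name in items:
            theta[name] -= learning_rate * grad[name]
    # Scalars are only defined up to a common factor; rescale so their geometric mean is 1.
    mean = sum(theta.values()) / len(theta)
    return {name: math.exp(t - mean) for name, t in theta.items()}

def badness(fractions, scalars):
    """Total squared badness: sum over listed pairs of (f(A, B) - P(A, B))^2."""
    return sum((f_ab - scalars[a] / (scalars[a] + scalars[b])) ** 2
               for (a, b), f_ab in fractions.items())

observed = {("A", "B"): 0.8, ("B", "C"): 0.9}  # the 80% and 90% figures from the question
s = fit_scalars(observed, ["A", "B", "C"])
for name in sorted(s, key=s.get, reverse=True):
    print(name, round(s[name], 3))  # one "likability" value per perfume, best first
```

With only those two pairs the model can match the data exactly (S_A : S_B : S_C = 36 : 9 : 1, since 36/45 = 0.8 and 9/10 = 0.9), so the badness goes to zero. If the data contain a majority cycle like the Condorcet example, no set of positive scalars fits every pair, and the minimizer is a compromise whose exact values depend on which badness measure you chose.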



#8
Jan 2, 2013, 01:45 AM

P: 523



