Statistics: Comparing values, greater, less. Anybody do this before?

  • Context: Undergrad
  • Thread starter: LiteHacker
  • Tags: Statistics

Discussion Overview

The discussion revolves around the challenge of aggregating comparative survey data on perfumes to derive a scalar "liking value" for each perfume. Participants explore statistical methods and models that could facilitate this aggregation, considering both theoretical and practical implications.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant describes a method of comparing perfumes based on survey results, seeking a way to convert relational preferences into scalar values for each perfume.
  • Another participant introduces the concept of Condorcet's paradox, suggesting that simply comparing two options may lead to circular reasoning in determining overall preferences.
  • A suggestion is made to consider tie-breaking rules similar to those used in tournament seeding, which could help refine the survey methodology.
  • One participant proposes a model where the scalar values are defined in relation to the probability of preference, using a formula that relates the scalars of two perfumes.
  • Concerns are raised about the simplicity of the proposed model and the need for a clear definition of what the scalar represents to make the question meaningful.
  • There is a discussion about how to derive a single scalar value for a perfume when it has been compared to multiple others, highlighting the complexity of the aggregation process.

Areas of Agreement / Disagreement

Participants express differing views on the feasibility and methodology of deriving scalar values from comparative data. There is no consensus on a specific approach or model, and the discussion remains unresolved regarding the best way to aggregate the data.

Contextual Notes

Participants note the potential limitations of the proposed models, including assumptions about independence of trials and the need for a well-defined meaning of the scalar values. The discussion highlights the complexity of statistical estimation in this context.

LiteHacker
This is difficult for me to describe.
If anyone can get the gist of what I am talking about and can point me to the correct keyword, that would be really helpful.

I'll explain this through an example:
I have many perfumes.
I get surveys from people to see which perfumes they like more.
The way I do this is, for each person, I pick out two perfumes.
I let the person try out the perfumes, and let me know which perfume they like more.
Now I have a big database of comparisons of two perfumes.
I would like to aggregate this information somehow.

For example, say 80% like perfume A more than perfume B, and 90% like perfume B more than perfume C.
Is there some way you can think of I can put all of this information together through some formula to come up with a "liking value" for each product?
Instead of it being relational between two items, make it a scalar value for each perfume.

This way for example, I can calculate the 'value' of each perfume by itself and sort by this number to find the best and worst perfumes.

Does this make sense to anyone?
Does anybody know how to get this scalar value from comparative statistics?

I'm sorry if this is a stupid question.
Let me know if you need a clarification of what I am trying to achieve.
 
Just offering people two choices to compare and then trying to extrapolate the results into a kind of "global" value for each may lead you straight to Condorcet's paradox:

http://en.wikipedia.org/wiki/Voting_paradox
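
To make the paradox concrete, here is a minimal sketch with made-up numbers (not data from this thread). If a majority prefers A to B, B to C, and C to A, then no ranking of the three agrees with every pairwise majority. The function names and the `wins` dictionary are hypothetical, chosen only for illustration.

```python
from itertools import permutations

# Hypothetical pairwise results: wins[(x, y)] = fraction of people preferring x to y.
wins = {("A", "B"): 0.8, ("B", "C"): 0.9, ("C", "A"): 0.6}

def prefers(x, y, wins):
    """Return True if a majority prefers x to y."""
    if (x, y) in wins:
        return wins[(x, y)] > 0.5
    return wins[(y, x)] < 0.5

def consistent_orderings(items, wins):
    """List every ranking (best first) that agrees with all pairwise majorities."""
    result = []
    for order in permutations(items):
        ok = all(
            prefers(order[i], order[j], wins)
            for i in range(len(order))
            for j in range(i + 1, len(order))
        )
        if ok:
            result.append(order)
    return result

# Empty list: the cycle A > B > C > A rules out every possible ranking.
cyclic = consistent_orderings(["A", "B", "C"], wins)
```

If the last entry were `("A", "C"): 0.7` instead, the same check would return the single ranking A, B, C.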
 
Thanks Michael,
That Voting Paradox has opened up a number of different "Voting" articles for me, which are pretty interesting.
But there are so many of them.. I don't know which one I need, if any.

Setting aside the question of which perfume is better or worse, I just want to find out how I can, as you noted, "extrapolate the results into a kind of 'global' value".
I understand I can run into Condorcet's paradox, or some other circular paradox on the way.

What system should I use if I have a large database of just two choices, if I want to get a scalar, nonrelational value for each perfume?

Assume I have the following:
For each comparison, I record whether the person liked the first perfume or the second perfume. They can't say "both" or "neither".
I have many of these comparisons for each pair of perfumes.
 
You might look into "pool play" in the context of volleyball or other tournaments. Teams compete against each other, and in the end the teams are seeded 1-n based on how they did in the games. That's a very similar situation to what you are describing.

Except, to avoid the "voter paradox", there are tie-breaker rules. Things such as the scores that teams won by, and how they did against the team that they are tied with for a particular seed (how did they do head-to-head).

So you may want to modify your survey to include more information to help you break ties. Something like "rate each perfume on a scale of 1-10, and tell me which one you like best, even if you give them the same score"...

See the "tie breaker" rules at the end of this, for example: http://www.cabrillo.edu/~pkaplan/tournament_rules.html

 
LiteHacker said:
Instead of it being relational between two items, make it a scalar value for each perfume.

This way for example, I can calculate the 'value' of each perfume by itself and sort by this number to find the best and worst perfumes.

Does this make sense to anyone?

It makes sense as an imprecise human desire, but you can't get a definite answer until you define what the scalar represents. The simplest way to reach such a definition is to invent a model where the scalar plays a role. For example, let the scalars associated with two perfumes A and B be [itex]S_A, S_B[/itex]. Invent a function that gives the probability that perfume A is preferred to perfume B. As a simple example, let the probability that A is preferred be [itex]\frac{S_A}{S_A + S_B}[/itex].

If you assume your data consists of independent trials and assume a definite model where the scalars play a role, then you have defined a definite problem of statistical estimation. The above model is simplistic and it might not fit your data. An example of a model that is actually used to predict performance in 1-on-1 contests is the Elo system of rating chess players. I suspect there are models to predict the outcomes of matches in other sports. Perhaps some of the references you are given are based on such models. The important thing to understand is that there is no mathematical answer to your question until you define what the scalars represent.
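
As a worked illustration of this simple model (a form often called the Bradley-Terry model), take the percentages from the original question and assume, purely for the sake of the example, that the model fits them exactly. Fixing [itex]S_C = 1[/itex] is an arbitrary choice of scale, since multiplying every scalar by the same constant leaves all the probabilities unchanged:

[tex]
\begin{aligned}
\frac{S_A}{S_A + S_B} = 0.8 &\;\Rightarrow\; S_A = 4\,S_B \\
\frac{S_B}{S_B + S_C} = 0.9 &\;\Rightarrow\; S_B = 9\,S_C \\
S_C = 1 &\;\Rightarrow\; S_B = 9,\quad S_A = 36 \\
P(A \text{ preferred to } C) &= \frac{S_A}{S_A + S_C} = \frac{36}{37} \approx 0.973
\end{aligned}
[/tex]

With only these two comparisons the scalars are determined exactly. Once A has also been compared directly with C, the three observed fractions will generally not agree with any single set of scalars, and that is where estimation comes in.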
 
Hey Stephen,

Interesting calculation.

My intention is to be able to build a graph, with perfumes on the x axis, and 'likability' (this scalar value) on the y axis.

I am not sure what formula to use.
I am confused however, with this calculation.

[itex]\frac{S_A}{S_A + S_B}[/itex]

If I have perfume A compared to perfume B, and perfume A compared to perfume C, how do I use the formula to come up with only one value for A?
 
LiteHacker said:
how do I use the formula to come up with only one value for A?

First let me repeat that this formula may be too simplistic. But it does convey the general idea that the scalars must have some meaning in order for your question to have meaning.

Suppose we have a particular set of scalar values for the perfumes - these can be just guesses or randomly chosen values. Then, using the formula above, we have a "model" that gives the probability for the outcome of all pairwise comparisons of perfume. We need to define a way to measure how well this model fits the observed data.

Picking the measure of the "goodness" or "badness" of fit of a model to data is usually a subjective matter. In a few cases, the model will be used to make decisions that have some definite financial consequences, and the discrepancy between the data and the model can be assigned a definite cost or reward. In most cases things aren't that clear cut; people pick some measure of fit that is easy to compute. For example, let f(A,B) be the observed fraction of times that perfume A was preferred to perfume B. Let P(A,B) be the probability that perfume A is preferred to perfume B according to the model. We could define the "badness" of fit of the model to data for a pair of perfumes to be [itex]| f(A,B) - P(A,B)|[/itex] or [itex](f(A,B) - P(A,B))^2[/itex]. We could define the total measure of "badness" to be the sum of all the pairwise measures of badness.

The problem of finding the best set of scalars then becomes an optimization problem. We want the set of scalars that minimizes the badness of fit subject to certain constraints. (For example, it's simplest to constrain the scalars to be positive numbers so that [itex]\frac{ S_A } {S_A + S_B}[/itex] always gives a number that can be interpreted as a probability.)

There are various ways of minimizing a function of many variables where the variables are subject to constraints. They range from the more-or-less systematic methods such as "conjugate gradient" to the more-or-less trial and error methods, such as "simulated annealing".
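
The optimization described above can be sketched in a few lines. This is only one possible version under stated assumptions: it uses the squared-difference badness, and it swaps in plain gradient descent in place of conjugate gradient or simulated annealing to keep the code short. Writing each scalar as the exponential of a free variable keeps it positive without an explicit constraint. The function name `fit_scalars`, the step size, the iteration count, and the observed fractions are all made up for illustration.

```python
import math

def fit_scalars(observed, steps=5000, lr=1.0):
    """Fit positive scalars S so that S[a] / (S[a] + S[b]) matches observed[(a, b)].

    observed maps (a, b) to the observed fraction of times a was preferred to b.
    Minimizes the sum of squared differences by plain gradient descent on
    t = log(S), which keeps every S positive without explicit constraints.
    """
    items = sorted({x for pair in observed for x in pair})
    t = {x: 0.0 for x in items}  # start with all scalars equal
    for _ in range(steps):
        grad = {x: 0.0 for x in items}
        for (a, b), f in observed.items():
            p = 1.0 / (1.0 + math.exp(t[b] - t[a]))  # equals S_a / (S_a + S_b)
            g = 2.0 * (p - f) * p * (1.0 - p)        # derivative of (f - p)^2 w.r.t. t[a]
            grad[a] += g
            grad[b] -= g                             # derivative w.r.t. t[b] has opposite sign
        for x in items:
            t[x] -= lr * grad[x]
    scalars = {x: math.exp(t[x]) for x in items}
    total = sum(scalars.values())
    # Only ratios matter in this model, so rescale the scalars to sum to 1.
    return {x: s / total for x, s in scalars.items()}

# Made-up observed fractions, for illustration only.
observed = {("A", "B"): 0.8, ("B", "C"): 0.9}
scalars = fit_scalars(observed)
ranking = sorted(scalars, key=scalars.get, reverse=True)  # best perfume first
```

Each perfume ends up with one number no matter how many others it was compared against, so `ranking` gives the order along the x axis and `scalars` the 'likability' on the y axis.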
 
LiteHacker said:
Is there some way you can think of I can put all of this information together through some formula to come up with a "liking value" for each product?
Instead of it being relational between two items, make it a scalar value

The Google PageRank algorithm does something similar. The perfumes are analogous to web pages, with a link from B to A for each customer that prefers A to B. The ranking is based on the probability that a randomly clicking surfer is on a given page at a given time, and the calculation involves finding the dominant eigenvector of a transition matrix. Possibly a small modification is required to handle repeated links correctly (test with the two-perfume case to see if the proportions make sense).
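
Here is a rough sketch of that idea, using power iteration to approximate the dominant eigenvector. It includes one possible "small modification", which is an assumption on my part and not standard PageRank: the walker stays on the current perfume whenever that perfume was the one preferred. Without something like this, the two-perfume case has every link out of A pointing to B and every link out of B pointing to A, so both perfumes end up with rank 0.5 regardless of the vote split. The function name `preference_rank` and the counts are hypothetical, and the damping (random jump) term of real PageRank is left out.

```python
def preference_rank(wins, iterations=200):
    """Stationary distribution of a random walk that drifts toward preferred perfumes.

    wins[(a, b)] is the number of customers who preferred a to b.
    From perfume x, the walker moves to y with probability
    (times y beat x) / (total comparisons involving x) and otherwise stays at x.
    The stay-put step is an assumed modification, not part of standard PageRank.
    """
    items = sorted({x for pair in wins for x in pair})
    total = {x: 0 for x in items}
    for (a, b), n in wins.items():
        total[a] += n
        total[b] += n
    rank = {x: 1.0 / len(items) for x in items}  # start from a uniform guess
    for _ in range(iterations):
        new = {x: 0.0 for x in items}
        for x in items:
            moved = 0.0
            for (a, b), n in wins.items():
                if b == x:  # x lost to a, so follow the link x -> a
                    p = n / total[x]
                    new[a] += rank[x] * p
                    moved += p
            # Stay put for the share of comparisons that x won.
            new[x] += rank[x] * max(0.0, 1.0 - moved)
        rank = new
    return rank

# Two-perfume check: 80 customers preferred A, 20 preferred B.
two = preference_rank({("A", "B"): 80, ("B", "A"): 20})
```

In the two-perfume check the ranks come out in the same 80/20 proportion as the votes, which is the sanity test suggested above.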
 
