Statistics: Comparing values, greater, less. Anybody do this before?

  • Thread starter LiteHacker
In summary, the speaker has a large database of comparisons for different perfumes and is looking for a way to aggregate this information into a scalar value for each perfume. They mention the possibility of using a voting system, but are concerned about running into paradoxes. They also mention the idea of using tie-breaker rules to avoid these paradoxes. Finally, they express the need for a defined model to determine the scalar values and suggest looking into existing models used in sports for predicting outcomes.
  • #1
LiteHacker
This is difficult for me to describe.
If anyone can get the gist of what I am talking about and can point me to the correct keyword, that would be really helpful.

I'll explain this through an example:
I have many perfumes.
I get surveys from people to see which perfumes they like more.
The way I do this is, for each person, I pick out two perfumes.
I let the person try out the perfumes, and let me know which perfume they like more.
Now I have a big database of comparisons of two perfumes.
I would like to aggregate this information somehow.

For example, say 80% like perfume A more than perfume B, and 90% like perfume B more than perfume C.
Is there some way you can think of I can put all of this information together through some formula to come up with a "liking value" for each product?
Instead of it being relational between two items, make it a scalar value for each perfume.

This way for example, I can calculate the 'value' of each perfume by itself and sort by this number to find the best and worst perfumes.

Does this make sense to anyone?
Does anybody know how to get this scalar value from comparative statistics?

I'm sorry if this is a stupid question..
Let me know if you need a clarification of what I am trying to achieve.
 
  • #2
Just offering people two choices to compare and then trying to extrapolate the results into a kind of "global" value for each may lead you straight to Condorcet's paradox:

http://en.wikipedia.org/wiki/Voting_paradox
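
To make the paradox concrete, here is a minimal Python sketch. The three ballots are made-up rankings (an assumption for illustration, not data from this thread), and `pairwise_winner` is a hypothetical helper name. Each pairwise majority is clear-cut, yet together they form a cycle, so no single ordering agrees with all of them.

```python
from itertools import combinations

# Hypothetical ballots: each voter ranks three perfumes from best to worst.
ballots = [
    ("A", "B", "C"),
    ("B", "C", "A"),
    ("C", "A", "B"),
]

def pairwise_winner(x, y, ballots):
    """Return whichever of x, y is ranked higher on a strict majority of ballots.

    With an even number of ballots an exact tie would fall through to y;
    the example uses three ballots, so no tie can occur.
    """
    x_wins = sum(1 for b in ballots if b.index(x) < b.index(y))
    return x if x_wins * 2 > len(ballots) else y

# Majority result for every pair of perfumes.
majority = {
    (x, y): pairwise_winner(x, y, ballots)
    for x, y in combinations("ABC", 2)
}

for (x, y), winner in majority.items():
    print(f"{x} vs {y}: majority prefers {winner}")
```

Here A beats B and B beats C, but C beats A, which is the circular result the paradox describes.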
 
  • #3
Thanks Michael,
That Voting Paradox has opened up a number of different "Voting" articles for me, which are pretty interesting.
But there are so many of them.. I don't know which one I need, if any.

Setting aside the question of which perfume is better or worse,
I just want to find out how I can, as you noted, "extrapolate the results into a kind of 'global' value".
I understand I can run into Condorcet's paradox, or some other circular paradox on the way.

What system should I use if I have a large database of just two choices, if I want to get a scalar, nonrelational value for each perfume?

Assume I have the following:
For each comparison, I know whether the person liked the first perfume or the second perfume. They can't say "both" or "neither".
I have many of these comparisons for each pair of perfumes.
 
  • #4
You might look into "pool play" in the context of volleyball or other tournaments. Teams compete against each other, and in the end the teams are seeded 1-n based on how they did in the games. That's a very similar situation to what you are describing.

Except, to avoid the "voter paradox", there are tie-breaker rules. Things such as the scores that teams won by, and how they did against the team that they are tied with for a particular seed (how did they do head-to-head).

So you may want to modify your survey to include more information to help you break ties. Something like "rate each perfume on a scale of 1-10, and tell me which one you like best, even if you give them the same score"...

See the "tie breaker" rules at the end of this, for example: http://www.cabrillo.edu/~pkaplan/tournament_rules.html
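
As a rough illustration of the pool-play idea, here is a small Python sketch. The win counts are invented, and the rule used (overall win fraction first, then head-to-head record when two perfumes are tied) is only one possible tie-breaker, not the rule set from the linked page. A three-way tie with circular head-to-head results would still need a further rule.

```python
from fractions import Fraction
from functools import cmp_to_key

# Hypothetical counts: wins[(x, y)] = number of people who preferred x to y.
wins = {
    ("A", "B"): 3, ("B", "A"): 1,
    ("A", "C"): 2, ("C", "A"): 2,
    ("B", "C"): 4, ("C", "B"): 0,
}
items = ["C", "B", "A"]

def win_fraction(x):
    """Fraction of all comparisons involving x that x won (exact arithmetic)."""
    won = sum(n for (a, _), n in wins.items() if a == x)
    lost = sum(n for (_, b), n in wins.items() if b == x)
    return Fraction(won, won + lost)

record = {x: win_fraction(x) for x in items}

def compare(x, y):
    """Order by overall record; fall back to head-to-head on equal records."""
    if record[x] != record[y]:
        return -1 if record[x] > record[y] else 1
    head_to_head = wins.get((x, y), 0) - wins.get((y, x), 0)
    if head_to_head == 0:
        return 0
    return -1 if head_to_head > 0 else 1

seeding = sorted(items, key=cmp_to_key(compare))
print(seeding)
```

In this example A and B have the same overall record, and A is seeded first because it won the direct comparison more often.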

 
  • #5
LiteHacker said:
Instead of it being relational between two items, make it a scalar value for each perfume.

This way for example, I can calculate the 'value' of each perfume by itself and sort by this number to find the best and worst perfumes.

Does this make sense to anyone?

It makes sense as an imprecise human desire, but you can't get a definite answer until you define what the scalar represents. The simplest way to reach such a definition is to invent a model where the scalar plays a role. For example, let the scalars associated with two perfumes A and B be [itex]S_A, S_B [/itex]. Invent a function that gives the probability that perfume A is preferred to perfume B. As a simple example, let the probability that A is preferred be [itex] \frac{S_A}{S_A + S_B} [/itex].

If you assume your data consists of independent trials and assume a definite model where the scalars play a role, then you have defined a definite problem of statistical estimation. The above model is simplistic and it might not fit your data. An example of a model that is actually used to predict performance in 1-on-1 contests is the Elo system of rating chess players. I suspect there are models to predict the outcomes of matches in other sports. Perhaps some of the references you are given are based on such models. The important thing to understand is that there is no mathematical answer to your question until you define what the scalars represent.
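
For what it's worth, the simple formula above is, as far as I know, usually called the Bradley-Terry model, which may be the keyword being asked for. Below is a minimal sketch of estimating its scalars by maximum likelihood with the standard iterative (minorize-maximize) update. The counts are invented to match the 80% and 90% figures from the first post, the function name `fit_bradley_terry` is hypothetical, and the sketch assumes every perfume has at least one win and one loss and that the comparisons connect all perfumes.

```python
def fit_bradley_terry(wins, items, iterations=500):
    """Estimate scalars S_i such that P(i preferred to j) = S_i / (S_i + S_j).

    wins[(i, j)] is the number of times i was preferred to j.
    Each pass applies S_i <- W_i / sum_j n_ij / (S_i + S_j), where W_i is the
    total number of wins of i and n_ij the number of i-vs-j comparisons,
    then rescales so the scalars sum to 1 (only their ratios matter).
    """
    scores = {i: 1.0 for i in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            total_wins = sum(wins.get((i, j), 0) for j in items if j != i)
            denom = 0.0
            for j in items:
                if j == i:
                    continue
                n_ij = wins.get((i, j), 0) + wins.get((j, i), 0)
                if n_ij:
                    denom += n_ij / (scores[i] + scores[j])
            updated[i] = total_wins / denom  # assumes i was compared at least once
        total = sum(updated.values())
        scores = {i: s / total for i, s in updated.items()}
    return scores

# Hypothetical data: 100 people compared A with B, 100 compared B with C.
wins = {("A", "B"): 80, ("B", "A"): 20, ("B", "C"): 90, ("C", "B"): 10}
scores = fit_bradley_terry(wins, ["A", "B", "C"])

for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {s:.4f}")
```

With only these two pairs observed, the fitted scalars should reproduce both percentages: S_A is about 4 times S_B, and S_B about 9 times S_C. With more pairs the fit will generally be a compromise rather than exact.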
 
  • #6
Hey Stephen,

Interesting calculation.

My intention is to be able to build a graph, with perfumes on the x axis, and 'likability' (this scalar value) on the y axis.

I am not sure what formula to use.
I am confused, however, by this calculation.

[itex] \frac{S_A}{S_A + S_B} [/itex]

If I have perfume A compared to perfume B, and perfume A compared to perfume C, how do I use the formula to come up with only one value for A?
 
  • #7
LiteHacker said:
how do I use the formula to come up with only one value for A?

First let me repeat that this formula may be too simplistic. But it does convey the general idea that the scalars must have some meaning in order for your question to have meaning.

Suppose we have a particular set of scalar values for the perfumes - these can be just guesses or randomly chosen values. Then, using the formula above, we have a "model" that gives the probability for the outcome of all pairwise comparisons of perfume. We need to define a way to measure how well this model fits the observed data.

Picking the measure of the "goodness" or "badness" of fit of a model to data is usually a subjective matter. In a few cases, the model will be used to make decisions that have some definite financial consequences, and the discrepancy between the data and model can be assigned a definite cost or reward. In most cases things aren't that clear cut; people pick some measure of fit that is easy to compute. For example, let f(A,B) be the observed fraction of times that perfume A was preferred to perfume B. Let P(A,B) be the probability that perfume A is preferred to perfume B according to the model. We could define the "badness" of fit of the model to data for a pair of perfumes to be [itex] | f(A,B) - P(A,B)| [/itex] or [itex] (f(A,B) - P(A,B))^2 [/itex]. We could define the total measure of "badness" to be the sum of all the pairwise measures of badness.

The problem of finding the best set of scalars then becomes an optimization problem. We want the set of scalars that minimizes the badness of fit subject to certain constraints. (For example, it's simplest to constrain scalars to be positive numbers so that [itex] \frac{ S_A } {S_A + S_B} [/itex] always gives a number that can be interpreted as a probability. )

There are various ways of minimizing a function of many variables where the variables are subject to constraints. They range from the more-or-less systematic methods such as "conjugate gradient" to the more-or-less trial and error methods, such as "simulated annealing".
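
Here is a minimal sketch of that optimization in plain Python. It uses ordinary gradient descent rather than conjugate gradient or simulated annealing, simply because it is the shortest thing to write down. The observed fractions are invented, and the positivity constraint is handled by writing each scalar as S = exp(t) and optimizing t instead (my own choice, not something prescribed above). The badness measure is the sum of squared differences.

```python
import math

# Hypothetical observed fractions f(A,B): share of people preferring A to B.
observed = {("A", "B"): 0.80, ("B", "C"): 0.90, ("A", "C"): 0.95}
items = ["A", "B", "C"]

def model_prob(t, a, b):
    """P(a preferred to b) = S_a / (S_a + S_b) with S = exp(t)."""
    return 1.0 / (1.0 + math.exp(t[b] - t[a]))

def badness(t):
    """Total badness: sum over pairs of (f(A,B) - P(A,B))^2."""
    return sum((f - model_prob(t, a, b)) ** 2 for (a, b), f in observed.items())

def fit(steps=5000, learning_rate=0.5):
    """Plain gradient descent on the log-scalars t, starting from all equal."""
    t = {i: 0.0 for i in items}
    for _ in range(steps):
        grad = {i: 0.0 for i in items}
        for (a, b), f in observed.items():
            p = model_prob(t, a, b)
            g = -2.0 * (f - p) * p * (1.0 - p)  # d(badness)/d(t_a) for this pair
            grad[a] += g
            grad[b] -= g
        for i in items:
            t[i] -= learning_rate * grad[i]
    return t

start_badness = badness({i: 0.0 for i in items})
t = fit()
final_badness = badness(t)

# Only ratios of the scalars matter, so rescale them to sum to 1.
total = sum(math.exp(v) for v in t.values())
scores = {i: math.exp(t[i]) / total for i in items}

print(f"badness: {start_badness:.4f} -> {final_badness:.6f}")
print(scores)
```

The three fractions here cannot all be matched exactly by any choice of scalars, so the final badness is small but not zero, which is the usual situation with real data.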
 
  • #8
LiteHacker said:
I'll explain this through an example:
I have many perfumes.
I get surveys from people to see which perfumes they like more.
The way I do this is, for each person, I pick out two perfumes.
I let the person try out the perfumes, and let me know which perfume they like more.
Now I have a big database of comparisons of two perfumes.
I would like to aggregate this information somehow.

For example, say 80% like perfume A more than perfume B, and 90% like perfume B more than perfume C.
Is there some way you can think of I can put all of this information together through some formula to come up with a "liking value" for each product?
Instead of it being relational between two items, make it a scalar value

The Google PageRank algorithm does something similar. The perfumes are analogous to web pages, with a link from B to A for each customer that prefers A to B. The ranking is based on the probabilities that a randomly clicking surfer is on a given page at a given time, and the calculation involves finding the dominant eigenvector of a transition matrix. Possibly a small modification is required to handle repeated links correctly (test with the two perfume case to see if the proportions make sense).
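
A minimal sketch of that two-perfume test, using power iteration in plain Python. The counts are invented. The `stay_on_wins` option is a hypothetical modification of my own (each win is counted as a link from a perfume to itself), included only to show one way the proportions can be made sensible; it is not part of the standard algorithm.

```python
def transition_rows(wins, items, stay_on_wins=False):
    """Build row-stochastic transition probabilities from preference counts.

    There is a link i -> j for each customer who preferred j to i, so the
    random surfer drifts towards preferred perfumes.
    """
    rows = {}
    for i in items:
        out = {j: wins.get((j, i), 0) for j in items if j != i}
        if stay_on_wins:
            # Hypothetical modification: each win of i is a self-link on i.
            out[i] = sum(wins.get((i, j), 0) for j in items if j != i)
        total = sum(out.values())
        if total == 0:
            # No outgoing links: jump anywhere, the usual dangling-page fix.
            rows[i] = {j: 1.0 / len(items) for j in items}
        else:
            rows[i] = {j: c / total for j, c in out.items()}
    return rows

def stationary(rows, items, damping=0.85, iterations=200):
    """Power iteration for the dominant eigenvector (PageRank-style)."""
    n = len(items)
    rank = {i: 1.0 / n for i in items}
    for _ in range(iterations):
        new_rank = {i: (1.0 - damping) / n for i in items}
        for i in items:
            for j, p in rows[i].items():
                new_rank[j] += damping * rank[i] * p
        rank = new_rank
    return rank

# Two-perfume test: 80 customers prefer A, 20 prefer B (invented numbers).
items = ["A", "B"]
wins = {("A", "B"): 80, ("B", "A"): 20}

plain = stationary(transition_rows(wins, items), items)
modified = stationary(transition_rows(wins, items, stay_on_wins=True),
                      items, damping=1.0)

print("plain:   ", plain)
print("modified:", modified)
```

In the plain version each perfume's only outgoing link goes to the other one, so the 80 and 20 cancel out during normalization and both perfumes get rank 0.5. With the self-link modification and no damping, the ranks come out as 0.8 and 0.2, matching the observed proportions.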
 
