Reasonable method for evaluation of bets

  • Context: Graduate
  • Thread starter: tom.stoer
  • Tags: Method

Discussion Overview

The discussion revolves around evaluating bets on the outcome of the next general election in Germany, focusing on the selection of reasonable weight and deviation functions for predicting the percentage of votes for six major political parties. Participants explore various mathematical approaches and models related to probability distributions, particularly the multinomial distribution and its approximations.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant proposes calculating a deviation measure \(D_n\) based on a weight function \(w\) and a deviation function \(d\), suggesting \(d(x) = x^2\) and \(w(x) = 1\) as a starting point.
  • Another participant suggests ranking predictions using the multinomial distribution and emphasizes the need for approximations due to the large number of votes.
  • A different viewpoint highlights the challenge of predicting larger parties with accuracy, proposing a penalty based on the amount of information needed for correct predictions.
  • One participant recommends using a multivariate Gaussian approximation for the multinomial distribution, noting the importance of adding an "other" party to avoid singularity issues.
  • There is a discussion about the relative and absolute accuracy of predictions for larger versus smaller parties, with references to the variance in the multinomial distribution.
  • Some participants express a desire for practical implementation, suggesting the use of Excel or the Dirichlet distribution for computational purposes.
  • Clarifications are sought regarding the notation used for the weight and deviation functions, indicating some confusion in the mathematical representation.

Areas of Agreement / Disagreement

Participants express multiple competing views on the best approach for defining weight and deviation functions, as well as the appropriateness of different statistical models. The discussion remains unresolved with no consensus on the optimal methods.

Contextual Notes

Participants note limitations related to the assumptions of their models, the need for approximations in calculations, and the implications of including an "other" party in the distribution. The discussion reflects a variety of perspectives on the complexity of predicting election outcomes.

tom.stoer
A couple of friends ##n = 1 \ldots N## of mine bet on the result of the next general election in Germany. We select the six most important political parties ##p = 1 \ldots 6##. For each party ##p## and each friend ##n## we have the forecast ##x_{pn}## and the official election result ##x_p##.

Now we calculate

$$D_n = \sum_p w(x_p) \, d(x_{pn} - x_p)$$

with a weight-function ##w## and a deviation-function ##d##. The winner is the friend with the smallest ##D_n##.

My question is, what are the most reasonable functions?

It seems natural to set

$$d(x) = x^2$$
$$w(x) = 1$$

which corresponds to the Euclidean distance with equal weight for each party.

But of course other choices are conceivable, e.g.

$$w(x) = x^c$$

weighting bigger parties more than smaller parties.

Are there any reasonable arguments and choices for the weight-function ##w## and the deviation-function ##d##?
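A minimal sketch of the scoring rule above, in case it helps to experiment with different choices of ##w## and ##d##. The vote shares, the friend names and the function names are all made up for illustration; only the formula for ##D_n## comes from the post.

```python
def deviation_score(forecast, result, w=lambda x: 1.0, d=lambda x: x * x):
    """D_n = sum_p w(x_p) * d(x_pn - x_p) for one friend's forecast."""
    return sum(w(xp) * d(xf - xp) for xf, xp in zip(forecast, result))


def rank(forecasts, result, **kwargs):
    """Return (name, D_n) pairs sorted so that the winner comes first."""
    scores = {name: deviation_score(f, result, **kwargs)
              for name, f in forecasts.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical election result in percent for the six parties (not real data).
result = [30.0, 25.0, 12.0, 10.0, 9.0, 8.0]

# Hypothetical forecasts from two friends.
forecasts = {
    "friend_1": [32.0, 24.0, 11.0, 10.0, 9.0, 8.0],
    "friend_2": [30.0, 25.0, 15.0, 10.0, 9.0, 8.0],
}

# d(x) = x^2, w(x) = 1: plain squared Euclidean distance.
equal_weight = rank(forecasts, result)

# d(x) = x^2, w(x) = x^c with c = 1: errors on big parties count more.
size_weight = rank(forecasts, result, w=lambda x: x ** 1.0)

print(equal_weight)
print(size_weight)
```

With these invented numbers the two weightings pick different winners, which shows that the choice of ##w## is not a cosmetic detail.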
 
What you are predicting is essentially the expected percentage of votes. I would rank contestants according to the multinomial distribution. The prediction that gave the largest probability for the end result wins.

Edit: Since the number of votes will be large, you will probably want to use a lot of approximations in the computations of the factorials ...
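A rough sketch of this ranking, working with the log of the multinomial probability so that the factorials never have to be evaluated directly (`lgamma(k + 1)` equals ##\ln k!##). The vote counts, forecasts and names are invented, and an "other" bin is assumed so that every forecast sums to 1.

```python
from math import lgamma, log


def multinomial_log_pmf(counts, probs):
    """Log-probability of `counts` if each vote lands in bin i with probs[i]."""
    n = sum(counts)
    # log(n! / prod_i k_i!) computed through lgamma instead of factorials.
    log_coeff = lgamma(n + 1) - sum(lgamma(k + 1) for k in counts)
    return log_coeff + sum(k * log(p) for k, p in zip(counts, probs) if k > 0)


# Hypothetical result: votes for six parties plus an "other" bin (not real data).
result_counts = [300_000, 250_000, 120_000, 100_000, 90_000, 80_000, 60_000]

# Hypothetical forecasts as shares; each sums to 1 including "other".
forecasts = {
    "friend_1": [0.32, 0.24, 0.11, 0.10, 0.09, 0.08, 0.06],
    "friend_2": [0.30, 0.25, 0.15, 0.10, 0.09, 0.08, 0.03],
}

log_likelihoods = {name: multinomial_log_pmf(result_counts, shares)
                   for name, shares in forecasts.items()}

# The forecast that makes the actual result most probable wins.
winner = max(log_likelihoods, key=log_likelihoods.get)
print(winner, log_likelihoods)
```

The multinomial coefficient is identical for every friend, so only the sum of `k * log(p)` actually decides the ranking.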
 
Predicting a larger party with 1% accuracy is harder than predicting a smaller party with that accuracy (as extreme case, imagine you would have included all parties, even those that don't manage to get 0.1%).

You could interpret the party results as probabilities (that a randomly picked person will vote for them) and then give a penalty based on how much additional information would have been needed to get the prediction right (in the sense of Bayesian updates). As an example, predicting 2% instead of 1% would be roughly as bad as predicting 60% instead of 30%. Hmm... not sure if that is a good approach.

I would suggest giving a slightly smaller weight to larger parties. 1/sqrt(x_p) or something similar.
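To make the two ideas in this post concrete, here is a small sketch. The absolute log-ratio is only one possible reading of the information-based penalty (an assumption, chosen because it treats 2% vs. 1% exactly like 60% vs. 30%); the second function applies the suggested weight 1/sqrt(x_p) to squared errors. All names and numbers are illustrative.

```python
from math import log, sqrt


def log_ratio_penalty(forecast_share, result_share):
    """Assumed penalty: size of the log-ratio between forecast and result."""
    return abs(log(forecast_share / result_share))


def inverse_sqrt_weighted_score(forecast, result):
    """Squared errors weighted by 1/sqrt(x_p), so big parties count a bit less."""
    return sum((xf - xp) ** 2 / sqrt(xp) for xf, xp in zip(forecast, result))


# Predicting 2% instead of 1% costs the same as predicting 60% instead of 30%.
small_party = log_ratio_penalty(0.02, 0.01)
large_party = log_ratio_penalty(0.60, 0.30)
print(small_party, large_party)

# The same 2 pp miss on a 25% party and on a 4% party (made-up numbers):
score = inverse_sqrt_weighted_score([27.0, 6.0], [25.0, 4.0])
print(score)  # the 4% party contributes 2.0, the 25% party only 0.8
```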
 
Actually, since the number of votes is large, you may want to approximate the multinomial distribution with a multivariate Gaussian with the appropriate covariance matrix and expectation values. Just beware of the fact that the covariance is singular (you should add a party called ”other” so that the sum of all votes goes to 100%, the singularity stems from the outcomes being constrained to the surface of 100% votes).
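A sketch of this Gaussian approximation, assuming the usual way around the singularity: drop one bin, which leaves an invertible covariance matrix. The quadratic form then equals Pearson's chi-square summed over all bins, and the determinant of the reduced covariance is ##n^{k-1}\prod_i p_i##, so no matrix inversion is needed. Counts, forecasts and names are invented.

```python
from math import log, pi


def gaussian_log_density(counts, probs):
    """Log density of the Gaussian approximation to Multinomial(n, probs).

    `counts` and `probs` must include the "other" bin so that probs sums to 1.
    """
    n = sum(counts)
    k = len(counts)
    # Mahalanobis distance of the reduced (k-1)-dimensional Gaussian,
    # written as Pearson's chi-square over all k bins.
    chi2 = sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))
    # log det of the reduced covariance n * (diag(p) - p p^T).
    log_det = (k - 1) * log(n) + sum(log(p) for p in probs)
    return -0.5 * chi2 - 0.5 * (k - 1) * log(2 * pi) - 0.5 * log_det


# Hypothetical result counts and forecasts, "other" bin last (not real data).
result_counts = [300_000, 250_000, 120_000, 100_000, 90_000, 80_000, 60_000]
forecasts = {
    "friend_1": [0.32, 0.24, 0.11, 0.10, 0.09, 0.08, 0.06],
    "friend_2": [0.30, 0.25, 0.15, 0.10, 0.09, 0.08, 0.03],
}

scores = {name: gaussian_log_density(result_counts, shares)
          for name, shares in forecasts.items()}
winner = max(scores, key=scores.get)
print(winner, scores)
```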
 
mfb said:
Predicting a larger party with 1% accuracy is harder than predicting a smaller party with that accuracy (as extreme case, imagine you would have included all parties, even those that don't manage to get 0.1%).
You mean predicting a larger party within 1 pp accuracy? Parties around 50% of the votes will be easier to predict with relative accuracy although more difficult in absolute terms. Consider the special case of only two parties, one very big and the other very small. In this case they have the same absolute error, but the smaller party will have a much larger relative spread.

In the multinomial distribution, the variance in outcome i is ##V_i = np_i(1-p_i)## (the same as in the binomial). The relative accuracy should therefore be ##\sqrt{V_i}/(np_i) = \sqrt{(1-p_i)/(np_i)}##. This clearly decreases as ##p_i## grows. However, the absolute accuracy ##\sqrt{V_i}## would take its maximal value at ##p_i = 0.5##.
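A quick numerical illustration of the two spreads, computed directly from ##V_i = np_i(1-p_i)##. The number of votes and the list of shares are arbitrary choices for the example.

```python
from math import sqrt


def absolute_sd(n_votes, p):
    """Standard deviation of the vote count of a party with share p."""
    return sqrt(n_votes * p * (1.0 - p))


def relative_sd(n_votes, p):
    """Standard deviation divided by the expected vote count n*p."""
    return absolute_sd(n_votes, p) / (n_votes * p)


n_votes = 1_000_000  # arbitrary electorate size for illustration
shares = [0.05, 0.10, 0.30, 0.50]

table = [(p, absolute_sd(n_votes, p), relative_sd(n_votes, p)) for p in shares]
for p, abs_sd, rel_sd in table:
    print(f"p={p:.2f}  sd={abs_sd:8.1f} votes  relative sd={rel_sd:.5f}")
```

Going down the table, the absolute spread grows up to ##p_i = 0.5## while the relative spread shrinks.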
 
I am sorry, but I don't get it.

Can you do me a favor and translate your notation into mine for the functions ##w## and ##d##?
 
No, it does not have that form. You may be able to put it in a relatively similar form if you use the multivariate Gaussian approximation and use the log of the likelihood instead of the likelihood itself.
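As a sketch of what that similar form could look like: write ##N_v## for the total number of votes (a symbol introduced here, to avoid a clash with the friend index ##n##), take all ##x## as fractions, and include the "other" party as bin ##p = 7##. Dropping one bin to remove the singular direction, the Gaussian approximation gives

$$-2 \ln L_n \approx N_v \sum_{p=1}^{7} \frac{(x_{pn} - x_p)^2}{x_{pn}} + \sum_{p=1}^{7} \ln x_{pn} + \text{const}$$

where the constant is the same for every friend. For large ##N_v## the first term dominates, so this is close to ##D_n## with ##d(x) = x^2## and a weight ##1/x_{pn}##. The difference from the original form is that the weight depends on the forecast ##x_{pn}##, not on the result ##x_p##.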

The point is that the multinomial distribution is what you would have if you had ##n## votes where each vote has a probability ##p_i## of being for party ##i##. Of course, ##n## is a very large number in the case of the German elections.
 
That seems not very pragmatic; my idea was to put it in an Excel sheet, not to write a paper.
 
I don’t see any problem with putting it into an Excel sheet. For practical purposes, you may want to use the Dirichlet distribution instead with the appropriate scaling of parameters (and using the log likelihood to avoid numerical issues with extremely small numbers).
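One way this could look, as a sketch only: treat each forecast as the mean of a Dirichlet distribution with parameters ##\alpha_p = s \, x_{pn}## and score the friend by the log density at the actual result. The concentration `scale` (##s##) is a free choice that the post does not specify; 1000 is an arbitrary value, and the shares below are invented. The same formula can be typed into a spreadsheet, since Excel provides `GAMMALN` and `LN`.

```python
from math import lgamma, log


def dirichlet_log_density(shares, alphas):
    """Log density of Dirichlet(alphas) evaluated at `shares` (which sum to 1)."""
    log_norm = lgamma(sum(alphas)) - sum(lgamma(a) for a in alphas)
    return log_norm + sum((a - 1.0) * log(x) for a, x in zip(alphas, shares))


def dirichlet_score(forecast_shares, result_shares, scale):
    """Score a forecast; `scale` is an assumed concentration parameter."""
    alphas = [scale * f for f in forecast_shares]
    return dirichlet_log_density(result_shares, alphas)


# Hypothetical result and forecasts as fractions, "other" bin last (not real data).
result_shares = [0.30, 0.25, 0.12, 0.10, 0.09, 0.08, 0.06]
forecasts = {
    "friend_1": [0.32, 0.24, 0.11, 0.10, 0.09, 0.08, 0.06],
    "friend_2": [0.30, 0.25, 0.15, 0.10, 0.09, 0.08, 0.03],
}

scores = {name: dirichlet_score(shares, result_shares, scale=1000.0)
          for name, shares in forecasts.items()}
winner = max(scores, key=scores.get)
print(winner, scores)
```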
 
Orodruin said:
(you should add a party called ”other” so that the sum of all votes goes to 100%, the singularity stems from the outcomes being constrained to the surface of 100% votes).
That "other" party exists anyway, as the 6 big parties won't get all votes. They get all seats (as you need 5% of the votes to get seats) but that is a different point.
Orodruin said:
You mean predicting a larger party within 1 pp accuracy? Parties around 50% of the votes will be easier to predict with relative accuracy although more difficult in absolute terms. Consider the special case of only two parties, one very big and the other very small. In this case they have the same absolute error, but the smaller party will have a much larger relative spread.
I meant the absolute difference. Above 50% that would get easier as well but no party will get more than 50%, even 40% looks unlikely.
 
mfb said:
That "other" party exists anyway, as the 6 big parties won't get all votes.
Yes, my point was that the relevant multinomial distribution is a distribution with 7 bins, not one with 6 bins.
 
