# Normalization/Scaling of Competition Scores

bigredhockey
Hi,

I was put in charge of coming up with a normalization/scaling scheme for a competition in which competitors are scored 0-100 by 6 judges; the high and the low scores are dropped before the cumulative score is calculated. In past years we have found that scores are not consistent from judge to judge, in either average or range, hence the request to implement some sort of normalization.

My first instinct was to go with a straight normalization to z-scores and then re-scale those to an arbitrary mean and standard deviation. A colleague of mine suggested the following equation:

`normalized = (raw - low) / (high - low) * X`, where X is the arbitrary top score (e.g. the top raw score converts to 100 if X = 100) and the lowest raw score converts to 0.
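A minimal sketch of the two schemes being compared, assuming each is applied per judge. The function names, target mean/SD, and example scores are all illustrative choices of mine, not part of either proposal:

```python
import statistics

def z_rescale(scores, target_mean=50, target_sd=15):
    """Convert scores to z-scores, then rescale to an arbitrary
    mean and standard deviation (50 and 15 here, purely as examples)."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [target_mean + target_sd * (s - mean) / sd for s in scores]

def min_max_rescale(scores, top=100):
    """The colleague's formula: map [low, high] linearly onto [0, top]."""
    low, high = min(scores), max(scores)
    return [(s - low) / (high - low) * top for s in scores]

raw = [62, 70, 75, 81, 90, 97]
print(min_max_rescale(raw))  # lowest raw score maps to 0, highest to 100
print(z_rescale(raw))        # rescaled scores centred on 50
```

Note the practical difference: min-max pins the extremes of each judge's scores to the same endpoints, while the z-score version equalizes each judge's mean and spread but is not confined to a fixed interval.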

I'm struggling to determine which would be the better approach, and with explaining one versus the other to the executive board of the competition.

Do you guys have insight as to the differences/pros/cons to each method?

bigredhockey

Hey bigredhockey and welcome to the forums.

You can use a re-scaled normal distribution, but you have to be aware that the normal distribution covers the whole real line, not just a fixed interval.

So what you will want to do is treat, say, a 95% interval or a 99% interval as corresponding to the full 100% interval. The result is a truncated distribution with a higher peak than even the re-scaled normal itself.

So as an example, let's say your re-scaled distribution is a standard normal, and your new 100% interval corresponds to the 95% interval, roughly [-1.96, 1.96].

Then your new PDF will be f(x) * (1/0.95) = f(x)/0.95 on that interval (and zero outside it), where f(x) is the PDF of the standard normal.
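One concrete way to act on this, sketched in Python: clip each z-score at the chosen interval endpoints and map that interval linearly onto [0, 100]. The function name and the z_cut default are my own illustrative choices:

```python
import statistics

def z_to_percent(scores, z_cut=1.96):
    """Map a judge's scores to [0, 100]: compute z-scores, clip them at
    +/- z_cut (roughly the 95% interval of a standard normal), and
    rescale [-z_cut, z_cut] linearly onto [0, 100]."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    out = []
    for s in scores:
        z = (s - mean) / sd
        z = max(-z_cut, min(z_cut, z))  # truncate the tails of the real line
        out.append((z + z_cut) / (2 * z_cut) * 100)
    return out
```

The clipping step is exactly the "treat a 95% interval as the 100% interval" idea: anything beyond ±1.96 standard deviations is pinned to 0 or 100 rather than falling outside the score range.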

In terms of your normalization scheme, I assume you want each normalized score to be between 0 and 100, with the cumulative score calculated after you throw out the high and the low.

The best question to help answer your own question is this: are you doing this chucking-out process per player, calculating a score on a per-player basis, or are you chucking out the high and low of all scores given by all judges across all players?

If you do the latter, then you can do the kind of thing you are suggesting, with the addition of my little hint above, if you are using a standard normal distribution (or some other continuous distribution on the real line).

This is a very good and simple approach, but it does not take into account the nature of the scores that are thrown out (their range and distribution), who gave them (i.e. the judges, their history, and so on), or any other biases inherent in the scoring process (bias for or against high scores).

If, for example, you get a judge that rarely gives anyone a good score, no matter how good they are, then it's a lot better to look at the judge rather than the numbers themselves.

Similarly, if the distribution of high or low scores has a tight standard deviation over a long enough period of time, and then you get someone with an extremely high or extremely low score, that should be questioned.
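That kind of check can be sketched very simply: compare a new score against a judge's historical mean and standard deviation and flag anything too far out. The function name, the history data, and the threshold k are all hypothetical:

```python
import statistics

def flag_outlier_score(history, new_score, k=3.0):
    """Flag a new score sitting more than k standard deviations from a
    judge's historical mean. `history` is that judge's past scores;
    k = 3.0 is an illustrative threshold, not a recommendation."""
    mean = statistics.mean(history)
    sd = statistics.stdev(history)
    if sd == 0:
        return new_score != mean
    return abs(new_score - mean) / sd > k

# A judge who has historically scored in a tight band around 70:
print(flag_outlier_score([70, 72, 71, 69, 70], 95))  # flagged
print(flag_outlier_score([70, 72, 71, 69, 70], 71))  # not flagged
```

The same routine, run over every judge, also answers the follow-up question below of how many judges are giving extreme scores.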

Also, in regard to the above, you want to know how many judges are doing this (i.e. giving extremely high or low scores).

Finally you have bias, which may be substantiated or unsubstantiated. You could have a poor performer whose scores are all miserably low, except perhaps for one judge who gives an extremely high score. In that situation the high score might be justified and the low ones not, or the reverse (where the high one really makes you scratch your head).

You also have to look for subtle biases, like who is performing, where they come from, and so on: these do really strange things to judgement, and they can be detected statistically.

In short, the point I'm getting at is to look at whatever kind of bias you think is important for normalization. You have suggested chucking out the extremes, and I have given a few extra ideas for where bias can come in and how it can be analyzed and detected in a statistical context (i.e. under uncertainty).