# Algorithm to create a composite score

Hi everyone!

This is an application question. I would like to get some advice about how to calculate a score based on a set of individual scores in a way that makes most sense.

CONTEXT:
I am going over some criteria for judging the usability of hypotheses. I came up with a whole bunch of hypotheses about a certain topic and now I'm trying to select the best ones. I asked a number of people to evaluate these hypotheses on the criteria below.

A) hypothesis is specific
B) hypothesis is verifiable
C) hypothesis has a strong theoretical foundation
D) hypothesis can be tested using available resources
...
Let's say people evaluated these on a 10 point scale with 10 being the best.
I want one score based on all four criteria above. The easiest way would be to just add the mean individual scores. For example, if average ratings of a given hypothesis were 7,8,9, and 10 for criteria A, B, C, and D respectively, then the hypothesis would get the score of 34. But I wonder if addition would make sense. Here are some potential challenges:

1) If a hypothesis cannot be tested using available resources (criterion D), then no matter how highly it is rated on criteria A-C, I cannot use it. Yet under simple addition such a hypothesis could score higher than an alternative that was rated lower on criteria A-C but highly on criterion D.
2) Some of the criteria are highly correlated with each other. For example, criteria 2 and 3 will be more highly correlated with each other than criteria B and C.
3) even though there may be a nearly perfect correlation between some criteria across the different hypotheses, conceptually these are different. So, averaging scores based on highly correlated criteria would not make sense.

How would you address the challenges above?
What are better alternative ways of obtaining a single score (alternative to A+B+C+D)?

I will greatly appreciate your help!


mfb
Mentor
No matter what you do it will be quite arbitrary.

Taking the product would heavily disfavor hypotheses that score poorly in one category.
Summing the square root (or some similar function) of the individual scores will also make low ratings more important, but not as much as the product.
You could introduce special rules like "something rated less than X in D cannot get an overall score better than Y"
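
A minimal Python sketch of the three ideas above, compared against the plain sum. The two rating sets are made up for illustration (both sum to 32), and the function names, the threshold X = 5 and the cap Y = 20 are assumptions, not values anyone in the thread proposed.

```python
import math

def sum_score(ratings):
    """Plain sum of the individual ratings."""
    return sum(ratings)

def product_score(ratings):
    """Product of the ratings: one low rating drags the whole score down."""
    return math.prod(ratings)

def sqrt_sum_score(ratings):
    """Sum of square roots: low ratings matter more than in a plain sum,
    but less than in the product."""
    return sum(math.sqrt(r) for r in ratings)

def capped_sum_score(ratings, d_index=3, x=5, y=20):
    """Plain sum, except that a rating below x on criterion D
    limits the overall score to at most y (x and y are assumed values)."""
    total = sum(ratings)
    if ratings[d_index] < x:
        return min(total, y)
    return total

# Hypothetical mean ratings for criteria A, B, C, D.
balanced = [8, 8, 8, 8]
lopsided = [10, 10, 10, 2]   # strong on A-C, weak on D

for name, score in [("sum", sum_score), ("product", product_score),
                    ("sqrt sum", sqrt_sum_score), ("capped sum", capped_sum_score)]:
    print(f"{name:10s} balanced={score(balanced):8.2f} lopsided={score(lopsided):8.2f}")
```

The plain sum cannot tell the two hypotheses apart, while the other three rules all rank the balanced one higher, each by a different margin.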

Stephen Tashi
in a way that makes most sense.

The underlying problem is to define what you mean by "makes sense" - and if you actually mean "most" then you need some way to compare two ways of making a scale and decide which one makes more sense.

If the goal is to create a numerical scale that reflects your own subjective judgements, then we have defined a specific problem - your particular judgements may not interest a lot of people, but at least it is a specific problem and the general idea of the problem is interesting.

One way to formulate a general case is as follows: We are given a list ##L## that orders N things from "best" to "worst". Never mind how this ordering is created - it's just a "given". Each of the N things is rated on M different aspects. We want to find a real valued function ##f(a_{k,1},a_{k,2},\dots,a_{k,M})## defined on the M aspects of each ##k##-th thing such that the values of ##f## reflect the ordering given in the list ##L## - i.e. ##f(a_{i,1},\dots,a_{i,M}) > f(a_{j,1},\dots,a_{j,M})## if and only if thing ##i## is better than thing ##j## according to the list ##L##.
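
As a rough illustration of this formulation, the Python sketch below checks whether a candidate function agrees with a given ordering. The helper name `agrees_with_ordering`, the hypothesis labels, the ratings and the ordering are all invented for the example. It assumes the list is a strict ordering with no ties, in which case checking every pair in list order is enough to cover the "if and only if".

```python
from itertools import combinations

def agrees_with_ordering(f, ratings, ordering):
    """Return True if f scores every earlier item in `ordering`
    strictly higher than every later item.

    ratings:  dict mapping item name -> tuple of its M aspect ratings
    ordering: list of item names from best to worst (no ties assumed)
    """
    for i, j in combinations(range(len(ordering)), 2):
        better, worse = ordering[i], ordering[j]
        if not f(*ratings[better]) > f(*ratings[worse]):
            return False
    return True

# Hypothetical ratings on four aspects (A, B, C, D).
ratings = {
    "H1": (7, 8, 9, 10),
    "H2": (10, 10, 10, 2),
    "H3": (5, 5, 5, 5),
}
# Hypothetical subjective ordering, best first.
ordering = ["H1", "H3", "H2"]

def plain_sum(a, b, c, d):
    return a + b + c + d

def gated_sum(a, b, c, d):
    return d * (a + b + c)

print(agrees_with_ordering(plain_sum, ratings, ordering))  # False
print(agrees_with_ordering(gated_sum, ratings, ordering))  # True
```

Here the plain sum puts H2 (32) above H3 (20), contradicting the list, while the gated sum gives 240, 75 and 60 for H1, H3 and H2, which matches it.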

There are probably many ways to create a function ##f## that agrees with the ordering of list ##L##. However, the mathematical aspects of the problem are still interesting because we can see simple ways to create ##f##. The basic decision is whether you want to solve a mathematical problem or whether you want to discuss the somewhat philosophical question of what makes a hypothesis useful. If you want to discuss what makes scientific hypotheses useful then it would be better to do that in a section of the forum devoted to science.

Simon Bridge
Homework Helper
Note: each of your evaluation criteria is boolean: the hypothesis either meets the criterion or it does not. This is especially clear for D, where the hypothesis can either be evaluated with available resources or it cannot. I mean: if someone scored that a 4 or a 5 on the 1-10 scale, what would that mean?

Before you can get a proper composite score here you need to be clear about your metric.

To get a score on a range, it must be possible for the thing being scored to exist on a range ... so for A, you want to rank how specific the hypothesis is, not whether it is specific or not. (But how would you rank the specificity of a hypothesis ... can you give an example where something can score a 5/10 for being specific, compared with a 1/10 and a 10/10?)

As you have written it, it makes more sense to use a binary scale: score 0 or 1.

Then the final score S=D(A+B+C) will satisfy the requirement that the highest score is best, and a fail for D gives you S=0, i.e. a fail overall.
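
A small Python sketch of this binary scheme, assuming each criterion is scored strictly 0 or 1. The function name `binary_score` is made up for the example.

```python
def binary_score(a, b, c, d):
    """S = D * (A + B + C) with every input scored 0 (fail) or 1 (pass).

    D acts as a gate: if the hypothesis cannot be tested with the
    available resources (d == 0), the overall score is 0.
    """
    for value in (a, b, c, d):
        if value not in (0, 1):
            raise ValueError("binary scores must be 0 or 1")
    return d * (a + b + c)

print(binary_score(1, 1, 1, 0))  # 0: passes A-C but fails D
print(binary_score(1, 0, 1, 1))  # 2: testable, meets two of A-C
print(binary_score(1, 1, 1, 1))  # 3: best possible score
```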

In general, you will have to come up with a composite score where the different components have different "weight".
E.g. when evaluating what sort of cars to buy for a company fleet, purchase price will be more important than comfort, with things like mileage and maintenance costs coming in between.

In those situations, a simple mean will not give a useful score.
The way to deal with this is to give each criterion a "weight value" as well: multiply the rating each criterion gets by its weight, then take the average.
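
A minimal Python sketch of a weighted average, using the ratings 7, 8, 9, 10 from the first post. The weights 1, 2, 3, 4 are purely hypothetical, and dividing by the sum of the weights (rather than by the number of criteria) is an assumption made here so that the result stays on the same 10 point scale as the ratings.

```python
def weighted_average(ratings, weights):
    """Weighted mean: sum(rating * weight) / sum(weights).

    Dividing by the total weight keeps the result on the same scale
    as the individual ratings.
    """
    if len(ratings) != len(weights):
        raise ValueError("need exactly one weight per rating")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(r * w for r, w in zip(ratings, weights)) / total_weight

ratings = [7, 8, 9, 10]        # mean ratings for A, B, C, D
weights = [1, 2, 3, 4]         # hypothetical: D matters most

print(weighted_average(ratings, weights))        # 9.0
print(weighted_average(ratings, [1, 1, 1, 1]))   # 8.5, the plain mean
```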

Example:

Consider: evaluating for a second date ... I may rate the subject on:
A. woman (= convincing cis female human)
B. smile
C. sense of humour
D. witty
E. bust wow factor
F. sluttiness
G. education
H. 1st impressions
I have a Y chromosome so sue me :P

I'll rate A as 0 or 1... all else out of 10.
Making myself out to be even more shallow... I could run the calculation as follows:

S = A[3B + 7C + D + 9E + 10F + 6G + 5H]/41 ... dividing by the sum of the weights (41) gives a score out of 10.
The letters are the rating of each criterion, the numbers in front are the weightings (how important each one is to the evaluation).

Mind you... I may prefer something more like:
S = A[6B + 10C + 3D + E + 2F + 6G + H]/29

The point is to illustrate how flexible this way of doing things is.

You don't have to do a straight weighted average either ... e.g. if you don't want outliers to have an undue influence, you can sum the squares and take the square root. You don't even have to use linear scales. For now though, this will give you the idea.

What I want you to take away from this is that you need to make sure the scoring of each criterion makes sense.
Can you reword your criteria so that they are not binary?

mfb
Mentor
Note: each of your evaluation criteria is boolean: the hypothesis either meets the criterion or it does not. This is especially clear for D, where the hypothesis can either be evaluated with available resources or it cannot. I mean: if someone scored that a 4 or a 5 on the 1-10 scale, what would that mean?
10 means you can test it at home, 0 means it would need more than the gross world product of 10 years, 4-5 is something a national lab could do?

Simon Bridge
Homework Helper
10 means you can test it at home, 0 means it would need more than the gross world product of 10 years, 4-5 is something a national lab could do?
Sure ... now let's see what OP had in mind.