Agreement between diagnostic tests controlled for raters

Adel Makram
Suppose I would like to calculate the agreement between 2 different diagnostic tests for detecting a disease, using 2 experienced raters. The following assumptions hold:
1) The rating of the disease is based on a categorical scale such as grade I, II and so on. It is therefore a process of categorising the disease into different grades.
2) Neither of the tests is considered a reference.
3) The raters are independent.

My goal is to calculate the agreement between the 2 tests after controlling for the variation between the raters' opinions. In other words, I want to know whether there is a real difference between the 2 tests in categorising the disease, without the result being polluted by the expected variation in the raters' opinions.
 
You can only do this if the two scorers are both working on (subsets of) the two sets of diagnostic tests, or if you have some other data about the scorers. To put it another way: if diagnostic X is being scored by scorer A and diagnostic Y is being scored by scorer B, then you need some other data regarding the scorers.
 
MrAnchovy said:
You can only do this if the two scorers are both working on (subsets of) the two sets of diagnostic tests, or if you have some other data about the scorers. To put it another way: if diagnostic X is being scored by scorer A and diagnostic Y is being scored by scorer B, then you need some other data regarding the scorers.
Both tests X and Y are scored by both raters A and B on a case-by-case basis. In other words, all cases are scanned using X and Y, and then scored by the 2 raters (independently). But I don't want the variation between A's and B's scores to spoil the true agreement value between the X and Y scans. So what should I do?
I could use the following method: each rater scores the disease, and the agreement between X and Y is measured using Cohen's kappa (or an alternative test for the categorical case). The average kappa over the 2 raters is then used as the final measure of agreement between the tests. However, this method seems weak because, I guess, it gives equal weight to both raters.
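A minimal sketch of that per-rater kappa idea, assuming the grades have been coded as ordered integers (I=1, II=2, ...) and using made-up data; `cohen_kappa_score` is from scikit-learn:

```python
# Sketch: agreement between tests X and Y, computed per rater and then averaged.
# All grade lists below are hypothetical (one entry per case).
from sklearn.metrics import cohen_kappa_score

grades_X_raterA = [1, 2, 2, 3, 1, 2]
grades_Y_raterA = [1, 2, 3, 3, 1, 2]
grades_X_raterB = [1, 2, 2, 3, 2, 2]
grades_Y_raterB = [1, 1, 3, 3, 1, 2]

# Agreement between the two tests as seen by each rater separately.
# Since the grades are ordered, weights="linear" (weighted kappa) may be
# more appropriate than the default unweighted kappa.
kappa_A = cohen_kappa_score(grades_X_raterA, grades_Y_raterA, weights="linear")
kappa_B = cohen_kappa_score(grades_X_raterB, grades_Y_raterB, weights="linear")

# Simple equal-weight average, as described above; a weighted average would
# need some external measure of each rater's reliability.
mean_kappa = (kappa_A + kappa_B) / 2
print(kappa_A, kappa_B, mean_kappa)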
 
Maybe Mann-Whitney, or Kruskal-Wallis?
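As an illustration of what this suggestion could look like in practice (the data and grade coding below are hypothetical): a rank-based test such as Kruskal-Wallis or Mann-Whitney compares the overall grade distributions produced by the two tests, rather than case-by-case agreement.

```python
# Sketch: comparing the grade distributions produced by tests X and Y.
# Grades coded numerically (I=1, II=2, ...); the data are hypothetical
# and pooled over both raters.
from scipy.stats import kruskal, mannwhitneyu

grades_X = [1, 2, 2, 3, 1, 2, 2, 3]   # test X
grades_Y = [1, 2, 3, 3, 1, 2, 1, 3]   # test Y

# Kruskal-Wallis with two groups is essentially the Mann-Whitney comparison.
H, p_kw = kruskal(grades_X, grades_Y)
U, p_mw = mannwhitneyu(grades_X, grades_Y)
print(p_kw, p_mw)
```

Note that both tests assume independent samples, whereas here the same cases are scanned with both X and Y, so this would only flag a systematic shift in grading between the tests; it does not measure per-case agreement the way kappa does.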
 
WWGD said:
Maybe Mann-Whitney, or Kruskal-Wallis?
So how can we combine a non-parametric analysis using Kruskal-Wallis with Cohen's kappa, which analyses the degree of agreement between raters?
 
If I understand this right, it sounds like you are trying to determine the convergent validity of the two tests; that is how I am making sense of "agreement of the tests." Since the data are ordered categories (grades), you could use either a Spearman correlation or Kendall's tau to check the correlation between the tests.

Also, from a design standpoint, I think you need more pairs of raters. Some groups should use test 1 first, and some should use test 2 first.
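A minimal sketch of that rank-correlation check, again assuming integer-coded grades and hypothetical data; `spearmanr` and `kendalltau` are from `scipy.stats`:

```python
# Sketch: convergent validity of tests X and Y via rank correlation.
# Grades coded as ordered integers (I=1, II=2, ...); one entry per case,
# here for a single rater (repeat per rater, or pool, as appropriate).
from scipy.stats import spearmanr, kendalltau

grades_X = [1, 2, 2, 3, 1, 2]
grades_Y = [1, 2, 3, 3, 1, 2]

rho, p_rho = spearmanr(grades_X, grades_Y)
tau, p_tau = kendalltau(grades_X, grades_Y)
print(rho, tau)
```

A high rank correlation indicates the tests order the cases similarly, though unlike kappa it does not penalise a systematic offset in grading between the two tests.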
 