Agreement between diagnostic tests controlled for raters

  • Context: Graduate 
  • Thread starter: Adel Makram

Discussion Overview

The discussion centers on calculating the agreement between two diagnostic tests for detecting a disease, as assessed by two independent raters. The focus is on controlling for the variation in the raters' opinions while determining if there is a significant difference in how the tests categorize the disease. The conversation includes considerations of statistical methods applicable to categorical data.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant outlines the need to calculate agreement while controlling for rater variation, emphasizing that neither test is a reference.
  • Another participant suggests that agreement can only be assessed if both raters evaluate the same cases or if additional data about the raters is available.
  • A subsequent post clarifies that both tests are scored by both raters independently, but expresses concern that this may affect the true agreement value.
  • A method involving Cohen's kappa is proposed to measure agreement, though one participant expresses skepticism about its effectiveness due to equal weighting of raters.
  • Some participants propose non-parametric tests like Mann-Whitney or Kruskal-Wallis as alternatives for analysis.
  • There is a suggestion to combine non-parametric analysis with Cohen's kappa to assess agreement between the tests.
  • One participant interprets the discussion as an exploration of convergent validity and suggests using Spearman or Kendall's tau correlations for categorical data.
  • Concerns are raised about the design of the study, specifically the need for more pairs of raters and varying the order in which tests are administered.

Areas of Agreement / Disagreement

Participants express differing views on the appropriate methods for assessing agreement and the design of the study. There is no consensus on the best approach, and multiple competing methods are proposed.

Contextual Notes

Participants highlight limitations related to the independence of raters, the need for additional data, and the implications of using different statistical methods for categorical data analysis.

Adel Makram
Suppose I would like to calculate the agreement between 2 different diagnostic tests to detect a disease using 2 experienced raters. The following assumptions hold:
1) The rating of the disease is based on a categorical scale, like grade I, II and so on. It is therefore a process of categorising the disease into different grades.
2) Neither of the tests is considered a reference.
3) The raters are independent.

My goal is to calculate the agreement between the 2 tests after controlling for the variation between the raters' opinions. In other words, I want to know whether there is a real difference between the 2 tests in categorising the disease, without the result being polluted by the expected variation in the raters' opinions.
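For concreteness, here is a minimal sketch (Python, using a hypothetical pandas layout) of the kind of data this design produces: every case is scanned with both tests, and each scan is graded independently by both raters. The case numbers and grades below are made up purely for illustration.

import pandas as pd

# Hypothetical layout: every case is scanned with both tests (X, Y),
# and each scan is graded independently by both raters (A, B).
data = pd.DataFrame({
    "case":    [1, 1, 2, 2, 3, 3],
    "test":    ["X", "Y", "X", "Y", "X", "Y"],
    "rater_A": ["I", "I", "II", "III", "II", "II"],
    "rater_B": ["I", "II", "II", "II", "III", "II"],
})
print(data)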
 
You can only do this if the two scorers are both working on (subsets of) the two sets of diagnostic tests, or if you have some other data about the scorers. To put it another way: if diagnostic X is being scored by scorer A and diagnostic Y is being scored by scorer B, then you need some other data regarding the scorers.
 
MrAnchovy said:
You can only do this if the two scorers are both working on (subsets of) the two sets of diagnostic tests, or if you have some other data about the scorers. To put it another way: if diagnostic X is being scored by scorer A and diagnostic Y is being scored by scorer B, then you need some other data regarding the scorers.
Both tests X and Y are scored by both raters A and B on a case-by-case basis. In other words, all cases are scanned using X and Y, and then scored by the 2 raters (independently). But I don't want the variation between A's and B's scores to spoil the true agreement value between the X and Y scans. So what should I do?
I could use the following method: each rater scores the disease, and the agreement between X and Y is measured using Cohen's kappa (or an alternative test for the categorical case). Then the average kappa over the 2 raters is used as the final measure of agreement between the tests. However, this method seems weak, because I guess it gives equal weighting to both raters.
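As a rough illustration of that averaging idea, here is a minimal Python sketch using scikit-learn's cohen_kappa_score. The grade vectors are made-up placeholders, and encoding grades I, II, III as 1, 2, 3 is only an assumption for the example.

import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical grades (I, II, III encoded as 1, 2, 3) for 10 cases.
# Each rater grades every case under both tests.
grades = {
    ("A", "X"): [1, 2, 2, 3, 1, 2, 3, 3, 1, 2],
    ("A", "Y"): [1, 2, 3, 3, 1, 2, 3, 2, 1, 2],
    ("B", "X"): [1, 2, 2, 3, 2, 2, 3, 3, 1, 1],
    ("B", "Y"): [1, 3, 2, 3, 2, 2, 3, 3, 1, 2],
}

# Test-vs-test agreement within each rater, then averaged across raters.
kappas = []
for rater in ("A", "B"):
    kappa = cohen_kappa_score(grades[(rater, "X")], grades[(rater, "Y")])
    kappas.append(kappa)
    print(f"Rater {rater}: kappa(X, Y) = {kappa:.3f}")

print(f"Average kappa across raters = {np.mean(kappas):.3f}")

Since the grades are ordinal, a weighted kappa (the weights="linear" or weights="quadratic" argument of cohen_kappa_score) may also be worth considering.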
 
Maybe Mann-Whitney, or Kruskal-Wallis?
 
WWGD said:
Maybe Mann-Whitney, or Kruskal-Wallis?
So how can we combine the non-parametric analysis using Kruskal-Wallis with Cohen's kappa, which analyses the degree of agreement between raters?
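For reference, this is roughly what the suggested non-parametric comparison would look like in Python with SciPy; the grade vectors are hypothetical. Note that mannwhitneyu and kruskal compare the two grade distributions as independent groups, which is a different question from the rater-adjusted agreement that kappa addresses.

from scipy.stats import kruskal, mannwhitneyu

# Hypothetical ordinal grades for the same cases under test X and test Y,
# encoded numerically (grade I = 1, II = 2, III = 3).
grades_x = [1, 2, 2, 3, 1, 2, 3, 3, 1, 2]
grades_y = [1, 2, 3, 3, 1, 2, 3, 2, 1, 2]

# Mann-Whitney U compares the grade distributions of the two tests;
# Kruskal-Wallis generalises this to more than two groups.
u_stat, u_p = mannwhitneyu(grades_x, grades_y)
h_stat, h_p = kruskal(grades_x, grades_y)
print(f"Mann-Whitney U = {u_stat:.1f}, p = {u_p:.3f}")
print(f"Kruskal-Wallis H = {h_stat:.3f}, p = {h_p:.3f}")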
 
If I understand this right, it sounds like you are trying to determine the convergent validity of the two tests. This is how I am making sense of "agreement of the tests." Since the data are categorical (ordinal), you would use either a Spearman correlation or Kendall's tau to check the correlation between the tests.

Also from a design standpoint, I think you need more pairs of raters. Some groups should use test 1 first, and some should use test 2 first.
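A minimal sketch of the suggested correlation check, using SciPy's spearmanr and kendalltau on hypothetical ordinal grades (numeric codes are assumed only for illustration):

from scipy.stats import spearmanr, kendalltau

# Hypothetical ordinal grades for the same cases under test X and test Y.
grades_x = [1, 2, 2, 3, 1, 2, 3, 3, 1, 2]
grades_y = [1, 2, 3, 3, 1, 2, 3, 2, 1, 2]

rho, rho_p = spearmanr(grades_x, grades_y)
tau, tau_p = kendalltau(grades_x, grades_y)
print(f"Spearman rho = {rho:.3f} (p = {rho_p:.3f})")
print(f"Kendall tau  = {tau:.3f} (p = {tau_p:.3f})")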
 
