# Agreement between diagnostic tests controlled for raters

Adel Makram
In summary, the goal of this conversation is to calculate the agreement between 2 different diagnostic tests used to detect a disease. The aim is to find out whether there is a real difference between the 2 tests in categorising the disease, without the result being polluted by the expected variation in the raters' opinions. The method under discussion uses Cohen's kappa to measure the degree of agreement.
Adel Makram
Suppose I would like to calculate the agreement between 2 different diagnostic tests to detect a disease using 2 experienced raters. The following assumptions hold:
1) The rating of the disease is based on a categorical scale such as grade I, II and so on. It is therefore a process of categorising the disease into different grades.
2) Neither of the tests is considered a reference.
3) The raters are independent.

My goal is to calculate the agreement between the 2 tests after controlling for the variation between the raters' opinions. In other words, I seek to know whether there is a real difference between the 2 tests in categorising the disease, without the result being polluted by the expected variation in the raters' opinions.

You can only do this if the two scorers are both working on (subsets of) the two sets of diagnostic tests, or if you have some other data about the scorers. Or, to put it another way, if diagnostic X is being scored by scorer A and diagnostic Y is being scored by scorer B, then you need some other data regarding the scorers.

MrAnchovy said:
You can only do this if the two scorers are both working on (subsets of) the two sets of diagnostic tests, or if you have some other data about the scorers. Or, to put it another way, if diagnostic X is being scored by scorer A and diagnostic Y is being scored by scorer B, then you need some other data regarding the scorers.
Both tests X and Y are scored by both raters A and B on a case-by-case basis. In other words, all cases are scanned using X and Y and then scored by the 2 raters (independently). But I don't want the variation between A's and B's scores to spoil the true agreement value between the X and Y scans. So what should I do?
I could use the following method: each rater scores the disease, and the agreement between X and Y is measured for that rater using Cohen's kappa (or an alternative test for categorical data). The average kappa of the 2 raters is then used as the final measure of agreement between the tests. However, this method seems weak because, I guess, it gives equal weighting to both raters.
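The per-rater approach described above can be sketched in Python. This is only a minimal illustration under stated assumptions: the function name `cohen_kappa`, the variable names, and the grade lists are hypothetical and do not come from the thread, and the kappa is the plain unweighted version.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of cases given the same grade in both lists.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the marginal proportions, summed over grades.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one grade is ever used")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical grades (made up for illustration): each rater grades the same
# six cases once from test X and once from test Y.
rater_a_x = ["I", "II", "II", "III", "I", "II"]
rater_a_y = ["I", "II", "III", "III", "I", "II"]
rater_b_x = ["I", "II", "II", "III", "II", "II"]
rater_b_y = ["I", "I", "II", "III", "II", "II"]

# Agreement between tests X and Y, computed separately within each rater.
kappa_a = cohen_kappa(rater_a_x, rater_a_y)
kappa_b = cohen_kappa(rater_b_x, rater_b_y)

# Equal-weight average across raters, as proposed in the post.
mean_kappa = (kappa_a + kappa_b) / 2
print(kappa_a, kappa_b, mean_kappa)
```

Computing kappa within each rater keeps disagreement between A and B out of the X-versus-Y comparison. The simple mean does weight both raters equally, which is the weakness noted in the post.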

Maybe Mann-Whitney, or Kruskal-Wallis?

WWGD said:
Maybe Mann-Whitney, or Kruskal-Wallis?
So how can we combine a non-parametric analysis such as Kruskal-Wallis with Cohen's kappa, which analyses the degree of agreement between raters?

If I understand this right, it sounds like you are trying to determine the convergent validity of the two tests. This is how I am making sense of "agreement of the tests." Since the data are ordered categories (grades), you would use either a Spearman correlation or Kendall's tau to check the correlation between the tests.
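As a rough sketch of the rank-correlation suggestion, the snippet below computes Kendall's tau for one rater's grades from the two tests. It assumes the tie-corrected tau-b variant, since graded data produce many ties (the post only says "Kendall's tau"). The function name `kendall_tau_b`, the grade-to-integer mapping, and the data are hypothetical.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b (tie-corrected) for two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # pair tied on both tests: excluded from both factors
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denom == 0:
        raise ValueError("tau-b is undefined when one sequence is constant")
    return (concordant - discordant) / denom


# Hypothetical mapping from ordinal grades to integers.
GRADE_ORDER = {"I": 1, "II": 2, "III": 3}

# Made-up grades from one rater for the same cases under test X and test Y.
grades_x = [GRADE_ORDER[g] for g in ["I", "II", "II", "III", "I", "II"]]
grades_y = [GRADE_ORDER[g] for g in ["I", "II", "III", "III", "I", "II"]]

tau = kendall_tau_b(grades_x, grades_y)
print(tau)
```

Note that a rank correlation measures monotone association rather than exact agreement: two tests can correlate perfectly even if one consistently assigns a higher grade than the other.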

Also from a design standpoint, I think you need more pairs of raters. Some groups should use test 1 first, and some should use test 2 first.

## What is an "Agreement between diagnostic tests controlled for raters"?

An "Agreement between diagnostic tests controlled for raters" refers to a statistical measure that evaluates the level of agreement or consistency between two or more diagnostic tests that are being used to evaluate a particular condition or disease. This measure takes into account the presence of multiple raters, who may introduce bias or variability in the test results.

## Why is it important to control for raters in diagnostic test agreement?

Controlling for raters in diagnostic test agreement is crucial because it helps to eliminate or reduce the influence of rater bias and variability on the test results. This ensures that the agreement between the tests is based on the actual performance of the tests and not on the individual preferences or tendencies of the raters.

## How is the agreement between diagnostic tests controlled for raters calculated?

The agreement between diagnostic tests controlled for raters is typically calculated using a statistical measure called the kappa coefficient. This coefficient compares the observed agreement between the tests with the agreement expected by chance. A kappa of 1 indicates perfect agreement, a kappa of 0 indicates agreement no better than chance, and higher values indicate a higher level of agreement between the tests.
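For reference, the standard definition of the unweighted Cohen's kappa can be written as follows. The symbols are introduced here for illustration and are not used elsewhere in the thread: $p_{ij}$ is the proportion of cases graded $i$ by one test and $j$ by the other, and $p_{i+}$, $p_{+i}$ are the corresponding marginal proportions.

$$
\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad p_o = \sum_i p_{ii}, \qquad p_e = \sum_i p_{i+}\, p_{+i}
$$

Here $p_o$ is the observed agreement and $p_e$ is the agreement expected by chance if the two sets of grades were independent.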

## What does a high agreement between diagnostic tests controlled for raters indicate?

A high agreement between diagnostic tests controlled for raters indicates that the two tests are producing similar results and are therefore consistent in their evaluation of the condition or disease. This can provide greater confidence in the accuracy and reliability of the test results.

## Are there any limitations to using agreement between diagnostic tests controlled for raters?

Yes, there are some limitations to using agreement between diagnostic tests controlled for raters. This measure does not take into account the severity of the disease or condition being evaluated, and may not be applicable for rare diseases or conditions with low prevalence. Additionally, it may not be suitable for evaluating tests that have a continuous scale of measurement.
