Expected Value and Variance for Wilcoxon Signed-Rank Test


Discussion Overview

This discussion revolves around the expected value and variance of the Wilcoxon Signed-Rank Test, focusing on the derivation of these statistical measures and their implications under the null hypothesis. Participants explore both theoretical and practical aspects, including the treatment of zero differences in paired observations.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • The original poster quotes the expected value as \(\mu = \frac{n(n+1)}{2}\) and the variance as \(\sigma^2 = \frac{n(n+1)(2n+1)}{24}\), and asks for the intuition behind these formulas.
  • One participant suggests that under the null hypothesis, roughly half of the differences should be positive, which informs their understanding of the expected value.
  • Another participant challenges the expected value calculation, proposing an alternative derivation using indicator variables and asserting that \(E(W) = \frac{n(n+1)}{4}\).
  • The original poster accepts the corrected expected value and then computes \(E(W^2)\) to obtain the variance, but the result does not match the given formula.
  • A reply points out that \(E(U^2)\) is not the sum of the \(E(U_i^2)\). Because the \(U_i\) are independent, their variances add instead, which gives \(Var(W) = \frac{n(n+1)(2n+1)}{24}\).
  • A follow-up question is raised about whether to include pairs with zero differences when determining \(n\) for the expected value and variance, with one participant indicating that such pairs are typically excluded in practice.

Areas of Agreement / Disagreement

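The expected value is not ultimately contested: the original poster accepts the corrected value \(E(W) = \frac{n(n+1)}{4}\), and the variance formula is recovered once the variances of the independent \(U_i\) are summed. The zero-difference question receives a single answer (such pairs are usually excluded), and no alternative practice is proposed.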

Contextual Notes

Participants express uncertainty regarding the assumptions made in their calculations, particularly concerning the treatment of zero differences and the independence of variables in variance calculations.

Who May Find This Useful

This discussion may be useful for statisticians, researchers conducting non-parametric tests, and students studying the Wilcoxon Signed-Rank Test and its statistical properties.

Mogarrr
Using a normal approximation method for the Wilcoxon Signed-Rank Test, I've seen that the expected value is \mu = \frac {n(n+1)}2 and the variance is \sigma^2 = \frac {n(n+1)(2n+1)}{24}.

I'm wondering why these are the expected value and variance.

I do recognize the formulas for the sum of the first n natural numbers and for the sum of their squares.

I have an idea as to why the expected value is half the sum of the first n natural numbers. Under the null hypothesis, roughly half of the differences should be positive, so it would make sense to halve that sum.

I have no intuition for the variance of the distribution.

An explanation would be appreciated.
 

Are you sure you're right about the expected value?

The statistic used in a signed-rank test is

W=\sum_{i=1}^{n}I_iR_i

where R_i is the rank of |x_i-y_i| among the n absolute differences, and I_i is an indicator variable equal to 0 if x_i-y_i is negative and 1 otherwise, for pairs (x_i,y_i) drawn from the continuous distributions of the random variables X_i and Y_i.
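To make the definition concrete, here is a minimal Python sketch (not from the thread) that computes W from paired samples. The function name signed_rank_statistic is hypothetical, and the sketch assumes continuous data, so there are no zero differences and no ties among the |x_i - y_i|.

```python
def signed_rank_statistic(x, y):
    """W = sum of I_i * R_i for paired samples x and y.

    R_i is the rank of |x_i - y_i| (1 = smallest). Assumes no zero
    differences and no tied absolute differences (continuous data).
    """
    d = [xi - yi for xi, yi in zip(x, y)]
    # Indices sorted by absolute difference, smallest first.
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0] * len(d)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    # I_i = 1 for a positive difference (zeros are assumed absent).
    return sum(r for r, di in zip(ranks, d) if di > 0)

print(signed_rank_statistic([18, 25, 31, 7], [10, 29, 20, 5]))
```

With these inputs the differences are 8, -4, 11, 2, whose absolute values have ranks 3, 2, 4, 1, so W = 3 + 4 + 1 = 8.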

Now, note that

W=\sum_{i=1}^{n}I_iR_i

has the same distribution as

U=\sum_{i=1}^{n}U_i,

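where the U_i are independent, with P(U_i=0)=P(U_i=i)=0.5. Both W and U are sums of a random subset of 1,2,...,n: under the null hypothesis the differences are symmetric about zero, so each rank belongs to a positive difference with probability 0.5, independently of the other ranks.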

In other words, a rank's equal chance of belonging to a negative or a positive difference is equivalent to an equal chance of being included in the sum or not.
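This equivalence suggests a brute-force check for small n: enumerate all 2^n equally likely inclusion patterns (equivalently, every subset of {1, ..., n}) and tabulate the sums. The sketch below, with the hypothetical function name null_distribution, does this with exact fractions; it is only practical for small n.

```python
from fractions import Fraction
from itertools import product

def null_distribution(n):
    """Exact P(W = w) under H0, treating W as U = U_1 + ... + U_n.

    Each U_i is 0 or i with probability 1/2, independently, so all 2**n
    inclusion patterns are equally likely.
    """
    counts = {}
    for included in product((0, 1), repeat=n):
        # U for this pattern: the sum of the included ranks.
        u = sum(i for i, inc in zip(range(1, n + 1), included) if inc)
        counts[u] = counts.get(u, 0) + 1
    total = 2 ** n
    return {w: Fraction(c, total) for w, c in sorted(counts.items())}

print(null_distribution(3))
```

For n = 3 the subsets of {1, 2, 3} have sums 0, 1, 2, 3, 3, 4, 5, 6, so P(W = 3) = 1/4 and every other value from 0 to 6 has probability 1/8.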

Therefore,

E(W)=E(U)=\sum_{i=1}^{n}E(U_i)=\sum_{i=1}^{n}[0\frac{1}{2}+i\frac{1}{2}]=\frac{1}{2}\sum_{i=1}^{n}i

And we know that

\sum_{i=1}^{n}i=\frac{n(n+1)}{2},

Therefore,

E(W)=\frac{n(n+1)}{4}.
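A quick numerical check of this result, as a sketch: the hypothetical function exact_mean averages U over all 2^n equally likely inclusion patterns, and the loop compares it with the term-by-term sum of E(U_i) = i/2 and with n(n+1)/4.

```python
from fractions import Fraction
from itertools import product

def exact_mean(n):
    """E(U) by brute force: average U over all 2**n inclusion patterns."""
    total = 0
    for included in product((0, 1), repeat=n):
        total += sum(i for i, inc in zip(range(1, n + 1), included) if inc)
    return Fraction(total, 2 ** n)

for n in range(1, 11):
    # Linearity: E(U) = sum of E(U_i) = sum of i/2 = n(n+1)/4.
    by_terms = sum(Fraction(i, 2) for i in range(1, n + 1))
    assert exact_mean(n) == by_terms == Fraction(n * (n + 1), 4)
print("E(W) = n(n+1)/4 holds for n = 1..10")
```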

Now what would you get for the variance, working with Var(W)=Var(U) knowing the U_i are independent?

A similar calculation would do the trick.

In fact, the results make sense because the test statistic W ranges from a minimum of 0, if all the differences are negative, to a maximum of \frac{n(n+1)}{2}, if all the differences are positive. Since each sign is equally likely, the distribution of W is symmetric about the midpoint of this range, so its mean is \frac{n(n+1)}{4}.
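The symmetry argument can also be checked directly. The sketch below (hypothetical function subset_sum_counts) counts, for each w, how many subsets of {1, ..., n} sum to w, using the standard subset-sum dynamic program rather than enumeration; dividing by 2^n gives P(W = w). Flipping every sign swaps a subset with its complement, which maps w to n(n+1)/2 - w, so the counts should read the same in both directions.

```python
def subset_sum_counts(n):
    """counts[w] = number of subsets of {1, ..., n} whose sum is w."""
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum  # only the empty set sums to 0 initially
    for i in range(1, n + 1):
        # Iterate downward so each rank i is used at most once per subset.
        for w in range(max_sum, i - 1, -1):
            counts[w] += counts[w - i]
    return counts

n = 8
counts = subset_sum_counts(n)
# Complementing a subset maps w to n(n+1)/2 - w, so the list is a palindrome.
print(counts == counts[::-1], sum(counts) == 2 ** n)
```

Because the counts are a mirror image around n(n+1)/4, that midpoint is also the mean.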
 
Right. I wrote down the wrong number for the expected value.

So similarly, E W^2 = E U^2 = \sum_{i=1}^n 0 \cdot \frac 12 + i^2 \cdot \frac 12 = \frac 12 \sum_{i=1}^n i^2 = \frac {n(n+1)(2n+1)}{12}.

Then the variance of W is EW^2 - (EW)^2, but this quantity doesn't seem to come out to be the variance I was given.
 
Be careful, there's a difference between U and U_i!

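You're assuming E(U^2)=\sum_{i=1}^{n}E(U_i^2), but squaring U=\sum_{i=1}^{n}U_i also produces the cross terms U_iU_j, so that doesn't hold. What does hold is E(U)=\sum_{i=1}^{n}E(U_i), by linearity of expectation, and, because the U_i are independent, Var(U)=\sum_{i=1}^{n}Var(U_i).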
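A small numerical illustration (not from the thread) for n = 3, using exact fractions: averaging U^2 over the 2^3 equally likely inclusion patterns gives a different value from the sum of the E(U_i^2). The gap is the cross-term contribution 2\sum_{i<j}E(U_iU_j), which equals 2\sum_{i<j}E(U_i)E(U_j) by independence.

```python
from fractions import Fraction
from itertools import product

n = 3
ranks = range(1, n + 1)
patterns = list(product((0, 1), repeat=n))  # 2**n equally likely patterns

def u_value(pattern):
    # U for one pattern: the sum of the included ranks.
    return sum(i for i, inc in zip(ranks, pattern) if inc)

# E(U^2): square the whole sum, then average over the patterns.
e_u_squared = Fraction(sum(u_value(p) ** 2 for p in patterns), len(patterns))
# Sum of E(U_i^2) = sum of i^2/2: this leaves out every cross term U_i U_j.
sum_e_ui_squared = sum(Fraction(i * i, 2) for i in ranks)
# Cross terms: 2 * sum over i < j of E(U_i) E(U_j), using independence.
cross = 2 * sum(Fraction(i, 2) * Fraction(j, 2) for i in ranks for j in ranks if i < j)

print(e_u_squared, sum_e_ui_squared, cross)
assert e_u_squared == sum_e_ui_squared + cross
```

This prints 25/2 7 11/2. Subtracting (E(U))^2 = 9 from 25/2 gives 7/2, which matches \frac{n(n+1)(2n+1)}{24} for n = 3, whereas using 7 in place of E(U^2) would give a negative "variance" of -2.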

We should have:

Var(U_i) = E(U_i^2)-E^2(U_i) = \left(0^2 \cdot \frac 12 + i^2 \cdot \frac 12\right) - \left(\frac{i}{2}\right)^2 = \frac{i^2}{2} - \frac{i^2}{4} = \frac{i^2}{4}

And finally,

Var(W) = \sum_{i=1}^{n} Var(U_i) = \sum_{i=1}^{n} \frac {i^2}{4} = \frac{1}{4} \cdot \frac{n(n+1)(2n+1)}{6} = \frac{n(n+1)(2n+1)}{24}

gives us the expected result.
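As a sanity check (a brute-force sketch, not part of the original reply), the hypothetical function exact_moments computes E(W) and Var(W) exactly by enumerating all 2^n sign patterns, and the loop compares them with \frac{n(n+1)}{4} and \frac{n(n+1)(2n+1)}{24}.

```python
from fractions import Fraction
from itertools import product

def exact_moments(n):
    """Exact (E(W), Var(W)) under H0 by enumerating all 2**n sign patterns."""
    values = [sum(i for i, inc in zip(range(1, n + 1), pattern) if inc)
              for pattern in product((0, 1), repeat=n)]
    count = len(values)
    mean = Fraction(sum(values), count)
    # Var(W) = E(W^2) - (E(W))^2, computed over the whole sum W.
    var = Fraction(sum(v * v for v in values), count) - mean * mean
    return mean, var

for n in range(1, 13):
    mean, var = exact_moments(n)
    assert mean == Fraction(n * (n + 1), 4)
    assert var == Fraction(n * (n + 1) * (2 * n + 1), 24)
print("Var(W) = n(n+1)(2n+1)/24 holds for n = 1..12")
```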
 
ron_vancouver
A follow-up question. For the expectation and variance of the Wilcoxon statistic, do you exclude pairs whose difference is zero when counting n (the number of pairs)? Say you have 100 pairs (n = 100), but for one pair the two observations have the same score, so that pair is excluded when determining ranks. To decide whether the obtained W is significant, you convert it to a z score using (W - E(W))/\sqrt{Var(W)}. So, in this scenario, should n = 100 or n = 99 when computing E(W) and Var(W)?

In most applications of the Wilcoxon test, we omit from consideration the pairs where the difference ##X_i - Y_i## is zero, since they provide no useful information to the procedure, and n then counts only the remaining pairs (n = 99 in your example).
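Here is a minimal sketch of the normal approximation described in the question, with zero differences dropped before ranking so that n counts only the nonzero pairs. The function name wilcoxon_z is hypothetical; it assumes at least one nonzero difference and that the nonzero |x_i - y_i| are all distinct, and it applies no tie or continuity correction.

```python
import math

def wilcoxon_z(x, y):
    """Return (n, W, z) for paired samples x, y under the normal approximation.

    Pairs with x_i == y_i are discarded, and n counts only the remaining
    pairs. Assumes at least one nonzero difference and no tied |differences|.
    """
    d = [xi - yi for xi, yi in zip(x, y) if xi != yi]
    n = len(d)
    # Rank |d| from 1 (smallest) to n.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    w = sum(r for r, di in zip(ranks, d) if di > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    return n, w, (w - mean) / math.sqrt(var)

n, w, z = wilcoxon_z([18, 25, 31, 7, 12], [10, 29, 20, 5, 12])
print(n, w, round(z, 4))
```

Here the last pair is tied and is dropped, so n = 4, W = 8, E(W) = 5, Var(W) = 7.5, and z = 3/\sqrt{7.5}, roughly 1.095.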
 
