Expected Value and Variance for Wilcoxon Signed-Rank Test

SUMMARY

Under the null hypothesis, the Wilcoxon signed-rank statistic W has expected value μ = n(n+1)/4 and variance σ² = n(n+1)(2n+1)/24. W is the sum of the ranks attached to the positive differences between paired observations. Each rank is included independently with probability 1/2, so the expected value is half the sum of the first n natural numbers, and the variance is the sum of the individual variances i²/4. Pairs with zero difference are excluded from the analysis, so n counts only the nonzero differences.

PREREQUISITES
  • Understanding of the Wilcoxon Signed-Rank Test
  • Familiarity with statistical concepts of expected value and variance
  • Knowledge of rank-based statistical methods
  • Basic algebra for manipulating summation formulas
NEXT STEPS
  • Study the derivation of the Wilcoxon Signed-Rank Test statistics
  • Learn about the implications of excluding zero-difference pairs in non-parametric tests
  • Explore the use of normal approximation in hypothesis testing
  • Investigate the application of z-scores in the context of Wilcoxon test results
USEFUL FOR

Statisticians, data analysts, researchers conducting non-parametric tests, and anyone interested in understanding the Wilcoxon Signed-Rank Test and its statistical properties.

Mogarrr
Using a normal approximation method for the Wilcoxon Signed-Rank Test, I've seen that the expected value is \mu = \frac {n(n+1)}2 and the variance is \sigma^2 = \frac {n(n+1)(2n+1)}{24}.

I'm wondering why these are the expected value and variance.

I do recognize the formulas for the sum of the first N natural numbers and the sum of the first N squares.

I have an idea as to why the expected value is half the sum of the first N natural numbers. Under the null hypothesis, roughly half of the differences should be positive, so it would make sense to halve that sum.

I have no intuition for the variance of the distribution.

An explanation would be appreciated.
 
Are you sure you're right about the expected value?

The statistic used in a signed-rank test is

W=\sum_{i=1}^{n}I_iR_i

where R_i is the rank of |x_i-y_i| among the n absolute differences, and I_i is an indicator variable equal to 0 if x_i-y_i is negative and 1 otherwise, for pairs (x_i,y_i) drawn from continuous random variables X_i and Y_i.

Now, note that

W=\sum_{i=1}^{n}I_iR_i

has the same distribution as

U=\sum_{i=1}^{n}U_i,

where the U_i are independent with P(U_i=0)=P(U_i=i)=0.5, since both W and U are sums of random subsets of 1,2,...,n in which each number is included with probability 0.5.

In other words, the equal chances of falling on either a negative or a positive difference are equivalent to the equal chances of being included in the sum or not.
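To make the equivalence concrete, here is a minimal Python sketch (not from the thread) that builds the exact null distribution of U by adding one U_i at a time. The function name `signed_rank_pmf` is made up for illustration, and the sketch assumes no ties and no zero differences.

```python
from collections import Counter
from fractions import Fraction


def signed_rank_pmf(n):
    """Exact null distribution of U = U_1 + ... + U_n, where each U_i
    is 0 or i with probability 1/2, independently of the others."""
    counts = Counter({0: 1})  # one way to reach a sum of 0 with no ranks
    for i in range(1, n + 1):
        updated = Counter()
        for total, ways in counts.items():
            updated[total] += ways      # U_i = 0: rank i is left out
            updated[total + i] += ways  # U_i = i: rank i is included
        counts = updated
    return {w: Fraction(c, 2 ** n) for w, c in sorted(counts.items())}


if __name__ == "__main__":
    for w, p in signed_rank_pmf(3).items():
        print(w, p)
```

For n = 3 the value 3 can be formed in two ways (3 alone, or 1 + 2), so it gets probability 1/4, while every other value from 0 to 6 gets 1/8.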

Therefore,

E(W)=E(U)=\sum_{i=1}^{n}E(U_i)=\sum_{i=1}^{n}\left[0\cdot\frac{1}{2}+i\cdot\frac{1}{2}\right]=\frac{1}{2}\sum_{i=1}^{n}i

And we know that

\sum_{i=1}^{n}i=\frac{n(n+1)}{2},

Therefore,

E(W)=\frac{n(n+1)}{4}.

Now what would you get for the variance, working with Var(W)=Var(U) knowing the U_i are independent?

A similar calculation will do the trick.

In fact, the results make sense because the test statistic W ranges from a minimum of 0, if all the differences are negative, to a maximum of \frac{n(n+1)}{2}, if all the differences are positive. Since the distribution is symmetric (each sign is equally probable), the mean of W sits at the midpoint of that range, \frac{n(n+1)}{4}.
 
Right. I wrote down the wrong number for the expected value.

So similarly, E W^2 = E U^2 = \sum_{i=1}^n 0 \cdot \frac 12 + i^2 \cdot \frac 12 = \frac 12 \sum_{i=1}^n i^2 = \frac {n(n+1)(2n+1)}{12}.

Then the variance of W is EW^2 - (EW)^2, but this quantity doesn't seem to come out to be the variance I was given.
 
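Be careful, there's a difference between U and U_i!

You're assuming E(U^2)=\sum_{i=1}^{n}E(U_i^2), but squaring the sum also produces cross terms E(U_iU_j) for i \neq j, so that shortcut fails. Expectation is linear, which gives E(U)=\sum_{i=1}^{n}E(U_i); variances add as well, but only because the U_i are independent.

We should have: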

Var(U_i) = E(U_i^2)-E^2(U_i) = \left(0^2 \cdot \frac 12 + i^2 \cdot \frac 12\right) - \left(0 \cdot \frac 12 + i \cdot \frac 12\right)^2= \frac {i^2}{2} - \left(\frac{i}{2}\right)^2 = \frac{i^2}{4}

And finally,

Var(W) = \sum_{i=1}^{n} Var(U_i) = \sum_{i=1}^{n} \frac {i^2}{4} = \frac{1}{4} \cdot \frac{n(n+1)(2n+1)}{6} = \frac{n(n+1)(2n+1)}{24}

gives us the expected result.
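For anyone who wants to double-check the algebra, here is a small brute-force sketch in Python (not part of the original posts). It enumerates all 2^n equally likely sign patterns for small n and computes the mean, second moment, and variance exactly. The function name `exact_moments` is hypothetical, and the enumeration assumes no ties and no zero differences.

```python
from fractions import Fraction
from itertools import product


def exact_moments(n):
    """Return (E[W], E[W^2], Var[W]) under the null hypothesis by
    enumerating every inclusion pattern of the ranks 1..n."""
    ranks = range(1, n + 1)
    values = [
        sum(r * s for r, s in zip(ranks, signs))
        for signs in product((0, 1), repeat=n)  # 0 = negative, 1 = positive
    ]
    count = len(values)  # 2**n patterns, each with probability 1/2**n
    mean = Fraction(sum(values), count)
    second = Fraction(sum(v * v for v in values), count)
    return mean, second, second - mean ** 2


if __name__ == "__main__":
    for n in range(1, 9):
        mean, second, var = exact_moments(n)
        assert mean == Fraction(n * (n + 1), 4)
        assert var == Fraction(n * (n + 1) * (2 * n + 1), 24)
        # The shortcut (1/2) * sum of i^2 is not the second moment for n > 1.
        shortcut = Fraction(n * (n + 1) * (2 * n + 1), 12)
        print(n, mean, second, var, shortcut)
```

For n = 3 the true second moment is 25/2, whereas the shortcut \frac{1}{2}\sum i^2 gives 7, which is why subtracting (EW)^2 from it did not reproduce the variance 7/2.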
 
ron_vancouver said:
A follow-up question. For the expectation and variance of the Wilcoxon statistic, does n (the number of pairs) exclude pairs whose difference is zero? Say you have 100 pairs (n = 100), but in one of the pairs the two observations have the same score, so that pair is excluded when determining ranks. To decide whether the obtained W is significant, you convert it to a z-score using (W-expW)/sqrt(varW). So my question is: in this scenario, when computing expW and varW, is n = 100 or n = 99?
In most applications of the Wilcoxon test, we omit from consideration the pairs for which the absolute difference of ##X_i## and ##Y_i## is zero, since they provide no useful information to the procedure. So n is the number of nonzero differences, which is n = 99 in your example.
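As a rough illustration of that convention, the Python sketch below (not from the thread) drops the zero differences first, then computes W, the reduced n, and the z-score from the formulas derived above. The function name `signed_rank_z` and the sample data are invented for the example. It assumes there are no ties among the nonzero absolute differences and at least one nonzero difference; with ties, the variance formula would need a correction that is not covered here.

```python
import math


def signed_rank_z(x, y):
    """Return (n, W, z) for paired samples x and y, where n counts only
    the pairs with a nonzero difference."""
    # Discard zero differences before ranking.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    # Rank the absolute differences from 1 to n (assumes no ties).
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    # W is the sum of the ranks belonging to positive differences.
    w = sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return n, w, (w - mu) / sigma


if __name__ == "__main__":
    # Made-up data: five pairs, one of which has a zero difference.
    x = [10, 12, 9, 15, 11]
    y = [8, 12, 10, 11, 14]
    print(signed_rank_z(x, y))
```

With these five pairs the function uses n = 4, not 5. A sample this small is only meant to show the bookkeeping; the normal approximation itself is intended for larger n.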
 