Expected Value and Variance for Wilcoxon Signed-Rank Test

Mogarrr · Oct 11, 2014

Using a normal approximation method for the Wilcoxon Signed-Rank Test, I've seen that the expected value is \mu = \frac {n(n+1)}2 and the variance is \sigma^2 = \frac {n(n+1)(2n+1)}{24}.

I'm wondering why these are the expected value and variance.

I do recognize the formulas for the sum of the first n natural numbers and for the sum of their squares.

I have an idea as to why the expected value would be half the sum of the first n natural numbers. Under the null hypothesis, roughly half of the differences should be positive, so it would make sense to halve that sum.

I have no intuition for the variance of the distribution.

An explanation would be appreciated.

h6ss · Oct 13, 2014

Are you sure you're right about the expected value?

The statistic used in a signed-rank test is

W=\sum_{i=1}^{n}I_iR_i

where R_i is the rank of |x_i-y_i| among the n absolute differences, and I_i is an indicator variable equal to 0 if x_i-y_i is negative and equal to 1 otherwise, for pairs (x_i,y_i) drawn from the continuous distributions of the random variables X_i and Y_i.

Now, note that

W=\sum_{i=1}^{n}I_iR_i

has the same distribution as

U=\sum_{i=1}^{n}U_i,

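where the U_i are independent with P(U_i=0)=P(U_i=i)=0.5, since under the null hypothesis both W and U are sums of random subsets of 1,2,...,n in which each number is included with probability 0.5.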

In other words, the equal chances of falling on either a negative or a positive difference are equivalent to the equal chances of being included in the sum or not.

Therefore,

E(W)=E(U)=\sum_{i=1}^{n}E(U_i)=\sum_{i=1}^{n}\left[0\cdot\frac{1}{2}+i\cdot\frac{1}{2}\right]=\frac{1}{2}\sum_{i=1}^{n}i

And we know that

\sum_{i=1}^{n}i=\frac{n(n+1)}{2},

Therefore,

E(W)=\frac{n(n+1)}{4}.

Now what would you get for the variance, working with Var(W)=Var(U) knowing the U_i are independent?

A similar calculation would do the trick.

In fact, the result makes sense because the test statistic W ranges from a minimum of 0, if all the differences are negative, to a maximum of \frac{n(n+1)}{2}, if all the differences are positive. Since the distribution of W is symmetric under the null hypothesis (each difference is equally likely to be positive or negative), its mean sits at the midpoint of that range, \frac{n(n+1)}{4}.
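As a quick sanity check on the mean, here is a minimal sketch that enumerates all 2^n equally likely inclusion patterns for small n and compares the exact average with n(n+1)/4. The function name `exact_mean_of_u` is just an illustrative choice, and the brute-force approach is only practical for small n.

```python
from fractions import Fraction
from itertools import product


def exact_mean_of_u(n):
    """Exact E(U), found by enumerating all 2**n equally likely patterns."""
    total = Fraction(0)
    for pattern in product((0, 1), repeat=n):
        # pattern[i - 1] == 1 means that rank i is included in the sum
        total += sum(rank for rank, keep in zip(range(1, n + 1), pattern) if keep)
    return total / 2**n


# Compare with the closed form n(n+1)/4 for a few small n
for n in range(1, 9):
    assert exact_mean_of_u(n) == Fraction(n * (n + 1), 4)
```

Exact fractions are used instead of floats so that the comparison with the formula is an equality rather than an approximation.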

Mogarrr · Oct 15, 2014

Right. I wrote down the wrong number for the expected value.

So similarly, E W^2 = E U^2 = \sum_{i=1}^n \left(0 \cdot \frac 12 + i^2 \cdot \frac 12\right) = \frac 12 \sum_{i=1}^n i^2 = \frac {n(n+1)(2n+1)}{12}.

Then the variance of W is EW^2 - (EW)^2, but this quantity doesn't seem to come out to be the variance I was given.

h6ss · Oct 15, 2014

Be careful, there's a difference between U and U_i!

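You're assuming E(U^2)=\sum_{i=1}^{n}E(U_i^2), but that drops the cross terms E(U_iU_j) for i \neq j. The expectation is additive, E(U)=\sum_{i=1}^{n}E(U_i), and because the U_i are independent the variance is additive too: Var(U)=\sum_{i=1}^{n}Var(U_i). The second moment is not.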
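To make the gap concrete, here is a small worked case with n = 2 (chosen only for illustration), where U takes the values 0, 1, 2, 3 with probability 1/4 each:

```latex
E(U^2) = \sum_{i=1}^{n} E(U_i^2) + \sum_{i \neq j} E(U_i)E(U_j)

% n = 2:
E(U^2) = \frac{0^2 + 1^2 + 2^2 + 3^2}{4} = \frac{7}{2},
\qquad
E(U_1^2) + E(U_2^2) = \frac{1}{2} + 2 = \frac{5}{2},
\qquad
2\,E(U_1)E(U_2) = 2 \cdot \frac{1}{2} \cdot 1 = 1

Var(U) = \frac{7}{2} - \left(\frac{3}{2}\right)^2 = \frac{5}{4} = \frac{2 \cdot 3 \cdot 5}{24}
```

The missing 1 is exactly the cross term, and once it is included the variance agrees with \frac{n(n+1)(2n+1)}{24} at n = 2.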

We should have:

Var(U_i) = E(U_i^2)-[E(U_i)]^2 = \left(0^2 \cdot \frac 12 + i^2 \cdot \frac 12\right) - \left(0 \cdot \frac 12 + i \cdot \frac 12\right)^2 = \frac {i^2}{2} - \left(\frac{i}{2}\right)^2 = \frac{i^2}{4}

And finally,

Var(W) = \sum_{i=1}^{n} Var(U_i) = \sum_{i=1}^{n} \frac {i^2}{4} = \frac{1}{4} \cdot \frac{n(n+1)(2n+1)}{6} = \frac{n(n+1)(2n+1)}{24}

gives us the expected result.
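For anyone who wants to check the variance numerically, the following is a rough sketch that builds the exact null distribution of U by counting subsets of {1, ..., n} with each possible sum, then compares the resulting mean and variance with the closed forms. The helper names are hypothetical, and this is only one of several ways to do the check.

```python
from fractions import Fraction


def subset_sum_counts(n):
    """counts[s] = number of subsets of {1, ..., n} whose elements sum to s."""
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1  # the empty subset
    for rank in range(1, n + 1):
        # Iterate downward so that each rank is used at most once
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    return counts


def exact_mean_and_variance(n):
    """Exact mean and variance of U when all 2**n subsets are equally likely."""
    counts = subset_sum_counts(n)
    total = 2**n
    mean = Fraction(sum(s * c for s, c in enumerate(counts)), total)
    second_moment = Fraction(sum(s * s * c for s, c in enumerate(counts)), total)
    return mean, second_moment - mean**2


for n in (1, 2, 5, 10, 20):
    mean, var = exact_mean_and_variance(n)
    assert mean == Fraction(n * (n + 1), 4)
    assert var == Fraction(n * (n + 1) * (2 * n + 1), 24)
```

The counting step avoids enumerating all 2^n subsets, so the check stays fast even for n = 20.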

ron_vancouver · Jan 6, 2015

A follow-up question. For the expectation and variance of the Wilcoxon statistic, when choosing the value of n (i.e., the number of pairs), do you exclude pairs in which the difference within the pair is zero? Let's say you have 100 pairs (n = 100), but for one of the pairs the two observations have the same score, so that pair is excluded when determining ranks. Now if you wish to determine whether or not the obtained W is significant, you convert to a z score using (W-expW)/sqrt(varW). So again my question: in this scenario, to compute expW and varW, does n = 100 or does n = 99?

h6ss · Feb 4, 2015

In most applications of the Wilcoxon test, we omit from consideration the pairs for which the absolute difference of ##X_i## and ##Y_i## is zero, since such pairs provide no useful information to the procedure. The value of n is then the number of pairs that remain, so in your example n = 99.
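A minimal sketch of that procedure, assuming for simplicity that the nonzero absolute differences are all distinct (so no tie correction is applied to the variance). The function name `signed_rank_z` and the sample data are made up for illustration.

```python
import math


def signed_rank_z(x, y):
    """Return (n, W, z) for the normal approximation to the signed-rank test.

    Assumption: the nonzero |x_i - y_i| are all distinct, so the ranks are
    simply 1..n and no tie correction is needed.
    """
    # Drop pairs whose difference is zero; n is the number of pairs left
    diffs = [a - b for a, b in zip(x, y) if a - b != 0]
    n = len(diffs)
    # Rank the absolute differences from 1 (smallest) to n (largest)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    w = sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    return n, w, (w - mean) / math.sqrt(var)


# Five pairs, one with a zero difference, so the effective n is 4
n, w, z = signed_rank_z([5, 7, 3, 9, 6], [4, 7, 6, 5, 8])
print(n, w, z)  # 4 5 0.0
```

Here the positive differences have ranks 1 and 4, so W = 5, which equals the null mean for n = 4 and gives z = 0.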
