Expected Value and Variance for Wilcoxon Signed-Rank Test

  1. Oct 11, 2014 #1
    Using a normal approximation method for the Wilcoxon Signed-Rank Test, I've seen that the expected value is [itex] \mu = \frac {n(n+1)}2 [/itex] and the variance is [itex] \sigma^2 = \frac {n(n+1)(2n+1)}{24} [/itex].

    I'm wondering why these are the expected value and variance.

    I do recognize the formulas for the sum of the first n natural numbers and the sum of the first n squares.

    I have an idea as to why the expected value is half the sum of the first n natural numbers. Under the null hypothesis, roughly half of the differences should be positive, so it would make sense to halve that sum.

    I have no intuition for the variance of the distribution.

    An explanation would be appreciated.
  3. Oct 13, 2014 #2
    Are you sure you're right about the expected value?

    The statistic used in a signed-rank test is

    [itex]W = \sum_{i=1}^{n} R_i I_i[/itex]

    where [itex]R_i[/itex] is the rank of [itex]|x_i-y_i|[/itex] among the [itex]n[/itex] absolute differences and [itex]I_i[/itex] is an indicator variable equal to [itex]0[/itex] if [itex]x_i-y_i[/itex] is negative and [itex]1[/itex] otherwise, for pairs [itex](x_i,y_i)[/itex] drawn from the continuous distributions of the random variables [itex]X_i[/itex] and [itex]Y_i[/itex].

    Now, note that

    [itex]W = \sum_{i=1}^{n} R_i I_i[/itex]

    has the same distribution as

    [itex]U = \sum_{i=1}^{n} U_i[/itex]

    where [itex]P(U_i=0)=P(U_i=i)=0.5[/itex] and the [itex]U_i[/itex] are independent, since both [itex]W[/itex] and [itex]U[/itex] are sums of random subsets of [itex]1,2,...,n[/itex].

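    In other words, the equal chances of a difference being negative or positive are equivalent to the equal chances of each rank being included in the sum or not.

    It follows that

    [itex]E(W) = E(U) = \sum_{i=1}^{n} E(U_i)[/itex]

    And we know that

    [itex]E(U_i) = 0 \cdot \frac 12 + i \cdot \frac 12 = \frac i2[/itex]

    so

    [itex]E(W) = \sum_{i=1}^{n} \frac i2 = \frac{n(n+1)}{4}[/itex]
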
    Now what would you get for the variance, working with [itex]Var(W)=Var(U)[/itex] knowing the [itex]U_i[/itex] are independent?

    A similar calculation would do the trick.

    In fact, the results make sense because the test statistic [itex]W[/itex] ranges from a minimum of [itex]0[/itex], if all the differences are negative, to a maximum of [itex]\frac{n(n+1)}{2}[/itex], if all the differences are positive. Since everything we're working with is symmetric (each difference is equally likely to be positive or negative), the mean of [itex]W[/itex] sits at the midpoint of that range, [itex]\frac{n(n+1)}{4}[/itex].
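
For small n, both formulas can be checked by brute force. The sketch below is not from the thread; it assumes the null model described above (each rank 1..n enters the sum independently with probability 1/2) and enumerates all 2^n inclusion patterns with exact fractions. The function name `exact_moments` is just an illustrative choice.

```python
from fractions import Fraction
from itertools import product

def exact_moments(n):
    """Exact mean and variance of W under the null, by enumerating all 2**n sign patterns."""
    total = Fraction(0)
    total_sq = Fraction(0)
    count = 0
    for signs in product((0, 1), repeat=n):
        # W is the sum of the ranks whose difference is positive (signs entry == 1)
        w = sum(rank for rank, keep in zip(range(1, n + 1), signs) if keep)
        total += w
        total_sq += w * w
        count += 1
    mean = total / count
    var = total_sq / count - mean ** 2
    return mean, var

for n in (1, 2, 5, 8):
    mean, var = exact_moments(n)
    # Compare against the closed forms n(n+1)/4 and n(n+1)(2n+1)/24
    assert mean == Fraction(n * (n + 1), 4)
    assert var == Fraction(n * (n + 1) * (2 * n + 1), 24)
    print(n, mean, var)
```

The enumeration grows as 2^n, so it is only practical as a sanity check for small n.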
    Last edited: Oct 13, 2014
  4. Oct 15, 2014 #3
    Right. I wrote down the wrong number for the expected value.

    So similarly, [itex] E W^2 = E U^2 = \sum_{i=1}^n 0 \cdot \frac 12 + i^2 \cdot \frac 12 = \frac 12 \sum_{i=1}^n i^2 = \frac {n(n+1)(2n+1)}{12}[/itex].

    Then the variance of W is [itex] EW^2 - (EW)^2 [/itex], but this quantity doesn't seem to come out to be the variance I was given.
  5. Oct 15, 2014 #4
    Be careful, there's a difference between [itex]U[/itex] and [itex]U_i[/itex]!

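    You're assuming [itex]E(U^2)=\sum_{i=1}^{n}E(U_i^2)[/itex], but that drops the cross terms [itex]E(U_iU_j)[/itex]. Expectations always add, [itex]E(U)=\sum_{i=1}^{n}E(U_i)[/itex], but second moments do not. What does add, because the [itex]U_i[/itex] are independent, is the variance.

    We should have: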

    [itex]Var(U_i) = E(U_i^2)-E^2(U_i) = \left(0^2 \cdot \frac 12 + i^2 \cdot \frac 12\right) - \left(0 \cdot \frac 12 + i \cdot \frac 12\right)^2= \frac {i^2}{2} - \left(\frac{i}{2}\right)^2 = \frac{i^2}{4}[/itex]

    And finally,

    [itex]Var(W) = \sum_{i=1}^{n} Var(U_i) = \sum_{i=1}^{n} \frac {i^2}{4} = \frac{1}{4} \cdot \frac{n(n+1)(2n+1)}{6} = \frac{n(n+1)(2n+1)}{24}[/itex]

    gives us the expected result.
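
To see where the attempt in post #3 goes wrong, it may help to split the second moment of the sum into its diagonal part and its cross terms. The sketch below is an illustration, not part of the original posts; it assumes independent terms with P(U_i = 0) = P(U_i = i) = 1/2, and `second_moment_pieces` is a hypothetical helper name.

```python
from fractions import Fraction

def second_moment_pieces(n):
    """Return (sum of E(U_i^2), cross terms of E(U^2), E(U)) for U = U_1 + ... + U_n."""
    # E(U_i) = i/2 and E(U_i^2) = i^2/2 for each term
    means = [Fraction(i, 2) for i in range(1, n + 1)]
    diag = sum(Fraction(i * i, 2) for i in range(1, n + 1))
    # Cross terms: 2 * sum over i < j of E(U_i) E(U_j), which uses independence
    cross = 2 * sum(means[i] * means[j] for i in range(n) for j in range(i + 1, n))
    return diag, cross, sum(means)

n = 10
diag, cross, mean = second_moment_pieces(n)
naive_var = diag - mean ** 2            # cross terms dropped, as in post #3
correct_var = diag + cross - mean ** 2  # E(U^2) - (E U)^2 with cross terms kept
print(naive_var, correct_var)
assert correct_var == Fraction(n * (n + 1) * (2 * n + 1), 24)
```

For n = 10 the naive value comes out negative, which cannot be a variance; adding the cross terms back recovers n(n+1)(2n+1)/24.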
    Last edited: Oct 15, 2014
  6. Jan 6, 2015 #5
    A follow-up question. When computing the expectation and variance of the Wilcoxon statistic, do you exclude from n (the number of pairs) the pairs whose difference is zero? Say you have 100 pairs (n = 100), but in one pair the two observations have the same score, so that pair is excluded when determining ranks. To test whether the obtained W is significant, you convert it to a z score using (W-expW)/sqrt(varW). So, in this scenario, should expW and varW be computed with n = 100 or n = 99?
  7. Feb 4, 2015 #6
    In most applications of the Wilcoxon test, we omit the pairs for which the difference between ##X_i## and ##Y_i## is zero, since they provide no information about the direction of the difference. ##n## is then the number of remaining pairs, so in your example ##n = 99##.
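
As a rough sketch of that convention, the z score from post #5 could be computed as below. This is an illustration under stated assumptions, not a reference implementation: zero differences are dropped, the remaining absolute differences are assumed to have no ties (so no tie correction is applied), and the function name `signed_rank_z` and the sample data are made up.

```python
import math

def signed_rank_z(x, y):
    """Normal-approximation z score for the signed-rank statistic W.

    Assumptions: zero differences are dropped, and the remaining absolute
    differences have no ties, so no tie correction is applied.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]   # drop zero differences
    n = len(diffs)                                    # effective number of pairs
    if n == 0:
        raise ValueError("all differences are zero")
    abs_sorted = sorted(abs(d) for d in diffs)
    if len(set(abs_sorted)) != n:
        raise ValueError("tied absolute differences: this sketch does not handle ties")
    rank = {a: r for r, a in enumerate(abs_sorted, start=1)}
    w = sum(rank[abs(d)] for d in diffs if d > 0)     # sum of ranks of positive differences
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    return n, w, (w - mean) / math.sqrt(var)

# Hypothetical data: five pairs, one of which has a zero difference
x = [5, 7, 3, 9, 4]
y = [5, 4, 5, 3, 3]
n, w, z = signed_rank_z(x, y)
print(n, w, z)  # n is 4 here, not 5, because the first pair is dropped
```

With real data that contain ties, a tie-corrected variance would be needed, which this sketch deliberately leaves out.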