
Variance of statistic used in runs test

  1. Jul 30, 2013 #1
    Hi, friends! Since this is my first post, I want to introduce myself: I am an Italian who is teaching himself mathematics and the natural sciences, coming from a strictly humanities-centered school background, and I am very much tempted to enrol in a university science course.
    I read on the Italian-language Wikipedia that the variance [itex]\text{Var}_{H_0}(R)[/itex] of the statistic [itex]R[/itex] used in the Wald-Wolfowitz runs test, under the null hypothesis that the [itex]X_1,...,X_N[/itex] are independent, is[tex](4N-6)p(1-p)-(12N-20)p^2(1-p)^2.[/tex] It is worth noticing that, when the test is discussed, what is usually given are (I think) the approximations used for the Gaussian approximation of [itex]R[/itex], rather than the exact expectation and variance...
    That statistic, as my book (S. M. Ross, Introduction to Probability and Statistics for Engineers and Scientists) explains, and as I also find on the German-language Wikipedia, has the probability mass function[tex]P_{H_0}(R=2k)=2\frac{\binom{N^+ -1}{k-1}\binom{N^- -1}{k-1}}{\binom{N^+ +N^-}{N^+}}[/tex][tex]P_{H_0}(R=2k+1)=\frac{\binom{N^+ -1}{k-1}\binom{N^- -1}{k}+\binom{N^+ -1}{k}\binom{N^- -1}{k-1}}{\binom{N^+ +N^-}{N^+}}[/tex]where [itex]N^+[/itex] and [itex]N^-[/itex] are the counts of the two kinds of observations.
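    As a quick sanity check (my own sketch in Python, assuming [itex]N^+[/itex] and [itex]N^-[/itex] count the two symbols and the denominator is [itex]\binom{N^+ +N^-}{N^+}[/itex], the total number of arrangements), the mass function sums to 1:

```python
from math import comb

def runs_pmf_total(n_plus, n_minus):
    """Sum the runs-count PMF over all attainable numbers of runs.

    Counting form: the number of arrangements with 2k runs is
    2*C(n_plus-1, k-1)*C(n_minus-1, k-1); the number with 2k+1 runs is
    C(n_plus-1, k-1)*C(n_minus-1, k) + C(n_plus-1, k)*C(n_minus-1, k-1);
    the total number of arrangements is C(n_plus + n_minus, n_plus).
    """
    denom = comb(n_plus + n_minus, n_plus)
    total = 0
    for k in range(1, min(n_plus, n_minus) + 1):
        total += 2 * comb(n_plus - 1, k - 1) * comb(n_minus - 1, k - 1)   # R = 2k
        total += (comb(n_plus - 1, k - 1) * comb(n_minus - 1, k)
                  + comb(n_plus - 1, k) * comb(n_minus - 1, k - 1))       # R = 2k + 1
    return total / denom

print(runs_pmf_total(4, 6))  # -> 1.0
```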
    The Italian-language Wikipedia considers the statistic [itex]R[/itex] to be the same, under the null hypothesis of independence, as [itex]1+\sum_{i=1}^{N-1}|X_i-X_{i+1}|[/itex], where the [itex]X_i[/itex] are Bernoulli random variables with expectation [itex]p[/itex], and the expectation [itex]E_{H_0}[R][/itex] given there is the same one discussed by a user here, who gives a short proof that [itex]E_{H_0}[R]=1+2(N-1)p(1-p)[/itex].
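    That expectation formula can be checked by brute force for small [itex]N[/itex] (a sketch of mine in Python, using exact rational arithmetic; the function name is my own):

```python
from fractions import Fraction
from itertools import product

def exact_mean_runs(N, p):
    """E[R] under H0, by enumerating all 2^N sequences of i.i.d. Bernoulli(p)
    variables and using R = 1 + sum_i |X_i - X_{i+1}|."""
    mean = Fraction(0)
    for x in product((0, 1), repeat=N):
        weight = p**sum(x) * (1 - p)**(N - sum(x))  # P(X_1, ..., X_N equals x)
        runs = 1 + sum(abs(x[i] - x[i + 1]) for i in range(N - 1))
        mean += weight * runs
    return mean

N, p = 6, Fraction(1, 3)
assert exact_mean_runs(N, p) == 1 + 2*(N - 1)*p*(1 - p)
```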

    As to the variance, I have tried hard, but my efforts to prove it by myself have been useless. I have tried to compute the second moment by manipulating the sum [itex]\sum_{k=1}^{\min\{N^+ ,N^-\}}(4k^2 P_{H_0}(R=2k)+(2k+1)^2P_{H_0}(R=2k+1))[/itex] when [itex]N^+ \ne N^-[/itex], and by treating the case [itex]N^+ =N^-[/itex] similarly, where I would say that the second moment is [itex]E_{H_0}[R^2]=\sum_{k=1}^{N^+ -1}(4k^2 P_{H_0}(R=2k)+(2k+1)^2 P_{H_0}(R=2k+1))+\frac{8(N^+)^2}{\binom{2N^+}{N^+}}[/itex], but I haven't been able to simplify those sums with their factorials.
    Does anybody know of, or can anybody link to, a proof of the formula for the variance [itex]\text{Var}_{H_0}(R)[/itex]?
    I [itex]\infty[/itex]-ly thank you all!!!
    Last edited: Jul 30, 2013
  3. Aug 3, 2013 #2

    Stephen Tashi

    Science Advisor

    One thought is to let [itex] Y_i = |X_i - X_{i+1}| [/itex].

    Only consecutive [itex]Y[/itex]'s are dependent, so

    [tex] Var( 1 + \sum_{i=1}^{N-1} Y_i ) = \sum_{i=1}^{N-1} Var(Y_i) + 2 \sum_{i=1}^{N-2} Cov(Y_i,Y_{i+1}) [/tex]

    Then you have to find formulas for [itex] Var(Y_i) [/itex] and [itex] Cov(Y_i,Y_{i+1}) [/itex]. That might not be easy, but at least it focuses our attention on only three of the [itex] X's [/itex] at a time.
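    Carrying this out (a sketch of my own, to be checked): [itex]Y_i[/itex] is Bernoulli with parameter [itex]2p(1-p)[/itex], so [itex]Var(Y_i)=2p(1-p)\big(1-2p(1-p)\big)[/itex], and [itex]E[Y_iY_{i+1}]=P(X_i\ne X_{i+1}, X_{i+1}\ne X_{i+2})=p(1-p)[/itex], so [itex]Cov(Y_i,Y_{i+1})=p(1-p)-4p^2(1-p)^2[/itex]; substituting into the decomposition above reproduces [itex](4N-6)p(1-p)-(12N-20)p^2(1-p)^2[/itex]. An exhaustive check for small [itex]N[/itex], in exact rational arithmetic:

```python
from fractions import Fraction
from itertools import product

def exact_var_runs(N, p):
    """Var(R) under H0, by enumerating all 2^N sequences of i.i.d.
    Bernoulli(p) variables, with R = 1 + sum_i |X_i - X_{i+1}|."""
    m1 = m2 = Fraction(0)
    for x in product((0, 1), repeat=N):
        w = p**sum(x) * (1 - p)**(N - sum(x))  # exact probability of this sequence
        r = 1 + sum(abs(x[i] - x[i + 1]) for i in range(N - 1))
        m1 += w * r
        m2 += w * r * r
    return m2 - m1**2

N, p = 7, Fraction(2, 5)
u = p * (1 - p)
assert exact_var_runs(N, p) == (4*N - 6)*u - (12*N - 20)*u**2
```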