Variance of statistic used in runs test

SUMMARY

The variance of the statistic R used in the Wald-Wolfowitz test under the null hypothesis of independence is defined as \text{Var}_{H_0}(R) = (4N-6)p(1-p) - (12N-20)p^2(1-p)^2. The probability mass function for R is given by P_{H_0}(R=2k) and P_{H_0}(R=2k+1), which involve binomial coefficients. The discussion highlights the difficulty in proving the variance formula and suggests focusing on the variance and covariance of the derived random variables Y_i = |X_i - X_{i+1}| to simplify the calculations.

PREREQUISITES
  • Understanding of the Wald-Wolfowitz test and its application in statistics.
  • Familiarity with Bernoulli random variables and their properties.
  • Knowledge of probability mass functions and binomial coefficients.
  • Basic concepts of variance and covariance in probability theory.
NEXT STEPS
  • Study the derivation of the variance formula for the Wald-Wolfowitz test.
  • Learn about covariance and its application in dependent random variables.
  • Explore advanced topics in probability theory, focusing on moment-generating functions.
  • Investigate statistical software tools for performing Wald-Wolfowitz tests and calculating variances.
USEFUL FOR

Statisticians, data scientists, and students of probability theory who are interested in understanding the Wald-Wolfowitz test and its statistical properties.

DavideGenoa
Hi, friends! Since this is my first post, let me introduce myself: I am an Italian with a strictly humanities-centered school background who is trying to teach himself mathematics and the natural sciences, and I am very tempted to enrol in a university science course.
I read in the Italian-language Wikipedia that the variance \text{Var}_{H_0}(R) of the statistic R used in the Wald-Wolfowitz runs test, under the null hypothesis that X_1, \ldots, X_N are independent, is

\text{Var}_{H_0}(R) = (4N-6)p(1-p) - (12N-20)p^2(1-p)^2.

It is worth noticing that, when discussing the test, it is common to give what I think are the approximate expectation and variance used for the Gaussian approximation of R, rather than the exact ones...
That statistic, as my book (S. M. Ross, Introduction to Probability and Statistics for Engineers and Scientists) explains, and as I find in the German-language Wikipedia too, has the probability mass function

P_{H_0}(R=2k) = 2\,\frac{\binom{N^+ -1}{k-1}\binom{N^- -1}{k-1}}{\binom{N^+ +N^-}{N^+}},\qquad P_{H_0}(R=2k+1) = \frac{\binom{N^+ -1}{k-1}\binom{N^- -1}{k}+\binom{N^+ -1}{k}\binom{N^- -1}{k-1}}{\binom{N^+ +N^-}{N^+}},

where N^+ and N^- count the observations of each of the two types.
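As a sanity check on this pmf, the probabilities should sum to 1 over all attainable run counts. A minimal sketch (function and variable names are my own; I assume the denominator is the number of arrangements \binom{N^+ + N^-}{N^+}, all equally likely under H_0):

```python
from math import comb

def runs_pmf(r, n_plus, n_minus):
    """P(R = r) for the number of runs R, given n_plus observations of one
    type and n_minus of the other, all arrangements equally likely."""
    total = comb(n_plus + n_minus, n_plus)
    if r % 2 == 0:                      # R = 2k: equal numbers of runs of each type
        k = r // 2
        return 2 * comb(n_plus - 1, k - 1) * comb(n_minus - 1, k - 1) / total
    else:                               # R = 2k + 1: one extra run of one type
        k = (r - 1) // 2
        return (comb(n_plus - 1, k - 1) * comb(n_minus - 1, k)
                + comb(n_plus - 1, k) * comb(n_minus - 1, k - 1)) / total

# Probabilities over all possible run counts (from 2 up) sum to 1:
n_plus, n_minus = 5, 7
print(sum(runs_pmf(r, n_plus, n_minus) for r in range(2, n_plus + n_minus + 1)))
```

(`math.comb` returns 0 when the lower index exceeds the upper one, so unattainable run counts contribute nothing to the sum.)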
The Italian-language Wikipedia considers the statistic R to be the same, under the null hypothesis of independence, as

R = 1 + \sum_{i=1}^{N-1} |X_i - X_{i+1}|,

where the X_i are Bernoulli random variables with expectation p. The expectation E_{H_0}[R] given in that Wikipedia is the same one discussed by a user here, who gives a short proof that E_{H_0}[R] = 1 + 2(N-1)p(1-p).
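For small N the expectation formula can be verified by brute force. A sketch assuming the X_i are i.i.d. Bernoulli(p), with names of my own choosing:

```python
from itertools import product

def exact_expected_runs(N, p):
    """E[R] with R = 1 + sum |X_i - X_{i+1}|, X_i iid Bernoulli(p),
    computed exactly by enumerating all 2^N binary sequences."""
    total = 0.0
    for xs in product((0, 1), repeat=N):
        prob = 1.0
        for x in xs:
            prob *= p if x == 1 else 1.0 - p       # sequence probability
        runs = 1 + sum(abs(a - b) for a, b in zip(xs, xs[1:]))
        total += prob * runs
    return total

N, p = 8, 0.3
print(exact_expected_runs(N, p), 1 + 2 * (N - 1) * p * (1 - p))  # should agree
```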

As to the variance, I have tried a lot, but my efforts to prove the formula by myself have been useless. I have tried to calculate the second moment by manipulating the sum

\sum_{k=1}^{\min\{N^+ ,N^-\}} \left( 4k^2\, P_{H_0}(R=2k) + (2k+1)^2\, P_{H_0}(R=2k+1) \right)

in the case N^+ \ne N^-, and by treating the case N^+ = N^- similarly, where I would say that the second moment is

E_{H_0}[R^2] = \sum_{k=1}^{N^+ -1} \left( 4k^2\, P_{H_0}(R=2k) + (2k+1)^2\, P_{H_0}(R=2k+1) \right) + \frac{2(N^+)^2}{\binom{2N^+}{N^+}},

but I haven't been able to simplify those sums with their factorials.
Does anybody know, or can anybody link to, a proof of the formula for the variance \text{Var}_{H_0}(R)?
I \infty-ly thank you all!
 
DavideGenoa said:
The Italian language Wikipedia considers the statistic R to be the same, under the null hypothesis of independence, as 1+\sum_{i=1}^{N-1}|X_i-X_{i+1}| where the X_i are Bernoulli random variables with expectation p


One thought is to let Y_i = |X_i - X_{i+1}|.

Only consecutive Y's fail to be independent (Y_i and Y_{i+1} both depend on X_{i+1}), so

Var( 1 + \sum_{i=1}^{N-1} Y_i ) = \sum_{i=1}^{N-1} Var(Y_i) + 2 \sum_{i=1}^{N-2} Cov(Y_i,Y_{i+1})

Then you have to find formulas for Var(Y_i) and Cov(Y_i, Y_{i+1}). That might not be easy, but it at least focuses our attention on only three of the X's at a time.
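The suggestion above can be checked numerically. Under the stated model (X_i i.i.d. Bernoulli(p)), Y_i is Bernoulli with q = P(Y_i = 1) = 2p(1-p), so Var(Y_i) = q(1-q); and Y_i Y_{i+1} = 1 exactly for the patterns 010 and 101, whose total probability is p(1-p), so Cov(Y_i, Y_{i+1}) = p(1-p) - q^2. A sketch (names are mine) that assembles these pieces and compares the result with the closed form and with exact enumeration:

```python
from itertools import product

def exact_var_runs(N, p):
    """Var(R) for R = 1 + sum |X_i - X_{i+1}|, computed exactly by
    enumerating all 2^N sequences of iid Bernoulli(p) variables."""
    m1 = m2 = 0.0
    for xs in product((0, 1), repeat=N):
        prob = 1.0
        for x in xs:
            prob *= p if x == 1 else 1.0 - p
        r = 1 + sum(abs(a - b) for a, b in zip(xs, xs[1:]))
        m1 += prob * r
        m2 += prob * r * r
    return m2 - m1 * m1

def closed_form_var(N, p):
    """The claimed formula: (4N-6)p(1-p) - (12N-20)p^2(1-p)^2."""
    u = p * (1 - p)
    return (4 * N - 6) * u - (12 * N - 20) * u * u

N, p = 9, 0.4
q = 2 * p * (1 - p)                 # P(Y_i = 1): X_i and X_{i+1} differ
var_y = q * (1 - q)                 # Var(Y_i), a Bernoulli variance
cov_y = p * (1 - p) - q * q         # E[Y_i Y_{i+1}] = p(1-p): patterns 010, 101
assembled = (N - 1) * var_y + 2 * (N - 2) * cov_y

print(assembled, closed_form_var(N, p), exact_var_runs(N, p))  # all three agree
```

Expanding (N-1)q(1-q) + 2(N-2)(p(1-p) - q^2) with u = p(1-p) indeed gives (4N-6)u - (12N-20)u^2, so this decomposition proves the Wikipedia formula.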
 
