Variance of statistic used in runs test

In summary: the conversation discusses the statistic R used in the Wald-Wolfowitz runs test, its variance under the null hypothesis of independence, and attempts to prove the formula for that variance. The Italian-language Wikipedia treats R as equivalent to 1+\sum_{i=1}^{N-1}|X_i-X_{i+1}|, where the X_i are Bernoulli random variables with expectation p. The conversation also suggests setting Y_i = |X_i - X_{i+1}|; knowing the asymptotic mean and variance of R is then a first step toward checking whether R is approximately normal.
  • #1
DavideGenoa
Hi, friends! Since this is my first post, I would like to introduce myself: I am an Italian trying to teach himself mathematics and the natural sciences after a strictly humanities-centered school education, and I am very tempted to enrol in a university science course.
I read in the Italian-language Wikipedia that the variance [itex]\text{Var}_{H_0}(R)[/itex] of the statistic [itex]R[/itex] used in the Wald-Wolfowitz test, under the null hypothesis that [itex]X_1,...,X_N[/itex] are independent, is[tex](4N-6)p(1-p)-(12N-20)p^2(1-p)^2.[/tex] It is worth noting that discussions of the test usually give what I think are the approximations used for the Gaussian approximation of [itex]R[/itex], rather than its exact expectation and variance...
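Before looking for a proof, the formula can be checked exactly for small [itex]N[/itex]. The sketch below assumes, as the Italian Wikipedia does, that [itex]X_1,...,X_N[/itex] are i.i.d. Bernoulli([itex]p[/itex]) and that [itex]R[/itex] is the number of runs; it enumerates all [itex]2^N[/itex] sequences with exact rational arithmetic. The helper names are illustrative only.

```python
from fractions import Fraction
from itertools import product


def count_runs(seq):
    """Number of runs: maximal blocks of equal consecutive symbols."""
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)


def exact_mean_var(N, p):
    """Exact E[R] and Var(R) by enumerating all 2**N binary sequences,
    assuming X_1, ..., X_N are i.i.d. Bernoulli(p)."""
    mean = Fraction(0)
    second = Fraction(0)
    for seq in product((0, 1), repeat=N):
        ones = sum(seq)
        prob = p**ones * (1 - p) ** (N - ones)  # probability of this sequence
        r = count_runs(seq)
        mean += prob * r
        second += prob * r * r
    return mean, second - mean**2


def wiki_var(N, p):
    """Closed form (4N-6)p(1-p) - (12N-20)p^2(1-p)^2, valid for N >= 2."""
    a = p * (1 - p)
    return (4 * N - 6) * a - (12 * N - 20) * a**2


p = Fraction(1, 3)
for N in range(2, 9):
    _, var = exact_mean_var(N, p)
    assert var == wiki_var(N, p), N
print("closed form matches exact enumeration for N = 2..8")
```

Exact fractions are used instead of floats so that agreement is an identity check rather than a tolerance check.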
The statistic [itex]R[/itex], as my book (S.M. Ross, Introduction to Probability and Statistics for Engineers and Scientists) explains, and as the German-language Wikipedia also shows, has the probability mass function[tex]P_{H_0}(R=2k)=2\frac{\binom{N^+ -1}{k-1}\binom{N^- -1}{k-1}}{\binom{N^+ +N^-}{N^+}}[/tex][tex]P_{H_0}(R=2k+1)=\frac{\binom{N^+ -1}{k-1}\binom{N^- -1}{k}+\binom{N^+ -1}{k}\binom{N^- -1}{k-1}}{\binom{N^+ +N^-}{N^+}}[/tex]where [itex]N^+[/itex] and [itex]N^-[/itex] are the numbers of observations of each of the two types.
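These two formulas can be checked against brute-force enumeration of every arrangement of [itex]N^+[/itex] symbols '+' and [itex]N^-[/itex] symbols '-'. This sketch assumes both counts are at least 1; `binom` is a hypothetical helper that extends `math.comb` with the convention [itex]\binom{n}{k}=0[/itex] for [itex]k<0[/itex].

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def binom(n, k):
    # math.comb raises ValueError for k < 0; use the convention C(n, k) = 0
    return comb(n, k) if k >= 0 else 0


def runs_pmf(n_plus, n_minus, r):
    """P(R = r) from the two formulas; assumes n_plus, n_minus >= 1."""
    total = comb(n_plus + n_minus, n_plus)
    k, odd = divmod(r, 2)
    if odd:  # r = 2k + 1
        ways = (binom(n_plus - 1, k - 1) * binom(n_minus - 1, k)
                + binom(n_plus - 1, k) * binom(n_minus - 1, k - 1))
    else:  # r = 2k
        ways = 2 * binom(n_plus - 1, k - 1) * binom(n_minus - 1, k - 1)
    return Fraction(ways, total)


def brute_pmf(n_plus, n_minus):
    """Exact distribution of R by listing every arrangement."""
    n = n_plus + n_minus
    total = comb(n, n_plus)
    counts = {}
    for plus_positions in combinations(range(n), n_plus):
        plus = set(plus_positions)
        r = 1 + sum((i in plus) != (i + 1 in plus) for i in range(n - 1))
        counts[r] = counts.get(r, 0) + 1
    return {r: Fraction(c, total) for r, c in counts.items()}


for a in range(1, 6):
    for b in range(1, 6):
        exact = brute_pmf(a, b)
        for r in range(1, a + b + 1):
            assert runs_pmf(a, b, r) == exact.get(r, 0), (a, b, r)
print("PMF formulas agree with enumeration for 1 <= N+, N- <= 5")
```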
The Italian-language Wikipedia considers the statistic [itex]R[/itex] to be, under the null hypothesis of independence, the same as [itex]1+\sum_{i=1}^{N-1}|X_i-X_{i+1}|[/itex], where the [itex]X_i[/itex] are Bernoulli random variables with expectation [itex]p[/itex], and the expectation [itex]E_{H_0}[R][/itex] it gives is the same one discussed by a user here, who gives a short proof that [itex]E_{H_0}[R]=1+2(N-1)p(1-p)[/itex].
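For reference, that expectation follows from linearity, assuming the [itex]X_i[/itex] are i.i.d. Bernoulli([itex]p[/itex]): each term [itex]|X_i-X_{i+1}|[/itex] equals 1 exactly when two neighbouring observations differ, and 0 otherwise.

[tex]E_{H_0}[R]=1+\sum_{i=1}^{N-1}P(X_i\ne X_{i+1})=1+\sum_{i=1}^{N-1}\left[p(1-p)+(1-p)p\right]=1+2(N-1)p(1-p)[/tex]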

As for the variance, I have tried hard, but my attempts to prove the formula myself have failed. I have tried to calculate the second moment by manipulating the sums [itex]\sum_{k=1}^{\min\{N^+ ,N^-\}}(4k^2 P_{H_0}(R=2k)+(2k+1)^2P_{H_0}(R=2k+1))[/itex] if [itex]N^+ \ne N^-[/itex], and by treating the case [itex]N^+ =N^-[/itex] similarly, where I would say that the second moment is [itex]E_{H_0}[R^2]=\sum_{k=1}^{N^+ -1}(4k^2 P_{H_0}(R=2k)+(2k+1)^2 P_{H_0}(R=2k+1))+\frac{8(N^+)^2}{\binom{2N^+}{N^+}}[/itex], but I haven't been able to simplify those sums with their factorials.
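Before simplifying the factorials, it can help to evaluate these sums exactly and compare them with brute force, so that any candidate closed form can be tested numerically first. The sketch below uses the PMF quoted above (conditional on [itex]N^+[/itex] and [itex]N^-[/itex]) and sums [itex]r P(R=r)[/itex] and [itex]r^2 P(R=r)[/itex] for [itex]k[/itex] up to [itex]\min\{N^+,N^-\}[/itex]; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def binom(n, k):
    # math.comb raises ValueError for k < 0; use the convention C(n, k) = 0
    return comb(n, k) if k >= 0 else 0


def runs_pmf(n_plus, n_minus, r):
    """P(R = r) given n_plus '+' and n_minus '-' symbols (both >= 1)."""
    total = comb(n_plus + n_minus, n_plus)
    k, odd = divmod(r, 2)
    if odd:
        ways = (binom(n_plus - 1, k - 1) * binom(n_minus - 1, k)
                + binom(n_plus - 1, k) * binom(n_minus - 1, k - 1))
    else:
        ways = 2 * binom(n_plus - 1, k - 1) * binom(n_minus - 1, k - 1)
    return Fraction(ways, total)


def moments_from_pmf(n_plus, n_minus):
    """(E[R], E[R^2]) from the sum over k of the 2k and 2k+1 terms."""
    m1 = m2 = Fraction(0)
    for k in range(1, min(n_plus, n_minus) + 1):
        for r in (2 * k, 2 * k + 1):
            pr = runs_pmf(n_plus, n_minus, r)
            m1 += r * pr
            m2 += r * r * pr
    return m1, m2


def brute_moments(n_plus, n_minus):
    """(E[R], E[R^2]) by listing every arrangement."""
    n = n_plus + n_minus
    total = comb(n, n_plus)
    m1 = m2 = Fraction(0)
    for plus_positions in combinations(range(n), n_plus):
        plus = set(plus_positions)
        r = 1 + sum((i in plus) != (i + 1 in plus) for i in range(n - 1))
        m1 += Fraction(r, total)
        m2 += Fraction(r * r, total)
    return m1, m2


for a in range(1, 7):
    for b in range(1, 7):
        assert moments_from_pmf(a, b) == brute_moments(a, b), (a, b)

m1, m2 = moments_from_pmf(5, 7)
print("Var(R) for N+ = 5, N- = 7:", m2 - m1**2)
```

Because the [itex](2k+1)[/itex] term vanishes at [itex]k=N^+[/itex] when [itex]N^+=N^-[/itex] (it contains [itex]\binom{N^+-1}{N^+}=0[/itex]), the same loop covers both cases discussed in the post.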
Does anybody know a proof of the formula for the variance [itex]\text{Var}_{H_0}(R)[/itex], or can anyone link to one?
I [itex]\infty[/itex]-ly thank you all!
 
  • #2
DavideGenoa said:
The Italian language Wikipedia considers the statistic [itex]R[/itex] to be the same, under the null hypothesis of independence, as [itex]1+\sum_{i=1}^{N-1}|X_i-X_{i+1}|[/itex] where the [itex]X_i[/itex] are Bernoulli random variables with expectation [itex]p[/itex]


One thought is to let [itex] Y_i = |X_i - X_{i+1}| [/itex].

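Only consecutive [itex]Y_i[/itex] are dependent (two of them share an [itex]X[/itex] only when their indices differ by 1), and the constant 1 does not affect the variance, so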

[tex] Var( 1 + \sum_{i=1}^{N-1} Y_i ) = \sum_{i=1}^{N-1} Var(Y_i) + 2 \sum_{i=1}^{N-2} Cov(Y_i,Y_{i+1}) [/tex]

Then you have to find formulas for [itex] Var(Y_i) [/itex] and [itex] Cov(Y_i,Y_{i+1}) [/itex]. That might not be easy, but at least it focuses our attention on only three of the [itex] X's [/itex] at a time.
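Carrying this suggestion through, assuming the [itex]X_i[/itex] are i.i.d. Bernoulli([itex]p[/itex]): each [itex]Y_i[/itex] is itself Bernoulli with parameter [itex]2p(1-p)[/itex], and [itex]Y_iY_{i+1}=1[/itex] exactly when [itex](X_i,X_{i+1},X_{i+2})[/itex] is [itex](0,1,0)[/itex] or [itex](1,0,1)[/itex]. This gives

[tex]Var(Y_i)=2p(1-p)\left[1-2p(1-p)\right]=2p(1-p)-4p^2(1-p)^2[/tex]
[tex]Cov(Y_i,Y_{i+1})=P(Y_i=Y_{i+1}=1)-\left[2p(1-p)\right]^2=p(1-p)^2+p^2(1-p)-4p^2(1-p)^2=p(1-p)-4p^2(1-p)^2[/tex]
[tex]Var(R)=(N-1)\left[2p(1-p)-4p^2(1-p)^2\right]+2(N-2)\left[p(1-p)-4p^2(1-p)^2\right]=(4N-6)p(1-p)-(12N-20)p^2(1-p)^2[/tex]

which agrees with the formula from the Italian Wikipedia quoted in the first post.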
 

What is the variance of the statistic used in a runs test?

The variance of the statistic used in a runs test (the number of runs R) measures how far the values of R typically fall from its mean: it is the expected squared deviation E[(R - E[R])^2], a measure of the spread or dispersion of R around its mean.

Why is the variance of the statistic used in a runs test important?

Together with the mean, the variance is needed to standardize R: under the Gaussian approximation, (R - E[R])/sqrt(Var(R)) is compared with a standard normal distribution to decide whether to reject the null hypothesis of independence. A low variance indicates that the values of R are closely clustered around the mean, while a high variance indicates that they are more spread out.

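How is the variance of the statistic used in a runs test calculated?

For a list of observed values, the variance is the sum of the squared differences between each value and the mean, divided by the number of values (or by the number of values minus one for the usual unbiased estimate); the result is not squared again, and its square root is the standard deviation. For the runs test, however, the variance of R under the null hypothesis comes from the probability model: if X_1, ..., X_N are independent Bernoulli variables with expectation p, then Var(R) = (4N-6)p(1-p) - (12N-20)p^2(1-p)^2.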
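As a sketch of how these quantities are used, assuming (as in the thread) that X_1, ..., X_N are i.i.d. Bernoulli(p) with p known and strictly between 0 and 1, the closed-form mean and variance can standardize an observed run count. `runs_mean_var` and `runs_z_score` are hypothetical helper names.

```python
import math


def count_runs(xs):
    """Number of runs: maximal blocks of equal consecutive values."""
    return 1 + sum(1 for a, b in zip(xs, xs[1:]) if a != b)


def runs_mean_var(N, p):
    """E[R] and Var(R) under H0 for i.i.d. Bernoulli(p) data, N >= 2, 0 < p < 1."""
    a = p * (1 - p)
    mean = 1 + 2 * (N - 1) * a
    var = (4 * N - 6) * a - (12 * N - 20) * a**2
    return mean, var


def runs_z_score(xs, p):
    """Standardized run count (R - E[R]) / sqrt(Var(R)), with p assumed known."""
    mean, var = runs_mean_var(len(xs), p)
    return (count_runs(xs) - mean) / math.sqrt(var)


xs = [1, 1, 0, 1, 0, 0, 0, 1, 1, 0]
print(count_runs(xs), runs_mean_var(len(xs), 0.5), runs_z_score(xs, 0.5))
```

For this sequence R = 6, E[R] = 5.5 and Var(R) = 2.25, so z = 1/3. Under the Gaussian approximation, |z| above about 1.96 would lead to rejection at the 5% level; in practice p is often estimated from the data, which this sketch does not account for.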

What does a high variance of the statistic used in a runs test indicate?

A high variance indicates that the values of the statistic are widely spread around the mean. For the number of runs, the variance under the null hypothesis is fixed by N and p, so a higher variance means that a larger deviation of the observed R from E[R] is needed before it counts as evidence against independence.

How can the variance of the statistic used in a runs test be reduced?

Under the null hypothesis the variance of R is determined by N and p alone, and it grows roughly linearly in N, so a larger sample increases Var(R) rather than reducing it. What does decrease with sample size is the variance of the proportion of runs R/N, which is of order 1/N. Removing outliers is not a meaningful remedy here, since each observation takes one of only two values.
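A quick numerical look at the closed form from the thread shows how the variance scales: Var(R) itself grows with N, while the variance of the run proportion R/N shrinks. This sketch assumes i.i.d. Bernoulli(p) observations; `runs_var` is a hypothetical helper name.

```python
def runs_var(N, p):
    """Var(R) under H0 for i.i.d. Bernoulli(p) data with N >= 2 observations."""
    a = p * (1 - p)
    return (4 * N - 6) * a - (12 * N - 20) * a**2


# Var(R) grows with N, while the variance of the run proportion R/N shrinks.
for N in (10, 100, 1000, 10000):
    v = runs_var(N, 0.5)
    print(f"N={N:>6}  Var(R)={v:10.2f}  Var(R/N)={v / N**2:.6f}")
```

For p = 1/2 the closed form reduces to (N-1)/4, so Var(R) grows linearly while Var(R/N) falls roughly like 1/(4N).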
