Multivariate hypothesis testing


Discussion Overview

The discussion focuses on the methods and challenges of performing hypothesis testing for multivariate data, particularly in the context of binomial distributions and joint distributions. Participants explore the implications of independence and the appropriate statistical techniques for testing hypotheses involving multiple variables.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant inquires about the p-value for a pair of binomial draws (7,8) under the null hypothesis q=1/2, noting that individual p-values do not provide sufficient evidence to reject the null hypothesis.
  • Another participant suggests that understanding the sampling distribution for n=2 and q=0.5 is essential for hypothesis testing, referencing a specific source for further clarification.
  • There is a discussion about the necessity of knowing the joint distribution for a joint test of two variables, with a mention of the relationship F(x,y)=F(x)F(y) for independent identically distributed (iid) variables.
  • A participant questions using the joint distribution alone, arguing that for independent random variables the joint cdf evaluated at the sample is of order (1/2)^n, too small to serve as a p-value, and suggests that the Kolmogorov-Smirnov distance might be useful for independent samples.
  • Another participant elaborates on the limitations of using the cumulative distribution function (cdf) alone for p-value calculations, proposing that the multivariate generalization of the Kolmogorov-Smirnov statistic could be applicable, though it would require complex calculations and possibly Monte-Carlo simulations.
  • There is a query regarding the procedure for testing a sample from a multivariate normal distribution when the correlation between variables is considered, particularly in large sample sizes.

Areas of Agreement / Disagreement

Participants express differing views on the appropriate methods for hypothesis testing in multivariate contexts, particularly regarding the use of joint distributions and the implications of independence. The discussion remains unresolved with multiple competing perspectives presented.

Contextual Notes

Participants highlight limitations related to the assumptions of independence, the complexity of calculating critical values for multivariate tests, and the challenges posed by non-independent samples. There is also an acknowledgment of the lack of closed forms for certain multivariate distributions.

bpet
How is hypothesis testing performed for multivariate data?

Say for simplicity we have two iid draws from a binomial distribution Bin(10,q) with X1=7, X2=8. Under the null hypothesis H0: q=1/2, the individual p-values (as one-tail probabilities) are approximately 0.172 and 0.055 respectively, so neither data point on its own is sufficient evidence to reject the null at the 95% confidence level. What would be the p-value for the pair (7,8)?
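For reference, the quoted individual p-values can be reproduced in a few lines. Combining the pair via the sum X1 + X2, which is Bin(20, 1/2) under H0 assuming independence, is one simple option, offered here as an illustration rather than the thread's answer:

```python
from math import comb

def binom_tail(n, q, k):
    """One-tail (upper) probability P(X >= k) for X ~ Bin(n, q)."""
    return sum(comb(n, i) * q**i * (1 - q)**(n - i) for i in range(k, n + 1))

# Individual one-tail p-values under H0: q = 1/2
p1 = binom_tail(10, 0.5, 7)  # X1 = 7  -> ~0.172
p2 = binom_tail(10, 0.5, 8)  # X2 = 8  -> ~0.055

# One simple combined test (assumes independence): under H0 the sum
# X1 + X2 ~ Bin(20, 1/2), so use P(X1 + X2 >= 15) as the p-value.
p_pair = binom_tail(20, 0.5, 15)
print(round(p1, 3), round(p2, 3), round(p_pair, 4))  # 0.172 0.055 0.0207
```

Under this particular choice of combined statistic the pair is significant at the 95% level even though neither draw is on its own.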
 
One way to interpret your question is, "what is the sampling distribution generated by n=2, q=0.5?" as in http://faculty.vassar.edu/lowry/binomial.html

OTOH for a joint test of two variables you need to know their joint distribution. In the iid case that's F(x,y)=F(x)F(y).
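As a quick numerical check of the factorization (a sketch using the Bin(10, 1/2) pair from the question; the evaluation point (7, 8) is just the observed pair, not a recommended cutoff):

```python
from math import comb

def binom_cdf(n, q, k):
    """P(X <= k) for X ~ Bin(n, q)."""
    return sum(comb(n, i) * q**i * (1 - q)**(n - i) for i in range(k + 1))

Fx = binom_cdf(10, 0.5, 7)  # marginal cdf at X1 = 7
Fy = binom_cdf(10, 0.5, 8)  # marginal cdf at X2 = 8

# Direct enumeration of the joint probability P(X1 <= 7, X2 <= 8)
# for iid draws; it matches the product of the marginals.
joint = sum(comb(10, i) * comb(10, j) / 2**20
            for i in range(8) for j in range(9))
print(abs(joint - Fx * Fy) < 1e-12)  # True
```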
 
EnumaElish said:
One way to interpret your question is, "what is the sampling distribution generated by n=2, q=0.5?" as in http://faculty.vassar.edu/lowry/binomial.html

Thanks though I don't quite understand how you mean to apply this to hypothesis testing.

EnumaElish said:
OTOH for a joint test of two variables you need to know their joint distribution. In the iid case that's F(x,y)=F(x)F(y).

The joint distribution on its own isn't really appropriate, because F(x1,...,xn) would be O(1/2^n). For independent rv's I guess the Kolmogorov-Smirnov distance would be useful, since for a sample of size 1 it resembles a two-tail test. For non-independent samples I'm still not sure what is suitable.
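The KS idea can be sketched for the pair (7, 8) against the Bin(10, 1/2) cdf. Because the distribution is discrete, the usual KS tables don't apply, so the p-value below is estimated by Monte-Carlo (an illustrative sketch, not a settled recommendation from the thread):

```python
import random
from math import comb

# Null distribution: Bin(10, 1/2)
pmf = [comb(10, k) / 1024 for k in range(11)]
cdf = [sum(pmf[:k + 1]) for k in range(11)]

def ks_stat(sample):
    """sup over the support of |empirical cdf - theoretical cdf|."""
    n = len(sample)
    return max(abs(sum(s <= x for s in sample) / n - cdf[x])
               for x in range(11))

d_obs = ks_stat([7, 8])  # = 0.828125, attained just below x = 7

# Monte-Carlo p-value: fraction of simulated iid pairs whose KS
# distance is at least as large as the observed one.
random.seed(1)
trials = 20000
count = sum(ks_stat(random.choices(range(11), weights=pmf, k=2)) >= d_obs
            for _ in range(trials))
print(d_obs, count / trials)  # p-value comes out near 0.06
```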
 
Do you care to explain your statement below?
bpet said:
The joint distribution on its own isn't really appropriate because F(x1,...,xn) would be O(1/2^n).
 
EnumaElish said:
Do you care to explain your statement below?

Say the variables are independent; as a rough approximation you could say the values are clustered about the median, so F(x1,...,xn) ~ (1/2)^n. So the cdf on its own isn't really sufficient to use as a p-value, but I guess the multivariate generalization of the KS statistic could be used - though calculating the critical values would be quite difficult and would probably require Monte-Carlo simulation.
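The rough approximation can be checked by simulation. For iid standard normals, Phi(Z) is Uniform(0,1), so E[log Phi(Z)] = -1 and the joint cdf at a typical sample behaves like e^{-n}; the per-coordinate constant differs from 1/2, but the qualitative point, exponential decay in n, is the same. A sketch:

```python
import math, random

def phi(x):
    """Standard normal cdf."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

rng = random.Random(0)
n, trials = 20, 10000

# Average of log F(x1,...,xn) / n over many iid N(0,1) samples,
# where F is the joint cdf prod_i Phi(xi) evaluated at the sample.
total = sum(sum(math.log(phi(rng.gauss(0, 1))) for _ in range(n))
            for _ in range(trials))
print(total / (trials * n))  # concentrates near -1, i.e. F ~ e^{-n}
```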

As an example, since the multivariate normal cdf has no closed form, what would be a procedure to test a sample from, say, Xi ~ N(0,1) with E[XiXj] = r for i ≠ j, 1 ≤ i, j ≤ N, when N is large?
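One workable route for this example, offered as an illustration rather than a general answer, is to sidestep the multivariate cdf and test a statistic whose null distribution is known exactly: under H0 the sum S = X1 + ... + XN has Var(S) = N + N(N-1)r, so S/sqrt(Var(S)) is standard normal. A sketch (the shared-factor construction assumes 0 ≤ r ≤ 1):

```python
import math, random

def equicorr_sample(N, r, rng):
    """Draw N equicorrelated N(0,1) variables, Corr(Xi, Xj) = r,
    via a shared common factor (requires 0 <= r <= 1)."""
    z0 = rng.gauss(0, 1)
    return [math.sqrt(r) * z0 + math.sqrt(1 - r) * rng.gauss(0, 1)
            for _ in range(N)]

def sum_test_pvalue(x, r):
    """Exact two-tail p-value for H0: Xi ~ N(0,1) with Corr = r,
    based on the sum: Var(S) = N + N*(N-1)*r, S/sqrt(Var(S)) ~ N(0,1)."""
    N = len(x)
    z = sum(x) / math.sqrt(N + N * (N - 1) * r)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Example: a large correlated sample generated under H0
rng = random.Random(42)
x = equicorr_sample(1000, 0.3, rng)
print(round(sum_test_pvalue(x, 0.3), 3))  # uniform on (0,1) under H0
```

The sum is only one choice of statistic; it has power against mean shifts but not, say, variance changes. The point is that picking a statistic with a tractable null distribution avoids both the multivariate cdf and Monte-Carlo critical values.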
 
