Multivariate hypothesis testing

  • Thread starter bpet
  • #1
525
5

Main Question or Discussion Point

How is hypothesis testing performed for multivariate data?

Say for simplicity we have two iid draws from a binomial distribution Bin(10,q), with X1=7 and X2=8. Under the null hypothesis H0: q=1/2, the individual p-values (as one-tail probabilities P(X>=x)) are approximately 0.172 and 0.055 respectively, so neither data point on its own is sufficient evidence to reject the null at the 5% significance level. What would be the p-value for the pair (7,8)?
 

Answers and Replies

  • #2
EnumaElish
Science Advisor
Homework Helper
2,304
124
One way to interpret your question is, "what is the sampling distribution generated by n=2, q=0.5?" as in http://faculty.vassar.edu/lowry/binomial.html

OTOH for a joint test of two variables you need to know their joint distribution. In the iid case that's F(x,y)=F(x)F(y).
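
To make the iid joint distribution concrete for the example in post #1, here is a minimal sketch. It assumes the test statistic for the pair is the sum X1 + X2; the thread does not fix a statistic, and the sum is only one natural choice (it is sufficient for q, and under H0 it is Bin(20, 1/2)). The helper names binom_pmf and upper_tail are made up for illustration.

```python
from math import comb

def binom_pmf(k, n, q):
    """P(X = k) for X ~ Bin(n, q)."""
    return comb(n, k) * q**k * (1 - q)**(n - k)

def upper_tail(k, n, q):
    """One-sided p-value P(X >= k) for X ~ Bin(n, q)."""
    return sum(binom_pmf(j, n, q) for j in range(k, n + 1))

n, q0 = 10, 0.5          # null hypothesis H0: q = 1/2
x1, x2 = 7, 8            # observed pair

# Individual one-tail p-values, as quoted in the question.
p1 = upper_tail(x1, n, q0)
p2 = upper_tail(x2, n, q0)

# Joint p-value under the ASSUMED statistic S = X1 + X2:
# sum the product pmf f(a) * f(b) over all pairs at least as extreme.
s_obs = x1 + x2
p_joint = sum(
    binom_pmf(a, n, q0) * binom_pmf(b, n, q0)
    for a in range(n + 1)
    for b in range(n + 1)
    if a + b >= s_obs
)

# Cross-check: under H0 the sum of two iid Bin(10, q) draws is Bin(20, q).
p_sum = upper_tail(s_obs, 2 * n, q0)

print(f"p1 = {p1:.4f}, p2 = {p2:.4f}, joint = {p_joint:.4f}")
```

Under this assumption the pair gives a one-sided p-value of 21700/2^20, about 0.021, which is smaller than either individual p-value. A different choice of "at least as extreme" region would give a different number.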
 
  • #3
525
5
EnumaElish said: "One way to interpret your question is, "what is the sampling distribution generated by n=2, q=0.5?" as in http://faculty.vassar.edu/lowry/binomial.html"
Thanks, though I don't quite understand how this applies to hypothesis testing.

EnumaElish said: "OTOH for a joint test of two variables you need to know their joint distribution. In the iid case that's F(x,y)=F(x)F(y)."
The joint distribution on its own isn't really appropriate as a p-value, because F(x1,...,xn) would be O(1/2^n) for typical samples. For independent random variables I guess the Kolmogorov-Smirnov distance would be useful, since for a sample of size 1 it resembles a two-tail test. For non-independent samples I'm still not sure what is suitable.
 
  • #4
EnumaElish
Science Advisor
Homework Helper
2,304
124
Do you care to explain your statement below?
bpet said: "The joint distribution on its own isn't really appropriate because F(x1,...,xn) would be O(1/2^n)."
 
  • #5
525
5
EnumaElish said: "Do you care to explain your statement below?"
Say the variables are independent. As a rough approximation, the values are clustered about the median, so each factor F(xi) is about 1/2 and F(x1,...,xn) ~ (1/2)^n. So the cdf on its own isn't really sufficient to use as a p-value. I guess the multivariate generalization of the KS statistic could be used, though calculating the critical values would be quite difficult and would probably require Monte Carlo simulation.
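
One way around this, sketched under assumptions: treat the joint tail probability as a test statistic rather than as a p-value, and calibrate it against its own null distribution. For the small binomial example from post #1 the null distribution can be enumerated exactly, so no Monte Carlo is needed; for larger problems the enumeration loop would be replaced by simulation. Using the product of upper-tail probabilities as the statistic is an illustrative choice, not something fixed by the thread, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

n, q0 = 10, Fraction(1, 2)   # H0: q = 1/2, exact rational arithmetic

def pmf(k):
    """P(X = k) for X ~ Bin(n, q0)."""
    return comb(n, k) * q0**k * (1 - q0)**(n - k)

def upper_tail(k):
    """P(X >= k) for X ~ Bin(n, q0)."""
    return sum(pmf(j) for j in range(k, n + 1))

def statistic(a, b):
    """ASSUMED statistic: joint upper-tail probability P(X1 >= a, X2 >= b)."""
    return upper_tail(a) * upper_tail(b)

# Raw joint tail probability of the observed pair (7, 8).
t_obs = statistic(7, 8)

# Calibrated p-value: null probability of a statistic at least as small
# as the observed one, found by enumerating all 11 x 11 outcomes.
p_value = sum(
    pmf(a) * pmf(b)
    for a in range(n + 1)
    for b in range(n + 1)
    if statistic(a, b) <= t_obs
)

print(f"raw joint tail  = {float(t_obs):.4f}")
print(f"calibrated p    = {float(p_value):.4f}")
```

The raw joint tail (about 0.0094 here) shrinks as more variables are multiplied in, whereas the calibrated value (about 0.030 here) is a genuine p-value because it is measured against the statistic's own distribution under H0.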

As an example, since the multivariate normal cdf has no closed form, what would be a procedure to test whether a sample is consistent with the distribution Xi ~ N(0,1) with E[XiXj]=r for i≠j, 1<=i,j<=N, when N is large?
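
For this equicorrelated case, one standard option that avoids the cdf altogether is a quadratic-form (Mahalanobis) test: if X ~ N(0, Σ), then X'Σ^{-1}X is chi-square with N degrees of freedom. Below is a minimal sketch, assuming Σ = (1-r)I + r·11' with -1/(N-1) < r < 1 and using the Sherman-Morrison closed form for the inverse so that no matrix library is needed. To stay within the standard library, the chi-square tail is evaluated with a finite-sum formula that holds only for even N; the function names are illustrative.

```python
from math import exp

def mahalanobis_equicorr(x, r):
    """Quadratic form x' Sigma^{-1} x for Sigma = (1-r) I + r 11'.

    Uses Sigma^{-1} = (I - r 11' / (1 + (N-1) r)) / (1 - r),
    so the cost is O(N) rather than O(N^3).
    """
    n = len(x)
    if n < 2 or not (-1.0 / (n - 1) < r < 1.0):
        raise ValueError("need N >= 2 and -1/(N-1) < r < 1")
    s1 = sum(x)
    s2 = sum(v * v for v in x)
    return (s2 - r * s1 * s1 / (1.0 + (n - 1) * r)) / (1.0 - r)

def chi2_sf_even(q, dof):
    """P(chi2_dof > q), valid for EVEN dof only (finite Poisson-type sum).

    For odd or very large dof, a library routine such as
    scipy.stats.chi2.sf would be the safer choice.
    """
    if dof <= 0 or dof % 2:
        raise ValueError("this closed form assumes an even, positive dof")
    half = q / 2.0
    term, total = 1.0, 1.0
    for k in range(1, dof // 2):
        term *= half / k          # (q/2)^k / k!
        total += term
    return exp(-half) * total

def equicorr_test(x, r):
    """p-value for H0: x is one draw from N(0, Sigma) as defined above."""
    return chi2_sf_even(mahalanobis_equicorr(x, r), len(x))

# Usage with made-up numbers (not data from the thread):
print(equicorr_test([0.3, -1.2, 0.8, 0.1], r=0.4))
```

A small p-value says the observed vector is far from the origin in the metric of Σ; it does not by itself say which component or which direction is responsible.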
 
