# Multivariate hypothesis testing

How is hypothesis testing performed for multivariate data?

Say for simplicity we have two iid draws from a binomial distribution Bin(10,q) with X1=7, X2=8. Under the null hypothesis H0:q=1/2, the individual p-values (as one-tail probabilities) are approximately 0.172 and 0.055 respectively so neither data point is sufficient evidence on its own to reject the null at the 95% confidence level. What would be the p-value for the pair (7,8) ?

EnumaElish
Homework Helper
One way to interpret your question is, "what is the sampling distribution generated by n=2, q=0.5?" as in http://faculty.vassar.edu/lowry/binomial.html

OTOH for a joint test of two variables you need to know their joint distribution. In the iid case that's F(x,y)=F(x)F(y).

Last edited:
One way to interpret your question is, "what is the sampling distribution generated by n=2, q=0.5?" as in http://faculty.vassar.edu/lowry/binomial.html

Thanks though I don't quite understand how you mean to apply this to hypothesis testing.

OTOH for a joint test of two variables you need to know their joint distribution. In the iid case that's F(x,y)=F(x)F(y).

The joint distribution on its own isn't really appropriate because F(x1,...,xn) would be O(1/2^n). For independent rv's I guess the Kolmogorov-Smirnov distance would be useful as for a sample of size 1 it resembles a two-tail test. For non-independent samples I'm still not sure what is suitable.

EnumaElish
Homework Helper
Do you care to explain your statement below?
The joint distribution on its own isn't really appropriate because F(x1,...,xn) would be O(1/2^n).

Do you care to explain your statement below?

Say the variables are independent, as a rough approximation you could say the values are clustered about the median so F(x1,...,xn) ~ (1/2)^n. So the cdf on its own isn't really sufficient to use as a p-value, but I guess the multivariate generalization of the KS statistic could be used - though to calculate the critical values would be quite difficult and probably require Monte-Carlo simulation.

As an example, since the multivariate normal cdf has no closed form, what would be a procedure to test a sample, say the distribution Xi ~ N(0,1) with E[XiXj]=r for i<>j, 1<=i,j<=N when N is large?