Multivariate hypothesis testing


Discussion Overview

The discussion focuses on the methods and challenges of performing hypothesis testing for multivariate data, particularly in the context of binomial distributions and joint distributions. Participants explore the implications of independence and the appropriate statistical techniques for testing hypotheses involving multiple variables.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant inquires about the p-value for a pair of binomial draws (7,8) under the null hypothesis q=1/2, noting that individual p-values do not provide sufficient evidence to reject the null hypothesis.
  • Another participant suggests that understanding the sampling distribution for n=2 and q=0.5 is essential for hypothesis testing, referencing a specific source for further clarification.
  • There is a discussion about the necessity of knowing the joint distribution for a joint test of two variables, with a mention of the relationship F(x,y)=F(x)F(y) for independent identically distributed (iid) variables.
  • A participant questions using the joint distribution alone, arguing that for independent random variables the joint cdf evaluated at the sample is of order (1/2)^n, too small to serve as a p-value, and suggests that the Kolmogorov-Smirnov distance might be useful for independent samples.
  • Another participant elaborates on the limitations of using the cumulative distribution function (cdf) alone for p-value calculations, proposing that the multivariate generalization of the Kolmogorov-Smirnov statistic could be applicable, though it would require complex calculations and possibly Monte-Carlo simulations.
  • There is a query regarding the procedure for testing a sample from a multivariate normal distribution when the correlation between variables is considered, particularly in large sample sizes.

Areas of Agreement / Disagreement

Participants express differing views on the appropriate methods for hypothesis testing in multivariate contexts, particularly regarding the use of joint distributions and the implications of independence. The discussion remains unresolved with multiple competing perspectives presented.

Contextual Notes

Participants highlight limitations related to the assumptions of independence, the complexity of calculating critical values for multivariate tests, and the challenges posed by non-independent samples. There is also an acknowledgment of the lack of closed forms for certain multivariate distributions.

bpet
How is hypothesis testing performed for multivariate data?

Say for simplicity we have two iid draws from a binomial distribution Bin(10,q) with X1=7, X2=8. Under the null hypothesis H0: q=1/2, the individual p-values (as one-tail probabilities) are approximately 0.172 and 0.055 respectively, so neither data point on its own is sufficient evidence to reject the null at the 95% confidence level. What would be the p-value for the pair (7,8)?
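For reference, the quoted individual p-values can be reproduced in a few lines. Combining the pair via the sum X1 + X2, which is Bin(20, 1/2) under H0 assuming independence, is one simple option, offered here as an illustration rather than the thread's answer:

```python
from math import comb

def binom_tail(n, q, k):
    """One-tail (upper) probability P(X >= k) for X ~ Bin(n, q)."""
    return sum(comb(n, i) * q**i * (1 - q)**(n - i) for i in range(k, n + 1))

# Individual one-tail p-values under H0: q = 1/2
p1 = binom_tail(10, 0.5, 7)  # X1 = 7  -> ~0.172
p2 = binom_tail(10, 0.5, 8)  # X2 = 8  -> ~0.055

# One simple combined test (assumes independence): under H0 the sum
# X1 + X2 ~ Bin(20, 1/2), so use P(X1 + X2 >= 15) as the p-value.
p_pair = binom_tail(20, 0.5, 15)
print(round(p1, 3), round(p2, 3), round(p_pair, 4))  # 0.172 0.055 0.0207
```

Under this particular choice of combined statistic the pair is significant at the 95% level even though neither draw is on its own.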
 
One way to interpret your question is, "what is the sampling distribution generated by n=2, q=0.5?" as in http://faculty.vassar.edu/lowry/binomial.html

OTOH for a joint test of two variables you need to know their joint distribution. In the iid case that's F(x,y)=F(x)F(y).
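As a quick numerical check of the factorization (a sketch using the Bin(10, 1/2) pair from the question; the evaluation point (7, 8) is just the observed pair, not a recommended cutoff):

```python
from math import comb

def binom_cdf(n, q, k):
    """P(X <= k) for X ~ Bin(n, q)."""
    return sum(comb(n, i) * q**i * (1 - q)**(n - i) for i in range(k + 1))

Fx = binom_cdf(10, 0.5, 7)  # marginal cdf at X1 = 7
Fy = binom_cdf(10, 0.5, 8)  # marginal cdf at X2 = 8

# Direct enumeration of the joint probability P(X1 <= 7, X2 <= 8)
# for iid draws; it matches the product of the marginals.
joint = sum(comb(10, i) * comb(10, j) / 2**20
            for i in range(8) for j in range(9))
print(abs(joint - Fx * Fy) < 1e-12)  # True
```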
 
EnumaElish said:
One way to interpret your question is, "what is the sampling distribution generated by n=2, q=0.5?" as in http://faculty.vassar.edu/lowry/binomial.html

Thanks though I don't quite understand how you mean to apply this to hypothesis testing.

EnumaElish said:
OTOH for a joint test of two variables you need to know their joint distribution. In the iid case that's F(x,y)=F(x)F(y).

The joint distribution on its own isn't really appropriate, because F(x1,...,xn) would be O(1/2^n). For independent rv's I guess the Kolmogorov-Smirnov distance would be useful, since for a sample of size 1 it resembles a two-tail test. For non-independent samples I'm still not sure what is suitable.
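The KS idea can be sketched for the pair (7, 8) against the Bin(10, 1/2) cdf. Because the distribution is discrete, the usual KS tables don't apply, so the p-value below is estimated by Monte-Carlo (an illustrative sketch, not a settled recommendation from the thread):

```python
import random
from math import comb

# Null distribution: Bin(10, 1/2)
pmf = [comb(10, k) / 1024 for k in range(11)]
cdf = [sum(pmf[:k + 1]) for k in range(11)]

def ks_stat(sample):
    """sup over the support of |empirical cdf - theoretical cdf|."""
    n = len(sample)
    return max(abs(sum(s <= x for s in sample) / n - cdf[x])
               for x in range(11))

d_obs = ks_stat([7, 8])  # = 0.828125, attained just below x = 7

# Monte-Carlo p-value: fraction of simulated iid pairs whose KS
# distance is at least as large as the observed one.
random.seed(1)
trials = 20000
count = sum(ks_stat(random.choices(range(11), weights=pmf, k=2)) >= d_obs
            for _ in range(trials))
print(d_obs, count / trials)  # p-value comes out near 0.06
```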
 
Do you care to explain your statement below?
bpet said:
The joint distribution on its own isn't really appropriate because F(x1,...,xn) would be O(1/2^n).
 
EnumaElish said:
Do you care to explain your statement below?

Say the variables are independent; as a rough approximation you could say the values are clustered about the median, so F(x1,...,xn) ~ (1/2)^n. So the cdf on its own isn't really sufficient to use as a p-value, but I guess the multivariate generalization of the KS statistic could be used - though calculating the critical values would be quite difficult and would probably require Monte-Carlo simulation.
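The rough approximation can be checked by simulation. For iid standard normals, Phi(Z) is Uniform(0,1), so E[log Phi(Z)] = -1 and the joint cdf at a typical sample behaves like e^{-n}; the per-coordinate constant differs from 1/2, but the qualitative point, exponential decay in n, is the same. A sketch:

```python
import math, random

def phi(x):
    """Standard normal cdf."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

rng = random.Random(0)
n, trials = 20, 10000

# Average of log F(x1,...,xn) / n over many iid N(0,1) samples,
# where F is the joint cdf prod_i Phi(xi) evaluated at the sample.
total = sum(sum(math.log(phi(rng.gauss(0, 1))) for _ in range(n))
            for _ in range(trials))
print(total / (trials * n))  # concentrates near -1, i.e. F ~ e^{-n}
```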

As an example, since the multivariate normal cdf has no closed form, what would be a procedure to test a sample from, say, Xi ~ N(0,1) with E[XiXj] = r for i ≠ j, 1 ≤ i, j ≤ N, when N is large?
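One workable route for this example, offered as an illustration rather than a general answer, is to sidestep the multivariate cdf and test a statistic whose null distribution is known exactly: under H0 the sum S = X1 + ... + XN has Var(S) = N + N(N-1)r, so S/sqrt(Var(S)) is standard normal. A sketch (the shared-factor construction assumes 0 ≤ r ≤ 1):

```python
import math, random

def equicorr_sample(N, r, rng):
    """Draw N equicorrelated N(0,1) variables, Corr(Xi, Xj) = r,
    via a shared common factor (requires 0 <= r <= 1)."""
    z0 = rng.gauss(0, 1)
    return [math.sqrt(r) * z0 + math.sqrt(1 - r) * rng.gauss(0, 1)
            for _ in range(N)]

def sum_test_pvalue(x, r):
    """Exact two-tail p-value for H0: Xi ~ N(0,1) with Corr = r,
    based on the sum: Var(S) = N + N*(N-1)*r, S/sqrt(Var(S)) ~ N(0,1)."""
    N = len(x)
    z = sum(x) / math.sqrt(N + N * (N - 1) * r)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Example: a large correlated sample generated under H0
rng = random.Random(42)
x = equicorr_sample(1000, 0.3, rng)
print(round(sum_test_pvalue(x, 0.3), 3))  # uniform on (0,1) under H0
```

The sum is only one choice of statistic; it has power against mean shifts but not, say, variance changes. The point is that picking a statistic with a tractable null distribution avoids both the multivariate cdf and Monte-Carlo critical values.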
 
