# Homework Help: Statistics - Joint PMF / Hypothesis Testing

1. Apr 13, 2013

### brojesus111

1. The problem statement, all variables and given/known data

We sample a population 50 times with replacement, with every individual equally likely to be sampled. We survey gender and quality of life. The counts are: male: 13 high quality, 11 low | female: 18 high, 8 low.

Let P_m, P_f, P_h, P_l denote the population proportions for each of the categories, and let the population proportions for the combinations be denoted by P_mh, P_ml, P_fh, P_fl. The parameters are denoted by θ=(P_mh, P_ml, P_fh, P_fl).

Null hypothesis: P_mh = P_m * P_h, P_ml = P_m * P_l... same for females
Alternative hypothesis: P_mh ≠ P_m * P_h, P_ml ≠ P_m * P_l... same for females

1. Data counts are denoted by N_mh, N_ml, N_fh, N_fl. What is the joint probability mass function for these counts?
2. Write down the sets w_0 and w_A so that null hypothesis is θ∈w_0 and alternative hypothesis is θ∈w_A.

2. Relevant equations

3. The attempt at a solution

1. So I understand that the problem wants us to find P[(N_mh, N_ml, N_fh, N_fl)=(n_mh, n_ml, n_fh, n_fl)]. But I'm a bit confused about the notation and how to answer this problem.

2. Once again I'm not sure I understand exactly how to approach this problem.

Any help is greatly appreciated.

2. Apr 14, 2013

### haruspex

The population falls into 4 groups, g_1 to g_4, say. According to the null hypothesis they have frequencies P_m * P_h, P_m * P_l etc. Call these p_1 to p_4. Suppose you take N samples from the population, and you get, in order, 2 1 2 4 4 3 2 1 4 ..., getting, in all, n_i from group g_i. What would the probability of that specific sequence be? How many sequences give the same total for each group?
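The two questions above can be sketched numerically. This is only an illustration, not a worked answer from the thread: the function names are made up for the example, and the proportions in `example_probs` are assumed values, since the problem does not give the population proportions. The first function gives the probability of one specific ordered sequence, the second counts the orderings that share the same group totals, and their product is the joint probability of the counts.

```python
from math import factorial, prod


def sequence_probability(counts, probs):
    """Probability of ONE specific ordered sequence with the given group totals."""
    return prod(p ** n for n, p in zip(counts, probs))


def number_of_sequences(counts):
    """Number of distinct orderings that give the same total for each group."""
    result = factorial(sum(counts))
    for n in counts:
        result //= factorial(n)  # exact integer division at every step
    return result


def joint_pmf(counts, probs):
    """Joint probability of the group counts: orderings times per-sequence probability."""
    return number_of_sequences(counts) * sequence_probability(counts, probs)


# Observed counts from the problem, in the order (N_mh, N_ml, N_fh, N_fl).
observed = (13, 11, 18, 8)

# ASSUMPTION: illustrative proportions only; they are not given in the problem.
example_probs = (0.25, 0.25, 0.25, 0.25)

print(joint_pmf(observed, example_probs))
```

Integer arithmetic is used for the count of orderings so that the large factorials (50! here) stay exact before being multiplied by the small per-sequence probability.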

3. Apr 14, 2013

### brojesus111

So it's a multinomial distribution.

Any hints or tips on the second question?

4. Apr 14, 2013

### haruspex

I think the key point about the second question is that θ is a vector of four unknowns, but:
- they must add up to 1
- the null hypothesis does not prescribe them all, only a relationship between them.
So I think it's looking for a vector, involving some free parameter, which generically describes elements of w_0.

5. Apr 14, 2013

### brojesus111

So I think w_0 is just a point and w_A is a hyperplane in 4 dimensions that is missing the point that is in w_0. My problem is how I would write that in the context of this problem.

Last edited: Apr 14, 2013
6. Apr 15, 2013

### haruspex

No, that's exactly what I'm saying it isn't. The null hypothesis makes no claim about the gender ratio in the population. Therefore there is a continuum of values of the 4-vector which satisfy it. The answer will be something like {(f_1(t), f_2(t), f_3(t), f_4(t))} where the f_i(t) are functions (you need to determine) of some unknown parameter t. Or maybe there are two free parameters.
w_A will be everywhere in the 4-space except for that continuum.
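
One possible way to make this concrete, as a sketch rather than the definitive answer: take the two free parameters to be P_m and P_h (an assumption; any equivalent pair would do), so that every element of w_0 has the form (P_m * P_h, P_m * P_l, P_f * P_h, P_f * P_l) with P_f = 1 - P_m and P_l = 1 - P_h. The helper names below are hypothetical. The membership check uses the fact that, for four proportions that add up to 1, the product conditions hold exactly when P_mh * P_fl = P_ml * P_fh.

```python
def theta_under_null(p_m, p_h):
    """Hypothetical helper: build (P_mh, P_ml, P_fh, P_fl) from two free parameters."""
    if not (0 <= p_m <= 1 and 0 <= p_h <= 1):
        raise ValueError("p_m and p_h must lie in [0, 1]")
    p_f, p_l = 1 - p_m, 1 - p_h
    return (p_m * p_h, p_m * p_l, p_f * p_h, p_f * p_l)


def in_w0(theta, tol=1e-12):
    """Check whether theta is a valid probability vector satisfying the null."""
    p_mh, p_ml, p_fh, p_fl = theta
    valid = all(p >= 0 for p in theta) and abs(sum(theta) - 1) <= tol
    # For proportions summing to 1, P_mh - P_m * P_h equals P_mh*P_fl - P_ml*P_fh,
    # so the null holds exactly when this cross-product difference is zero.
    return valid and abs(p_mh * p_fl - p_ml * p_fh) <= tol


# ASSUMPTION: 0.48 and 0.62 are arbitrary illustrative values, not estimates.
print(in_w0(theta_under_null(0.48, 0.62)))  # a point built from the parametrisation
print(in_w0((0.4, 0.1, 0.1, 0.4)))          # valid proportions that break the products
```

Under this reading, w_A is the set of valid probability vectors for which `in_w0` returns False.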