Choosing an appropriate hypothesis

Agent Smith · Dec 10, 2024

The National Center for Education Statistics monitors many aspects of elementary and secondary education nationwide. Their 1996 numbers are often used as a baseline to assess changes. In 1996, 31% of students reported that their mothers had graduated from college. In 2000, responses from 8368 students found that this figure had grown to 32%. Set up null and alternative hypotheses about the population proportion P. Write appropriate hypotheses.

Options:

a) Ho: P is equal to 0.31. Ha: P is less than 0.31.
b) Ho: P is equal to 0.31. Ha: P is greater than 0.20.
c) Ho: P is less than 0.31. Ha: P is equal to 0.31.
d) Ho: P is greater than 0.31. Ha: P is equal to 0.31.
e) Ho: P is equal to 0.31. Ha: P is not equal to 0.31.
f) Ho: P is not equal to 0.31. Ha: P is equal to 0.31.
g) Ho: P is equal to 0.32. Ha: P is not equal to 0.32.
h) Ho: P is equal to 0.32. Ha: P is greater than 0.32.
i) Ho: P is equal to 0.32. Ha: P is less than 0.32.

I think option e) is the right choice, but I'm not sure. I would've preferred Ho: P = 0.31 and Ha: P > 0.31 (the question states that "...this figure had grown to 32%"), but that is not among the options.

Which is the correct answer? Many thanks.

Dale · Dec 10, 2024

I agree with you

Agent Smith · Dec 11, 2024

@Dale but what about the 32% we found? It doesn't appear in the hypotheses?

Why is there this

Dale · Dec 11, 2024

Usually, the 0.32 will be used to generate a test statistic, not a hypothesis.
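To make that concrete, here is a minimal Python sketch (standard library only) that turns the 2000 figure into a one-sample z statistic against the 1996 baseline. It assumes the usual large-sample test whose standard error is ##\sqrt{p_0(1-p_0)/n}##, and it treats the reported 32% as exact even though it is a rounded figure.

```python
import math


def one_proportion_z(p_hat: float, p0: float, n: int) -> float:
    """One-sample z statistic for a proportion.

    The standard error uses p0, the value assumed under H0, not p_hat.
    """
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error


# 1996 baseline: 31%; 2000 survey: 32% of 8368 students.
z = one_proportion_z(0.32, 0.31, 8368)
print(f"z = {z:.3f}")
```

Here z comes out at about 1.98, right next to the two-sided 5% cutoff of 1.96. Because anything from roughly 31.5% to 32.5% would be reported as 32%, the unreported third digit could move z anywhere from about 1 to about 3.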

FactChecker · Dec 11, 2024

A couple of thoughts:
1) It is very questionable to base a statistical hypothesis on the results of an already collected sample that will be used in the test statistic. With that in mind, (e) is the one that would have been picked before the 2000 sample was taken.
2) We must assume that there is no (significant) overlap between the 1996 and 2000 samples. For people who are in both samples, more mothers can graduate but nobody "un-graduates". A significant overlap in the samples would bias the results of certain tests. It is very understandable that there might be overlap if the same sampling process was used. In fact, there might be complete overlap if the sampling was intentionally done to determine how many mothers graduated between 1996 and 2000.

Hornbein · Dec 11, 2024

I don't know what the study is intended to show. All I can do is try to read the mind of the questioner. I'm glad I'm no longer a student. On the other hand, being able to infer from hints what the boss wants to hear is an important career skill, so maybe that's the true goal. So let's go.

The boss wants to know whether the percentage is increasing. Then Ho: P = 0.31 and Ha: P > 0.31 is what you want.

The boss wants to know whether the percentage is decreasing. Then Ho: P = 0.31 and Ha: P < 0.31 is what you want.

The boss wants to conclude that there is no significant change in the percentage. That's probably what they want. The change is so small that I know that this is the case even without doing the math. Collecting the data then asking the question is bad practice but that is more or less what we are doing here. Gross. Then e) is the answer.

g) is also reasonable for that purpose. The implication seems to be that the 31% figure is more firmly established, with either no or less sampling error, so it is a better baseline. I do know intuitively that g) isn't the answer they want.
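Hornbein's three "bosses" correspond to three choices of ##H_a##. A minimal Python sketch (standard library; it assumes the large-sample z test with standard error ##\sqrt{p_0(1-p_0)/n}## and treats the rounded 32% as exact) shows how the same 2000 data give different p-values depending on which alternative was fixed in advance:

```python
import math
from statistics import NormalDist


def p_values(p_hat, p0, n):
    """p-values for the three possible alternatives to H0: P = p0."""
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return {
        "Ha: P > p0": 1 - std_normal.cdf(z),            # upper tail
        "Ha: P < p0": std_normal.cdf(z),                # lower tail
        "Ha: P != p0": 2 * (1 - std_normal.cdf(abs(z))),  # both tails
    }


result = p_values(0.32, 0.31, 8368)
for alternative, p in result.items():
    print(f"{alternative}: p = {p:.4f}")
```

Because z is positive here, the upper-tail p-value is exactly half the two-sided one, which is one reason the direction is normally chosen before the data are seen.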

All in all, I wouldn't worry about it. It's just a weird question. What if the figures were 31% and 61%? Well, we can't expect question askers to be perfect. Concentrate on other things.

Agent Smith · Dec 12, 2024

Thank you to @Dale and @FactChecker and @Hornbein for their comments. A quick question.

Say I know the proportion of people who are vegetarian in my school (##p_0##). I suspect that this proportion has increased. I want to test this hypothesis. How should I go about doing that?

1. I formulate a hypothesis: ##p > p_0##
2. I take an adequate sample (##n##), make sure the conditions for inference are satisfied, then compute the proportion in the sample (##p_s##).
3. Then I test the hypothesis: ##\frac{p_s - p_0}{\frac{\sigma_0}{\sqrt n}}##

Correct?

FactChecker · Dec 12, 2024

Agent Smith said:

3. Then I test the hypothesis: ##\frac{p_s - p_0}{\frac{\sigma_0}{\sqrt n}}##

Correct?

Comparing proportions does not use the same test statistic as comparing means. See this.

Agent Smith · Dec 12, 2024

@FactChecker I don't think you read my question properly. I want to know if the proportion of a certain category (vegetarianism) has changed (increased) in the population. My lessons say we take a sample of the population in my school and use that (as I outlined in the question) to test the hypothesis.

My hypotheses would be:
##H_0: p = p_0##
##H_a: p > p_0##
My sample is adequate (size n) in all respects (assume) and all conditions for inference are met. Then ...
##z = \frac{p_s - p_0}{\frac{\sigma_0}{\sqrt n}}##
I get a p-value from the z score and then can use that to reject/fail to reject ##H_0##.
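The procedure above can be sketched in Python with only the standard library. One reading worth making explicit: if ##\sigma_0## is taken to be the standard deviation of a single yes/no response under ##H_0##, ##\sigma_0 = \sqrt{p_0(1-p_0)}##, then ##\sigma_0/\sqrt n## is exactly the standard error ##\sqrt{p_0(1-p_0)/n}## discussed later in the thread. The function name, the rule-of-thumb check, and the vegetarian numbers below are hypothetical illustrations, not part of the original lesson.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0, alternative="greater", alpha=0.05):
    """Large-sample z test of H0: p = p0 (a sketch, not a library routine).

    alternative: "greater" (Ha: p > p0), "less" (Ha: p < p0) or "two-sided".
    Returns (z, p_value, decision).
    """
    # Common rule of thumb for the normal approximation: at least 10
    # expected successes and 10 expected failures under H0.
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("expected counts too small for the normal approximation")
    p_hat = successes / n
    # sigma_0 = sqrt(p0 * (1 - p0)) is the SD of one yes/no answer under H0,
    # so sigma_0 / sqrt(n) is the standard error of p_hat.
    sigma_0 = math.sqrt(p0 * (1 - p0))
    z = (p_hat - p0) / (sigma_0 / math.sqrt(n))
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    decision = "reject H0" if p_value <= alpha else "fail to reject H0"
    return z, p_value, decision


# Hypothetical numbers: p0 = 0.10, and 30 vegetarians in a sample of 200.
z, p_value, decision = one_proportion_z_test(30, 200, 0.10)
print(f"z = {z:.2f}, p-value = {p_value:.4f}: {decision}")
```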

The link was awesome. I've saved it for a later read. Thanks a lot.

FactChecker · Dec 12, 2024

Agent Smith said:

I don't think you read my question properly. I want to know if the proportion of a certain category (vegetarianism) has changed (increased) in the population. My lessons say we take a sample of the population in my school and use that (as I outlined in the question) to test the hypothesis.

Ok. Apparently you are expected to use a certain method.

Agent Smith said:

Then ...
##z = \frac{p_s - p_0}{\frac{\sigma_0}{\sqrt n}}##

I am not an expert in this, but IMO, this does not use the correct standard deviation in the denominator for proportions. But it appears that they are expecting you to use this. I wouldn't get too concerned about it as long as you are learning the basics of the lesson. Just realize that a proportion is different and you can always look it up if you are using it in a critical application.

Agent Smith said:

I get a p-value from the z score and then can use that to reject/fail to reject ##H_0##.

Yes.

Agent Smith · Dec 13, 2024

FactChecker said:

I am not an expert in this, but IMO, this does not use the correct standard deviation in the denominator for proportions.

And the correct standard deviation would be?

FactChecker · Dec 13, 2024

Agent Smith said:

And the correct standard deviation would be?

See post #8. It's the denominator of the formula for ##z##.

Agent Smith · Dec 13, 2024

@FactChecker I saw that formula you refer to in a different question. For questions like the one I ask, the formula is the one you say is incorrect.

FactChecker · Dec 13, 2024

Agent Smith said:

@FactChecker I saw that formula you refer to in a different question. For questions like the one I ask, the formula is the one you say is incorrect.

Your post #5 specifies that you are testing proportions. You need to be careful that you are using the correct statistic for proportions. I see that the reference I posted is for both proportions being estimated from the samples. I do not know the correct formula if one of the proportions is known, as this problem states, rather than estimated.

Agent Smith · Dec 13, 2024

The rest of this question suggests that my answer is correct regarding an appropriate hypothesis.
##H_0: p = 0.31## and ##H_a: p \ne 0.31##

Could I have also chosen the alternative hypothesis as ##H_a: p > 0.31##? Yes or no, and why? The question states that the proportion has grown from ##0.31## to ##0.32##, yet that direction is not what is being tested. I find that confusing.

Dale · Dec 13, 2024

Yes, you can do one-tailed tests.

Based on your data, ##X=x##, you calculate some test statistic, ##f=F(X=x)##.

For a standard two-tailed test you calculate the critical value of the test statistic, ##f_0##, as ##P(f_0<|F(X)| \ | \ H_0)=\alpha##, where the confidence level is ##1-\alpha##.

For a one-tailed test you similarly calculate either ##P(f_0>F(X) \ | \ H_0)=\alpha## or ##P(f_0<F(X) \ | \ H_0)=\alpha##.

Note that the calculation only depends on ##H_0##, even though the appropriate one-tailed test is considered a test of your alternative hypothesis.
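As a concrete instance of this recipe, assume the test statistic is standard normal under ##H_0## (as the z statistic in this thread approximately is) and ##\alpha = 0.05##. A short Python sketch using the standard library's `statistics.NormalDist` gives the cutoffs:

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # distribution of the statistic under H0 (assumed)

# Two-tailed: alpha/2 in each tail, reject if |z| > z_two.
z_two = std_normal.inv_cdf(1 - alpha / 2)
# Upper one-tailed: all of alpha in the upper tail, reject if z > z_upper.
z_upper = std_normal.inv_cdf(1 - alpha)
# Lower one-tailed: all of alpha in the lower tail, reject if z < z_lower.
z_lower = std_normal.inv_cdf(alpha)

print(f"two-tailed:       |z| > {z_two:.3f}")
print(f"upper one-tailed:  z  > {z_upper:.3f}")
print(f"lower one-tailed:  z  < {z_lower:.3f}")
```

Because a one-tailed test puts all of ##\alpha## in one tail, its cutoff (about 1.645) is closer to zero than the two-tailed cutoff (about 1.960).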

FactChecker · Dec 13, 2024

Here is a reference for the correct test statistic to use when comparing a known proportion to a sample proportion. The denominator for ##Z## has an additional factor of ##1-p_0##. That makes sense since the test should have symmetry between ##p_0## and ##1-p_0## (i.e., the test for the proportion having the property of interest should give the same result as the test for the proportion not having the property).

(I will leave it at that, since I am not an expert in this. I have to defer to the Penn State reference.)
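The symmetry argument can be checked numerically. In this Python sketch the numbers (##p_0 = 0.10##, sample proportion 0.15, ##n = 200##) are hypothetical, and the "without" variant is a deliberately wrong denominator included only for comparison:

```python
import math


def z_stat(p_hat, p0, n):
    """z statistic with the sqrt(p0 * (1 - p0) / n) standard error."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


def z_without_factor(p_hat, p0, n):
    """Same, but dropping the (1 - p0) factor, for comparison only."""
    return (p_hat - p0) / math.sqrt(p0 / n)


# Hypothetical example: p0 = 0.10, sample proportion 0.15, n = 200.
p_hat, p0, n = 0.15, 0.10, 200

# Test "has the property" and "does not have the property".
with_factor = (z_stat(p_hat, p0, n), z_stat(1 - p_hat, 1 - p0, n))
without_factor = (z_without_factor(p_hat, p0, n),
                  z_without_factor(1 - p_hat, 1 - p0, n))

print("with (1 - p0):   ", [round(z, 3) for z in with_factor])
print("without (1 - p0):", [round(z, 3) for z in without_factor])
```

With the ##1-p_0## factor the two z values are exact negatives of each other, so both framings reach the same conclusion; without it, the two framings give z values of very different size.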

Agent Smith · Dec 13, 2024

FactChecker said:

Here is a reference for the correct test statistic to use when comparing a known proportion to a sample proportion. The denominator for ##Z## has an additional factor of ##1-p_0##. That makes sense since the test should have symmetry between ##p_0## and ##1-p_0## (i.e., the test for the proportion having the property of interest should give the same result as the test for the proportion not having the property).

(I will leave it at that, since I am not an expert in this. I have to defer to the Penn State reference.)

That is correct. It slipped my mind. Sorry for the undue delay.

Agent Smith · Dec 13, 2024

@Dale is there a reason the questioner thought ##H_a: p \ne 0.31## was better than ##H_a: p > 0.31##? The question is quite clear about the direction: "[...]this figure had grown to 32%.[...]"

Dale · Dec 14, 2024

You would have to ask the questioner about that. I can’t know why they made that choice.

FactChecker · Dec 14, 2024

Agent Smith said:

@Dale is there a reason the questioner thought ##H_a: p \ne 0.31## was better than ##H_a: p > 0.31##? The question is quite clear about the direction: "[...]this figure had grown to 32%.[...]"

Be careful about using the test results to design a hypothesis to be checked by those same test results. That makes it easier for the original test results to satisfy the hypothesis. It would be flawed, circular logic.

Suppose the test sample gives a percentage of 32.16489%. Make your hypothesis that the population percentage is 32.16489% and VOILA!, the original test sample fits perfectly! But it has no validity as a test at all.

Before you saw the sample test results were a higher percentage, you probably would have used the hypothesis that the questioner prefers. That is legitimate.
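The cost of choosing the hypothesis after seeing the data can be illustrated with a small Python simulation. This is a sketch: ##p_0 = 0.31##, ##n = 500##, the number of trials, and the seed are arbitrary choices, not from the thread. Every sample is drawn with ##H_0## true, so every rejection is a false positive. A two-sided test fixed in advance rejects about 5% of the time; picking the one-sided alternative that matches the sample's direction and then testing at 5% rejects roughly twice as often.

```python
import math
import random


def z_stat(successes, n, p0):
    """Large-sample z statistic for a sample proportion against p0."""
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


def false_rejection_rates(p0=0.31, n=500, trials=4000, seed=1):
    rng = random.Random(seed)
    honest = peeking = 0
    for _ in range(trials):
        # Simulate a sample in which H0 is TRUE (population proportion = p0).
        successes = sum(rng.random() < p0 for _ in range(n))
        z = z_stat(successes, n, p0)
        # Honest: two-sided alternative fixed in advance, 5% level.
        if abs(z) > 1.96:
            honest += 1
        # Peeking: choose Ha: p > p0 if z > 0, else Ha: p < p0, then run a
        # 5% one-sided test; that rejects exactly when |z| > 1.645.
        if abs(z) > 1.645:
            peeking += 1
    return honest / trials, peeking / trials


honest_rate, peeking_rate = false_rejection_rates()
print(f"false rejections, direction fixed in advance:  {honest_rate:.3f}")
print(f"false rejections, direction chosen after look: {peeking_rate:.3f}")
```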

Agent Smith · Dec 14, 2024

This is me doing statistics:
I read a statistics journal. It contains statistical reports, and one report says that 97% of Americans possess a cell phone. I suspect that the proportion is much lower in my little town of 2000 people. So I conduct a survey, taking a random sample of 400 people, and the sample proportion is 85%.
##H_0: p = 0.97##
##H_a: p < 0.97##
Set significance level ##\alpha = 0.05##

I use the formula ##z = \frac{0.85 - 0.97}{\sqrt{\frac{0.97 \times 0.03}{400}}}##. I look up a p-value from a z table. If ##\text{p-value} \leq 0.05## I reject ##H_0##; if not I fail to reject ##H_0##.

Does this sound right?
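The arithmetic in this plan can be sketched in Python with the standard library. The success/failure check (at least 10 expected "yes" and "no" answers under ##H_0##; here 388 and 12) is the usual textbook rule of thumb, and it is only just met. Many courses also ask that the sample be at most 10% of the population; 400 out of 2000 is 20%, so a finite-population correction could be considered, although it would shrink the standard error and make |z| even larger here.

```python
import math
from statistics import NormalDist

n, p0, p_hat = 400, 0.97, 0.85
alpha = 0.05

# Rule-of-thumb check: expected counts under H0 should both be >= 10.
print(f"expected yes/no counts: {n * p0:.0f} and {n * (1 - p0):.0f}")

z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = NormalDist().cdf(z)  # lower tail, because Ha: p < 0.97
print(f"z = {z:.2f}, p-value = {p_value:.2e}")
print("reject H0" if p_value <= alpha else "fail to reject H0")
```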

FactChecker · Dec 14, 2024

That sounds right to me.

Dale · Dec 14, 2024

Agent Smith said:

This is me doing statistics:
I read a statistics journal. It contains statistical reports, and one report says that 97% of Americans possess a cell phone. I suspect that the proportion is much lower in my little town of 2000 people. So I conduct a survey, taking a random sample of 400 people, and the sample proportion is 85%.
##H_0: p = 0.97##
##H_a: p < 0.97##
Set significance level ##\alpha = 0.05##

I use the formula ##z = \frac{0.85 - 0.97}{\sqrt{\frac{0.97 \times 0.03}{400}}}##. I look up a p-value from a z table. If ##\text{p-value} \leq 0.05## I reject ##H_0##; if not I fail to reject ##H_0##.

Does this sound right?

Yes. You could do it that way. However, this is the only time in your life that you will use the normal approximation for a test of proportions. In real life you will let a computer do the number crunching for you so that you don’t need to do the approximation.

Or if you go Bayesian you can just calculate the answer with the ##\beta(340+1,\ 60+1)## conjugate posterior.
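The ##\beta(340+1,\ 60+1)## posterior corresponds to a uniform ##\beta(1, 1)## prior updated with 340 owners and 60 non-owners (85% of 400). A minimal Python sketch approximates posterior quantities by Monte Carlo with the standard library's `random.betavariate`; the seed and number of draws are arbitrary, and a library with the Beta CDF (for example `scipy.stats.beta`) would give these quantities exactly.

```python
import random

# Uniform Beta(1, 1) prior + 340 "yes" and 60 "no" -> Beta(341, 61) posterior.
a, b = 340 + 1, 60 + 1
posterior_mean = a / (a + b)  # exact mean of a Beta(a, b) distribution

rng = random.Random(0)
draws = sorted(rng.betavariate(a, b) for _ in range(100_000))
prob_below = sum(d < 0.97 for d in draws) / len(draws)
credible_interval = (draws[int(0.025 * len(draws))],
                     draws[int(0.975 * len(draws))])

print(f"posterior mean: {posterior_mean:.3f}")
print(f"P(p < 0.97 | data) ~ {prob_below:.4f}")
print(f"95% credible interval ~ ({credible_interval[0]:.3f}, "
      f"{credible_interval[1]:.3f})")
```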

Agent Smith · Dec 15, 2024

@Dale and @FactChecker what if my sample proportion is ##0\%## or ##100\%##? What then?

FactChecker · Dec 15, 2024

Agent Smith said:

@Dale and @FactChecker what if my sample proportion is ##0\%## or ##100\%##? What then?

It's just like anything else. Any finite sample can end up being all of one type.

Dale · Dec 15, 2024

Agent Smith said:

@Dale and @FactChecker what if my sample proportion is ##0\%## or ##100\%##? What then?

Then the normal approximation definitely will not work. The Bayesian formula above works fine, and so does Fisher’s exact test.
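Fisher's exact test is usually used to compare two samples (a 2×2 table); when a single sample is compared against a known ##p_0##, the one-sample exact counterpart is the exact binomial test. A minimal Python sketch, assuming a one-sided alternative and ##0 < p_0 < 1##, works in log space so that extreme outcomes such as 0 out of 400 do not silently underflow. The function names and the example samples are illustrative.

```python
import math


def binom_log_pmf(k, n, p):
    """log of C(n, k) * p**k * (1 - p)**(n - k), for 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def exact_binomial_p_value(successes, n, p0, alternative="less"):
    """One-sided exact binomial test of H0: p = p0.

    Returns (p_value, log10_p_value). p_value may underflow to 0.0,
    while log10_p_value stays finite.
    """
    if alternative == "less":          # Ha: p < p0 -> P(X <= successes | H0)
        ks = range(0, successes + 1)
    elif alternative == "greater":     # Ha: p > p0 -> P(X >= successes | H0)
        ks = range(successes, n + 1)
    else:
        raise ValueError("alternative must be 'less' or 'greater'")
    logs = [binom_log_pmf(k, n, p0) for k in ks]
    top = max(logs)  # log-sum-exp keeps the tail sum numerically stable
    log_p = top + math.log(sum(math.exp(x - top) for x in logs))
    return min(1.0, math.exp(log_p)), log_p / math.log(10)


# Extreme samples of size 400 against p0 = 0.97 (illustrative numbers).
for successes, alternative in [(0, "less"), (400, "greater")]:
    p, log10_p = exact_binomial_p_value(successes, 400, 0.97, alternative)
    print(f"{successes}/400, Ha: p {alternative} than 0.97: "
          f"p = {p:.3g}, log10(p) = {log10_p:.1f}")
```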

Agent Smith · Dec 15, 2024

@Dale & @FactChecker

Ok, what if the sample proportion were ##1\%## or ##99\%##, what then? How would I go about testing my hypothesis?

FactChecker · Dec 15, 2024

Dale said:

Then the normal approximation definitely will not work. The Bayesian formula above works fine, and so does Fisher’s exact test.

If the sample size was large enough for the selected confidence level, such an extreme sample gives a very clear statistical result.

Agent Smith · Dec 15, 2024

So ##2## scenarios:

1) Population proportion = ##0.97## and sample proportion = ##0.01##
My hypotheses:
##H_0: p = 0.97##
##H_a: p < 0.97##

2) Population proportion = ##0.97## and sample proportion = ##0.99##
##H_0: p = 0.97##
##H_a: p < 0.97##

right?
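Keeping the same ##H_a: p < 0.97## in both scenarios, as written, a short Python sketch shows how differently the two samples come out. It assumes a sample size of 400 (the scenarios do not give one; 400 is borrowed from the earlier cell-phone example) and the same lower-tailed z test used there. The normal approximation is poor at a sample proportion of 0.01, but an exact binomial calculation leads to the same two conclusions.

```python
import math
from statistics import NormalDist


def lower_tail_test(p_hat, p0, n):
    """Return (z, p-value) for Ha: p < p0 with the large-sample z test."""
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return z, NormalDist().cdf(z)  # lower-tail area below z


n = 400  # assumed sample size
results = {p_hat: lower_tail_test(p_hat, 0.97, n) for p_hat in (0.01, 0.99)}
for p_hat, (z, p) in results.items():
    print(f"sample proportion {p_hat}: z = {z:.2f}, p-value = {p:.3g}")
```

The first sample gives a p-value that is effectively zero, so ##H_0## is rejected; the second gives a p-value near 1, so ##H_0## is not rejected, since a sample proportion above 0.97 is no evidence that ##p < 0.97##.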
