Choosing an appropriate hypothesis

  • Thread starter Agent Smith
  • Tags
    population
  • #1
Agent Smith
TL;DR Summary
Choose the correct hypotheses
The National Center for Education Statistics monitors many aspects of elementary and secondary education nationwide. Their 1996 numbers are often used as a baseline to assess changes. In 1996, 31% of students reported that their mothers had graduated from college. In 2000, responses from 8368 students found that this figure had grown to 32%. Set up null and alternative hypotheses about the population proportion P. Write appropriate hypotheses.

Options:
a) Ho: P is equal to 0.31.
Ha: P is less than 0.31.
b) Ho: P is equal to 0.31.
Ha: P is greater than 0.20.
c) Ho: P is less than 0.31.
Ha: P is equal to 0.31.
d) Ho: P is greater than 0.31.
Ha: P is equal to 0.31.
e) Ho: P is equal to 0.31.
Ha: P is not equal to 0.31.
f) Ho: P is not equal to 0.31.
Ha: P is equal to 0.31.
g) Ho: P is equal to 0.32.
Ha: P is not equal to 0.32.
h) Ho: P is equal to 0.32.
Ha: P is greater than 0.32.
i) Ho: P is equal to 0.32.
Ha: P is less than 0.32.

I think option e) is the right choice, but I'm not sure. I would've preferred Ho: P = 0.31 and Ha: P > 0.31 (the question states that "...this figure had grown to 32%"), but that is not among the options.

Which is the correct answer? Muchas gracias.
 
  • #3
@Dale but what about the 32% we found? It doesn't appear in the hypotheses? 🤔 Why is there this 👇

Capture.PNG
 
  • #4
Usually, the 0.32 will be used to generate a test statistic, not a hypothesis.
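As a rough illustration of that point, here is a minimal Python sketch that plugs the 2000 sample into the one-sample z statistic for a proportion. It assumes the sample proportion is exactly 0.32 (the problem only gives the rounded figure), relies on the normal approximation, and tests option e); the variable names are my own.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.31      # 1996 baseline, the value stated in H0
p_hat = 0.32   # 2000 sample proportion (rounded in the problem statement)
n = 8368       # number of students sampled in 2000

# Standard error of the sample proportion, computed under H0
se = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se

# Two-sided p-value for Ha: P != 0.31 (normal approximation)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.3f}, two-sided p-value = {p_two_sided:.4f}")
```

The 0.32 enters only through `p_hat`; the hypotheses themselves mention nothing but the baseline 0.31.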
 
  • #5
A couple of thoughts:
1) It is very questionable to base a statistical hypothesis on the results of an already collected sample that will be used in the test statistic. With that in mind, (e) is the one that would have been picked before the 2000 sample was taken.
2) We must assume that there is no (significant) overlap between the 1996 and 2000 samples. For people who are in both samples, more mothers can graduate but nobody "un-graduates". A significant overlap in the samples would bias the results of certain tests. It is very understandable that there might be overlap if the same sampling process was used. In fact, there might be complete overlap if the sampling was intentionally done to determine how many mothers graduated between 1996 and 2000.
 
  • #6
I don't know what the study is intended to show. All I can do is try to read the mind of the questioner. I'm glad I'm no longer a student. On the other hand, being able to infer from hints what the boss wants to hear is an important career skill, so maybe that's the true goal. So let's go.

The boss wants to know whether the percentage is increasing. Then Ho: P = 0.31 and Ha: P > 0.31 is what you want.

The boss wants to know whether the percentage is decreasing. Then Ho: P = 0.31 and Ha: P < 0.31 is what you want.

The boss wants to conclude that there is no significant change in the percentage. That's probably what they want. The change is so small that I know that this is the case even without doing the math. Collecting the data then asking the question is bad practice but that is more or less what we are doing here. Gross. Then e) is the answer.

g) is also reasonable for that purpose. The implication seems to be that the 31% figure is more firmly established, with either no or less sampling error, so it is a better baseline. I do know intuitively that g) isn't the answer they want.

All in all, I wouldn't worry about it. It's just a weird question. What if the figures were 31% and 61%? Well, we can't expect question askers to be perfect. Concentrate on other things.
 
  • #7
Thank you to @Dale and @FactChecker and @Hornbein for their comments. A quick question.

Say I know the proportion of people who are vegetarian in my school (##p_0##). I suspect that this proportion has increased. I want to test this hypothesis. How should I go about doing that?

1. I formulate a hypothesis: ##p > p_0##
2. I take an adequate sample (##n##), make sure the conditions for inference are satisfied, then compute the proportion in the sample (##p_s##).
3. Then I test the hypothesis: ##\frac{p_s - p_0}{\frac{\sigma_0}{\sqrt n}}##

Correct?
 
  • #8
Agent Smith said:
3. Then I test the hypothesis: ##\frac{p_s - p_0}{\frac{\sigma_0}{\sqrt n}}##

Correct?
Comparing proportions does not use the same test statistic as comparing means. See this.
1734024125400.png
 
  • #9
@FactChecker I don't think you read my question properly. I want to know if the proportion of a certain category (vegetarianism) has changed (increased) in the population. My lessons say we take a sample of the population in my school and use that (as I outlined in the question) to test the hypothesis.

My hypotheses would be:
##H_0: p = p_0##
##H_a: p > p_0##
My sample is adequate (size n) in all respects (assume) and all conditions for inference are met. Then ...
##z = \frac{p_s - p_0}{\frac{\sigma_0}{\sqrt n}}##
I get a p-value from the z score and then can use that to reject/fail to reject ##H_0##.

The link was awesome. I've saved it for a later read. Gracias, muchas.
 
  • #10
Agent Smith said:
I don't think you read my question properly. I want to know if the proportion of a certain category (vegetarianism) has changed (increased) in the population. My lessons say we take a sample of the population in my school and use that (as I outlined in the question) to test the hypothesis.
Ok. Apparently you are expected to use a certain method.
Agent Smith said:
Then ...
##z = \frac{p_s - p_0}{\frac{\sigma_0}{\sqrt n}}##
I am not an expert in this, but IMO, this does not use the correct standard deviation in the denominator for proportions. But it appears that they are expecting you to use this. I wouldn't get too concerned about it as long as you are learning the basics of the lesson. Just realize that a proportion is different and you can always look it up if you are using it in a critical application.
Agent Smith said:
I get a p-value from the z score and then can use that to reject/fail to reject ##H_0##.
Yes.
 
  • #11
FactChecker said:
I am not an expert in this, but IMO, this does not use the correct standard deviation in the denominator for proportions.
And the correct standard deviation would be ? 🤔
 
  • #12
Agent Smith said:
And the correct standard deviation would be ? 🤔
See post #8. It's the denominator of the formula for ##z##.
 
  • #13
@FactChecker I saw that formula you refer to in a different question. For questions like the one I ask, the formula is the one you say is incorrect. 🤔
 
  • #14
Agent Smith said:
@FactChecker I saw that formula you refer to in a different question. For questions like the one I ask, the formula is the one you say is incorrect. 🤔
Your post #5 specifies that you are testing proportions. You need to be careful that you are using the correct statistic for proportions. I see that the reference I posted is for both proportions being estimated from the samples. I do not know the correct formula if one of the proportions is known, as this problem states, rather than estimated.
 
  • #15
The rest of this question suggests that my answer is correct regarding an appropriate hypothesis.
##H_0: p = 0.31## and ##H_a: p \ne 0.31##

Could I have also chosen the alternative hypothesis as ##H_a: p > 0.31##? Yes/no, why? The question states that the proportion has grown from ##0.31## to ##0.32## and that is what isn't being tested. I find that confusing.
 
  • #16
Yes, you can do one-tailed tests.

Based on your data, ##X=x##, you calculate some test statistic, ##f=F(X=x)##.

For a standard two tailed test you calculate the critical value of the test statistic, ##f_0##, from ##P(f_0<|F(X)| \ | \ H_0)=\alpha##, where the confidence level is ##1-\alpha##.

For a one tailed test you similarly calculate either ##P(f_0>F(X) \ | \ H_0)=\alpha## or ##P(f_0<F(X) \ | \ H_0)=\alpha##

Note that the calculation only depends on ##H_0##, even though the appropriate one tailed test is considered a test of your alternate hypothesis.
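To make the critical values concrete, here is a minimal sketch for the special case where the test statistic is a z score that is standard normal under ##H_0##. That choice of statistic and ##\alpha = 0.05## are both assumptions for illustration.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # assumed distribution of the z statistic under H0

# Two-tailed: reject H0 when |z| exceeds this value (alpha is split across both tails)
z_crit_two_tailed = std_normal.inv_cdf(1 - alpha / 2)

# One-tailed (upper): reject H0 when z exceeds this value (all of alpha sits in one tail)
z_crit_one_tailed = std_normal.inv_cdf(1 - alpha)

print(f"two-tailed critical z: {z_crit_two_tailed:.3f}")
print(f"one-tailed critical z: {z_crit_one_tailed:.3f}")
```

Both numbers come from the null distribution alone, which is the point made above: the alternative hypothesis only decides which tail or tails to use.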
 
  • #17
Here is a reference for the correct test statistic to use when comparing a known proportion to a sample proportion. The denominator for ##Z## has an additional factor of ##1-p_0##. That makes sense since the test should have symmetry between ##p_0## and ##1-p_0## (i.e., the test for the proportion having the property of interest should give the same result as the test for the proportion not having the property).

(I will leave it at that, since I am not an expert in this. I have to defer to the Penn State reference.)
1734149791584.png
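Since the attachment is not reproduced here, a sketch of the standard one-sample ##z## statistic for a proportion may help. It assumes ##\hat p## denotes the sample proportion (written ##p_s## earlier in the thread) and ##n## the sample size:

$$z = \frac{\hat p - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}$$

This agrees with the expression in post #9 if ##\sigma_0## there is read as ##\sqrt{p_0(1-p_0)}##, the standard deviation of a single yes/no response under ##H_0##.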
 
  • #18
FactChecker said:
Here is a reference for the correct test statistic to use when comparing a known proportion to a sample proportion. The denominator for ##Z## has an additional factor of ##1-p_0##. That makes sense since the test should have symmetry between ##p_0## and ##1-p_0## (i.e., the test for the proportion having the property of interest should give the same result as the test for the proportion not having the property).

(I will leave it at that, since I am not an expert in this. I have to defer to the Penn State reference.)
View attachment 354473
That is correct. It slipped my mind. Sorry for the undue delay.
 
  • #19
@Dale is there a reason the questioner thought ##H_a: p \ne 0.31## was better than ##H_a: p > 0.31##? The question is quite clear about the direction: "[...]this figure had grown to 32%.[...]"
 
  • #20
You would have to ask the questioner about that. I can’t know why they made that choice.
 
  • #21
Agent Smith said:
@Dale is there a reason the questioner thought ##H_a: p \ne 0.31## was better than ##H_a: p > 0.31##? The question is quite clear about the direction: "[...]this figure had grown to 32%.[...]"
Be careful about using the test results to design a hypothesis to be checked by those same test results. That makes it easier for the original test results to satisfy the hypothesis. It would be flawed, circular logic.

Suppose the test sample gives a percentage of 32.16489%. Make your hypothesis that the population percentage is 32.16489% and VOILA!, the original test sample fits perfectly! But it has no validity as a test at all.

Before you saw that the sample gave a higher percentage, you probably would have used the hypothesis that the questioner prefers. That is legitimate.
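A small simulation can make the circularity concrete. The sketch below is my own construction, not something taken from the question: it assumes ##H_0## is true, that the z statistic is standard normal, and that the analyst picks the direction of a one-tailed test only after seeing which way the sample went. The nominal 5% test then rejects a true ##H_0## roughly twice as often as advertised.

```python
import random

random.seed(0)               # fixed seed so the run is repeatable
Z_CRIT_ONE_TAILED = 1.645    # approximate one-tailed critical value for alpha = 0.05
trials = 100_000

rejections = 0
for _ in range(trials):
    z = random.gauss(0.0, 1.0)   # z statistic when H0 is actually true
    # Peek at the data, then choose whichever one-tailed alternative agrees with it
    if z > 0:
        reject = z > Z_CRIT_ONE_TAILED    # claims "Ha: p > p0" was the plan
    else:
        reject = z < -Z_CRIT_ONE_TAILED   # claims "Ha: p < p0" was the plan
    rejections += reject

false_positive_rate = rejections / trials
print(f"false positive rate with data-driven direction: {false_positive_rate:.3f}")
```

Fixing the direction before sampling, or using the two-tailed alternative, keeps the error rate at the stated ##\alpha##.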
 
  • #22
This is me doing statistics:
I read a statistics journal. It contains statistical reports and one report says that 97% of Americans possess a cell phone. I suspect that the proportion is much lower in my little town of 2000 people. So I conduct a survey, taking a random sample of 400 people and the sample proportion is 85%.
##H_0: p = 0.97##
##H_a: p < 0.97##
Set significance level ##\alpha = 0.05##

I use the formula ##z = \frac{0.85 - 0.97}{\sqrt{\frac{0.97 \times 0.03}{400}}}##. I look up a p-value from a z table. If ##\text{p-value} \leq 0.05## I reject ##H_0##; if not I fail to reject ##H_0##.

Does this sound right?
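The procedure above can be sketched in a few lines of Python. The numbers are the ones from this post; the variable names are my own, and `erfc` is used for the tail probability only because the z score is so extreme that a table or a plain CDF call would round to zero.

```python
from math import erfc, sqrt

p0 = 0.97      # proportion claimed in the report (H0)
p_hat = 0.85   # sample proportion from the town survey
n = 400        # sample size
alpha = 0.05   # significance level

# z statistic with the standard error computed under H0
se = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se

# Lower-tail p-value P(Z <= z) for Ha: p < 0.97; erfc keeps precision far out in the tail
p_value = 0.5 * erfc(-z / sqrt(2))

reject_h0 = p_value <= alpha
print(f"z = {z:.2f}, p-value = {p_value:.3g}, reject H0: {reject_h0}")
```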
 
  • #23
That sounds right to me.
 
  • #24
Agent Smith said:
This is me doing statistics:
I read a statistics journal. It contains statistical reports and one report says that 97% of Americans possess a cell phone. I suspect that the proportion is much lower in my little town of 2000 people. So I conduct a survey, taking a random sample of 400 people and the sample proportion is 85%.
##H_0: p = 0.97##
##H_a: p < 0.97##
Set significance level ##\alpha = 0.05##

I use the formula ##z = \frac{0.85 - 0.97}{\sqrt{\frac{0.97 \times 0.03}{400}}}##. I look up a p-value from a z table. If ##\text{p-value} \leq 0.05## I reject ##H_0##; if not I fail to reject ##H_0##.

Does this sound right?
Yes. You could do it that way. However, this is the only time in your life that you will use the normal approximation for a test of proportions. In real life you will let a computer do the number crunching for you so that you don’t need to do the approximation.
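One way to skip the approximation is an exact binomial tail probability. The sketch below is an assumption about what "letting the computer do it" could look like, not necessarily the method meant here: it sums the Binomial(400, 0.97) probabilities of seeing 340 or fewer cell phone owners, using only the standard library.

```python
from math import exp, lgamma, log

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), evaluated in log space to avoid overflow."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)

n = 400
successes = 340   # 85% of the 400 people sampled
p0 = 0.97         # proportion under H0

# Exact lower-tail p-value for Ha: p < 0.97, i.e. P(X <= 340 | p = 0.97)
p_value_exact = sum(binom_pmf(k, n, p0) for k in range(successes + 1))

print(f"exact one-sided p-value = {p_value_exact:.3g}")
```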

Or if you go Bayesian you can just calculate the answer with the ##\beta(340+1,\ 60+1)## conjugate posterior.
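A minimal sketch of the Bayesian route, under the assumption that the "+1" terms come from a uniform Beta(1, 1) prior: with 340 owners and 60 non-owners the posterior is Beta(341, 61). To stay within the standard library, the sketch summarizes it by Monte Carlo draws rather than a closed-form quantile function.

```python
import random

random.seed(1)                        # fixed seed so the run is repeatable
successes, failures = 340, 60         # 85% of 400, and the remainder
a, b = successes + 1, failures + 1    # Beta(341, 61), assuming a uniform prior

# Draw from the posterior and sort so quantiles can be read off by index
draws = sorted(random.betavariate(a, b) for _ in range(20_000))

posterior_mean = sum(draws) / len(draws)
lower = draws[int(0.025 * len(draws))]   # approximate 2.5% posterior quantile
upper = draws[int(0.975 * len(draws))]   # approximate 97.5% posterior quantile

# Share of posterior draws at or above the reported 97%
prob_at_least_p0 = sum(d >= 0.97 for d in draws) / len(draws)

print(f"posterior mean = {posterior_mean:.3f}")
print(f"95% credible interval = ({lower:.3f}, {upper:.3f})")
print(f"P(p >= 0.97 | data) ~ {prob_at_least_p0}")
```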
 
  • #25
@Dale and @FactChecker what if my sample proportion is ##0\%## or ##100\%##? What then?
 
  • #26
Agent Smith said:
@Dale and @FactChecker what if my sample proportion is ##0\%## or ##100\%##? What then?
It's just like anything else. Any finite sample can end up being all of one type.
 
  • #27
Agent Smith said:
@Dale and @FactChecker what if my sample proportion is ##0\%## or ##100\%##? What then?
Then the normal approximation definitely will not work. The Bayesian formula above works fine, and so does Fisher’s exact test.
 
  • #28
@Dale & @FactChecker :biggrin: Ok, what if the sample proportion were ##1\%## or ##99\%##, what then? How would I go about testing my hypothesis?
 
  • #29
Dale said:
Then the normal approximation definitely will not work. The Bayesian formula above works fine, and so does Fisher’s exact test.
If the sample size is large enough for the selected confidence level, such an extreme sample gives a very clear statistical result.
 
  • #30
So ##2## scenarios:

1) Population proportion = ##0.97## and sample proportion = ##0.01##
My hypotheses:
##H_0: p = 0.97##
##H_a: p < 0.97##

2) Population proportion = ##0.97## and sample proportion = ##0.99##
##H_0: p = 0.97##
##H_a: p < 0.97##

right?
 
  • #31
Agent Smith said:
So ##2## scenarios:

1) Population proportion = ##0.97## and sample proportion = ##0.01##
My hypotheses:
##H_0: p = 0.97##
##H_a: p < 0.97##
For a reasonable sample size and selected confidence level, this would allow you to accept the alternative hypothesis.
You should be aware that some scientific fields require very extreme confidence levels. In tests for the discovery of new subatomic particles, 5 sigma is required to accept that you have found a new particle. The world is very skeptical of such claims.
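For a sense of scale, the tail probability that corresponds to 5 sigma can be computed directly. This sketch assumes a normally distributed test statistic; whether a given field quotes the one-sided or two-sided figure is a matter of convention, so both are shown.

```python
from math import erfc, sqrt

sigmas = 5

# One-sided tail probability P(Z >= 5) for a standard normal statistic
p_one_sided = 0.5 * erfc(sigmas / sqrt(2))

# Two-sided version P(|Z| >= 5)
p_two_sided = 2 * p_one_sided

print(f"one-sided: {p_one_sided:.3g}, two-sided: {p_two_sided:.3g}")
```

Compare either figure with the ##\alpha = 0.05## used in classroom examples.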
Agent Smith said:
2) Population proportion = ##0.97## and sample proportion = ##0.99##
##H_0: p = 0.97##
##H_a: p < 0.97##
Then the sample actually strengthens the null hypothesis compared to that alternative hypothesis.
 
  • #32
FactChecker said:
this would allow you to accept the alternative hypothesis
Technically, null hypothesis significance testing is only meant to challenge the null hypothesis. So you can only “reject the null hypothesis” or “fail to reject the null hypothesis”. It is usually not justified to “accept the null hypothesis” or “accept/reject the alternative hypothesis”. However, plenty of scientists do make such claims anyway.
 
  • #33
Agent Smith said:
@Dale & @FactChecker :biggrin: Ok, what if the sample proportion were ##1\%## or ##99\%##, what then? How would I go about testing my hypothesis?
I would still avoid the normal approximation. The Fisher’s exact test is implemented in most statistical packages, and the Bayesian approach is even easier to calculate. No reason to risk a bad approximation when you are that close to the edge.
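To see how far the approximation can drift near the edge, here is a rough comparison for a sample proportion of 99%. Several things are assumptions for illustration: ##n = 400## and ##p_0 = 0.97## are borrowed from post #22, the alternative is taken as ##H_a: p > 0.97##, and the exact calculation is a plain binomial tail sum rather than Fisher's exact test.

```python
from math import comb, erfc, sqrt

n = 400
p0 = 0.97
successes = 396            # a 99% sample proportion
p_hat = successes / n

# Normal approximation: upper-tail p-value P(Z >= z)
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_normal = 0.5 * erfc(z / sqrt(2))

# Exact binomial upper tail: P(X >= 396 | n = 400, p = 0.97)
p_exact = sum(comb(n, k) * p0**k * (1 - p0)**(n - k)
              for k in range(successes, n + 1))

print(f"normal approximation: {p_normal:.4f}")
print(f"exact binomial tail:  {p_exact:.4f}")
```

The two p-values land on the same side of 0.05 here, but they differ noticeably, and the gap matters more when a result is borderline.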
 
  • #34
Dale said:
Technically, null hypothesis significance testing is only meant to challenge the null hypothesis. So you can only “reject the null hypothesis” or “fail to reject the null hypothesis”. It is usually not justified to “accept the null hypothesis” or “accept/reject the alternative hypothesis”. However, plenty of scientists do make such claims anyway.
Good point!
 
  • #35
Dale said:
I would still avoid the normal approximation. The Fisher’s exact test is implemented in most statistical packages, and the Bayesian approach is even easier to calculate. No reason to risk a bad approximation when you are that close to the edge.
Not possible at my level (high school). Thanks though. I'll look it up.

In my lessons they say exactly what you say here. Either reject ##H_0## or fail to reject ##H_0##. I wonder why that is. Wouldn't it be good to be able to accept ##H_0##? For example, I might want to check that the proportion of regular customers at a restaurant hasn't changed after a staff change. 🤔
 
