B Choosing an appropriate hypothesis

  • Thread starter: Agent Smith
  • Tags: population

Summary:
The discussion centers on formulating appropriate null and alternative hypotheses regarding the increase in the proportion of students whose mothers graduated from college, using data from the National Center for Education Statistics. The correct hypotheses to test whether the proportion has changed from 31% to 32% are debated, with option e) (##H_0: p = 0.31##, ##H_a: p \neq 0.31##) being favored for its neutrality. However, some participants argue that a one-tailed test (##H_0: p = 0.31##, ##H_a: p > 0.31##) would be more appropriate if the goal is to determine an increase. Concerns are raised about potential biases if the same sample overlaps between years, emphasizing the importance of proper sampling methods. Ultimately, the conversation highlights the complexities of hypothesis testing in statistical analysis.
  • #31
Agent Smith said:
So ##2## scenarios:

1) Population proportion = ##0.97## and sample proportion = ##0.01##
My hypotheses:
##H_0: p = 0.97##
##H_a: p < 0.97##
For a reasonable sample size and selected confidence level, this would allow you to accept the alternative hypothesis.
You should be aware that some scientific fields require very extreme confidence levels. In tests for the discovery of new subatomic particles, 5 sigma is required to accept that you have found a new particle. The world is very skeptical of such claims.
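To make that concrete, here is a minimal sketch of how such a one-sided test could be run exactly in Python with scipy (the sample size ##n = 100## is an assumption for illustration only; the thread never fixes one):

```python
from scipy.stats import binomtest

# Scenario 1: H0: p = 0.97 vs Ha: p < 0.97,
# with an observed sample proportion of 0.01.
n = 100                # assumed sample size, for illustration only
successes = 1          # 1% of 100
result = binomtest(successes, n, p=0.97, alternative="less")
print(result.pvalue)   # vanishingly small -> reject H0 at any usual level
```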
Agent Smith said:
2) Population proportion = ##0.97## and sample proportion = ##0.99##
##H_0: p = 0.97##
##H_a: p < 0.97##
Then the sample actually strengthens the null hypothesis relative to that alternative hypothesis.
 
  • #32
FactChecker said:
this would allow you to accept the alternative hypothesis
Technically, null hypothesis significance testing is only meant to challenge the null hypothesis. So you can only “reject the null hypothesis” or “fail to reject the null hypothesis”. It is usually not justified to “accept the null hypothesis” or “accept/reject the alternative hypothesis”. However, plenty of scientists do make such claims anyway.
 
  • #33
Agent Smith said:
@Dale & @FactChecker 😁 Ok, what if the sample proportion were ##1\%## or ##99\%##, what then? How would I go about testing my hypothesis?
I would still avoid the normal approximation. Fisher’s exact test is implemented in most statistical packages, and the Bayesian approach is even easier to calculate. No reason to risk a bad approximation when you are that close to the edge.
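For a single proportion this close to the edge, the exact analogue is the binomial test (Fisher's exact test proper is for comparing two groups in a 2×2 table); a sketch, with the sample size ##n = 100## assumed purely for illustration:

```python
from scipy.stats import binomtest

# H0: p = 0.97, observed sample proportion 0.99 -- too close to 1
# for the normal approximation to be trustworthy.
n = 100                # assumed sample size, for illustration only
successes = 99
result = binomtest(successes, n, p=0.97)   # exact two-sided test
print(result.pvalue)   # no normal approximation involved
```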
 
  • #34
Dale said:
Technically, null hypothesis significance testing is only meant to challenge the null hypothesis. So you can only “reject the null hypothesis” or “fail to reject the null hypothesis”. It is usually not justified to “accept the null hypothesis” or “accept/reject the alternative hypothesis”. However, plenty of scientists do make such claims anyway.
Good point!
 
  • #35
Dale said:
I would still avoid the normal approximation. Fisher’s exact test is implemented in most statistical packages, and the Bayesian approach is even easier to calculate. No reason to risk a bad approximation when you are that close to the edge.
Not possible at my level (high school). Thanks though. I'll look it up

In my lessons they say exactly what you say here. Either reject ##H_0## or fail to reject ##H_0##. I wonder why that is. Wouldn't it be good to be able to accept ##H_0##? For example, I might want to check that the proportion of regular customers at a restaurant hasn't changed after a staff change. 🤔
 
  • #36
FactChecker said:
For a reasonable sample size and selected confidence level, this would allow you to accept the alternative hypothesis.
You should be aware that some scientific fields require very extreme confidence levels. In tests for the discovery of new subatomic particles, 5 sigma is required to accept that you have found a new particle. The world is very skeptical of such claims.
Yes, I've come across ##6## sigma. I suppose it's tough being a scientist.
Regarding my question ... a ##1\%## sample proportion should be quite a good decider between ##H_0## and ##H_a##. Likewise a sample proportion of ##99\%##.
 
  • #37
Agent Smith said:
Wouldn't it be good to be able to accept H0?
You are able to accept the null hypothesis, just not using a standard null hypothesis significance test.
 
  • #38
Dale said:
You are able to accept the null hypothesis, just not using a standard null hypothesis significance test.
Which test do I use to accept a null hypothesis?
 
  • #39
Agent Smith said:
Which test do I use to accept a null hypothesis?
First, you determine a region of practical equivalence (ROPE). This is determined not by statistics but by practical considerations. For example, you might be testing the null hypothesis that a coin is fair, and you could decide that a coin whose true probability of heads is anywhere from ##p = 0.49## to ##p = 0.51## is equivalent to a fair coin for your practical purposes.

Second, you construct a confidence interval. If the confidence interval is entirely within your ROPE then you can accept the null hypothesis.
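A sketch of that two-step procedure in Python for the coin example (the flip counts are invented for illustration; note how large ##n## has to be before a 95% interval can fit inside a ROPE only 0.02 wide):

```python
from scipy.stats import binomtest

# Step 1: ROPE chosen on practical grounds, not statistical ones.
rope = (0.49, 0.51)

# Step 2: an exact (Clopper-Pearson) 95% confidence interval for p.
n, heads = 40000, 20050   # invented data, for illustration only
ci = binomtest(heads, n).proportion_ci(confidence_level=0.95)

if rope[0] <= ci.low and ci.high <= rope[1]:
    print("CI entirely inside ROPE -> accept H0 (practically fair)")
else:
    print("CI not contained in ROPE -> cannot accept H0")
```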
 
  • #40
Dale said:
First, you determine a region of practical equivalence (ROPE). This is determined not by statistics but by practical considerations. For example, you might be testing the null hypothesis that a coin is fair, and you could decide that a coin whose true probability of heads is anywhere from ##p = 0.49## to ##p = 0.51## is equivalent to a fair coin for your practical purposes.

Second, you construct a confidence interval. If the confidence interval is entirely within your ROPE then you can accept the null hypothesis.
Can I do the following:

For a fair coin, the proportion of heads ##p = 0.5##.
I take a sample of 100 and see that the proportion of heads = ##0.48##. I construct a ##95\%## confidence interval around ##0.48## and if ##0.5## lies in that interval (say it's ##0.48 \pm 0.03##), I can say that the coin is fair.


What actually prevents us from accepting the null?
 
  • #41
Often (usually?) the purpose of the statistical analysis is to decide if a sample result is so unlikely, assuming the null hypothesis, that the null hypothesis should be rejected. That can be simple.
Suppose you assume that a coin has an equal likelihood of heads and tails. Suppose a trial of 20 coin flips gives 19 heads and 1 tail. You should be very skeptical of your assumption. Hypothesis testing gives you a mathematical justification for your skepticism. You do not have to specify a probability for the alternative hypothesis. You just know that the result you got was very unlikely given the null hypothesis.
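To put a number on that skepticism, here is a quick check under the fair-coin null:

```python
from scipy.stats import binom

# P(19 or more heads in 20 flips of a fair coin)
p = binom.sf(18, 20, 0.5)   # sf(k) = P(X > k), so this is P(X >= 19)
print(p)                    # = 21 / 2**20, about 2e-05
```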
 
  • #42
FactChecker said:
Often (usually?) the purpose of the statistical analysis is to decide if a sample result is so unlikely, assuming the null hypothesis, that the null hypothesis should be rejected. That can be simple.
Suppose you assume that a coin has an equal likelihood of heads and tails. Suppose a trial of 20 coin flips gives 19 heads and 1 tail. You should be very skeptical of your assumption. Hypothesis testing gives you a mathematical justification for your skepticism. You do not have to specify a probability for the alternative hypothesis. You just know that the result you got was very unlikely given the null hypothesis.
Yes, I have notes on that topic, but I'm not clear on the issue. @Dale and I were discussing why the null hypothesis is not accepted. What do you think? If a given mean has an associated p-value > alpha then I can't reject ##H_0## but then I can't accept it either. Why? As a shrewd businessman selling pizza, I would like to know if my customer base hasn't changed after a recent outbreak of gastroenteritis in my town. How do I do that? 🤔
 
  • #43
Agent Smith said:
Can I do the following:

For a fair coin, the proportion of heads ##p = 0.5##.
I take a sample of 100 and see that the proportion of heads = ##0.48##. I construct a ##95\%## confidence interval around ##0.48## and if ##0.5## lies in that interval (say it's ##0.48 \pm 0.03##), I can say that the coin is fair.
No. This won’t work. If you have a statistical test for some claim, then you require that the evidence be strong before making the claim.

The normal claim that we test is to “reject the null hypothesis”. In order to make that claim we either need a little data that is very different from the null or a lot of data that is only slightly different. Only in that case will we feel that we have data that is strong enough to convince us to “reject the null hypothesis”; otherwise we will “fail to reject the null hypothesis”.

So, to test the claim that we “accept the null hypothesis” we need strong data to support the claim. With your procedure the less data you collect, the broader the confidence interval and the more likely the test will pass. Meaning that the proposed test is satisfied with weak data, so it is not a convincing test.

The ROPE based test requires tight confidence intervals, so the data has to be strong to support the claim.
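A small illustration of that contrast (simulated counts; the ROPE ##[0.49, 0.51]## is carried over from the earlier coin example):

```python
from scipy.stats import binomtest

rope = (0.49, 0.51)
for n in (20, 100, 40000):
    heads = n // 2   # suppose we saw exactly 50% heads each time
    ci = binomtest(heads, n).proportion_ci(confidence_level=0.95)
    naive_pass = ci.low <= 0.5 <= ci.high                 # proposed test
    rope_pass = rope[0] <= ci.low and ci.high <= rope[1]  # ROPE test
    print(n, round(ci.low, 4), round(ci.high, 4), naive_pass, rope_pass)

# The naive check passes even at n = 20, when the interval is very wide;
# the ROPE check only passes once the interval is narrow enough.
```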

Agent Smith said:
What actually prevents us from accepting the null?
Nothing, you just have to use the right test to do it.
 
  • #44
Dale said:
No. This won’t work. If you have a statistical test for some claim, then you require that the evidence be strong before making the claim.

The normal claim that we test is to “reject the null hypothesis”. In order to make that claim we either need to have data that is very different from the null or a lot of data. Only in that case will we feel that we have data that is strong enough to convince us to “reject the null hypothesis” otherwise we will “fail to reject the null hypothesis”.

So, to test the claim that we “accept the null hypothesis” we need strong data to support the claim. With your procedure the less data you collect, the broader the confidence interval and the more likely the test will pass. Meaning that the proposed test is satisfied with weak data, so it is not a convincing test.

The ROPE based test requires tight confidence intervals, so the data has to be strong to support the claim.

Nothing, you just have to use the right test to do it.
I looked up ROPE (too advanced). I'm B for Basic at the moment.
I thought that if the population mean ##\mu## is such that, for sample mean ##\overline x##, we have ##\mu = \overline x \pm \text{Margin of Error}## at the ##99\%## confidence level, then we have strong evidence that ##\mu## is in the interval ##[\overline x - \text{Margin of Error}, \overline x + \text{Margin of Error}]##. Is this useless? 🤓
 
  • #45
Agent Smith said:
Yes, I have notes on that topic, but I'm not clear on the issue. @Dale and I were discussing why the null hypothesis is not accepted. What do you think? If a given mean has an associated p-value > alpha then I can't reject ##H_0## but then I can't accept it either. Why? As a shrewd businessman selling pizza, I would like to know if my customer base hasn't changed after a recent outbreak of gastroenteritis in my town. How do I do that? 🤔
I think that you may be barking up the wrong tree. If you want to stick with the standard, basic, methods and terminology, then the null hypothesis should be the assumption that will not have a seriously bad consequence if it is wrong (Type II error). The entire subject of hypothesis testing is set up to stay with the null hypothesis unless you can show convincing proof that it is wrong -- nothing else. Certain scientific fields may require very extreme results (over 5 sigma) to abandon the null hypothesis.

Perhaps you would prefer the approach of using a sample to determine a confidence range on a population parameter? Although it has similarities and relationships to a test of hypothesis, it is not the same.
Alternatively, there are a variety of tests to determine if a sample distribution fits (is a likely/unlikely sample from) a given distribution. That may be what you are looking for.
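One common example of such a goodness-of-fit test is the one-sample Kolmogorov-Smirnov test; a minimal sketch with simulated data:

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=200)  # simulated data

# Does the sample look like a draw from a standard normal distribution?
stat, pvalue = kstest(sample, "norm")
print(stat, pvalue)   # a large p-value means no evidence of misfit
```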
 
  • #46
Agent Smith said:
I looked up ROPE (too advanced). I'm B for Basic at the moment.
Well, I can either answer the questions you are asking or I can just tell you that there is no basic answer. Which do you prefer?

Agent Smith said:
I thought that if the population mean ##\mu## is such that, for sample mean ##\overline x##, we have ##\mu = \overline x \pm \text{Margin of Error}## at the ##99\%## confidence level, then we have strong evidence that ##\mu## is in the interval ##[\overline x - \text{Margin of Error}, \overline x + \text{Margin of Error}]##. Is this useless? 🤓
That is what the ROPE concept is based on.
 
  • #47
@FactChecker most informative. @Dale suggested ROPE (beyond reach) and I'm grateful to both of you. I don't know how statistical hypothesis testing is applied in real science, but it seems relevant to me. Do you have links to an article (a simple application of the method) which I can reference?
 
  • #48
Dale said:
Well, I can either answer the questions you are asking or I can just tell you that there is no basic answer. Which do you prefer?

That is what the ROPE concept is based on.
Thanks for the confirmation.
 
  • #49
Agent Smith said:
Yes, I have notes on that topic, but I'm not clear on the issue. @Dale and I were discussing why the null hypothesis is not accepted. What do you think? If a given mean has an associated p-value > alpha then I can't reject ##H_0## but then I can't accept it either. Why? As a shrewd businessman selling pizza, I would like to know if my customer base hasn't changed after a recent outbreak of gastroenteritis in my town. How do I do that? 🤔
This is a different hypothetical scenario than you started with in post #1. Sorry, but the subject of statistics is HUGE and we could come up with a million hypotheticals and questions. I recommend that you look up the subject of confidence intervals in a basic statistics book before proceeding.
 
  • #50
FactChecker said:
This is a different hypothetical scenario than you started with in post #1. Sorry, but the subject of statistics is HUGE and we could come up with a million hypotheticals and questions. I recommend that you look up the subject of confidence intervals in a basic statistics book before proceeding.
The issue was different. I guess I should've stuck to the original example question, and so I will. How would I confirm the hypothesis that the proportion of cell phone owners in my small town hasn't changed? It is a valid hypothesis, right?

@Dale was kind enough to use a simpler (coin flip) example (or was it you?), but I realize now that it suffers from the same issue, viz. I can't accept ##H_0##. Hence, he suggested ROPE (which is confidence interval based inference). It's 😢 that a high school student won't be able to assist in wildlife conservation research, being unable to test (say) that the mockingjay population hasn't changed.
 
  • #52
Agent Smith said:
The issue was different. I guess I should've stuck to the original example question, and so I will. How would I confirm the hypothesis that the proportion of cell phone owners in my small town hasn't changed? It is a valid hypothesis, right?
No. It's too strong and/or vague. Do you mean hasn't changed AT ALL -- was 0.31000000 and still is 0.31000000? If the conservative and cautious assumption is that it hasn't changed, then you shouldn't assume otherwise unless you have solid evidence. Why would you?
PS. There are techniques for optimal decision making if that is what you are looking for. It is an entire subject in itself.
Agent Smith said:
@Dale was kind enough to use a simpler (coin flip) example (or was it you?), but I realize now that it suffers from the same issue, viz. I can't accept ##H_0##.
Why? If there are costs or dangers to balance, then you should define an example like that, where retaining an erroneous null hypothesis assumption is bad. Then you can compare the costs of Type I versus Type II errors.
Agent Smith said:
Hence, he suggested ROPE (which is confidence interval based inference). It's 😢 that a high school student won't be able to assist in wildlife conservation research, being unable to test (say) that the mockingjay population hasn't changed.
High school students can use statistical hypothesis testing. (Please don't bring up yet another hypothetical scenario like mocking jays. Jumping around is just confusing.)
 
  • #53
@Dale what do you know, I had saved your article for a later read. I don't know if my compliments are of any value, but it's such a well-written article. Thank you.

I will now describe myself conducting a hypothesis test.
##H_0: \text{The coin is fair}##
##H_a: \text{The coin is not fair}##

I now take the coin and do a long series of coin flips (you did 4000) and count the outcomes. Say the number of tails = A and the number of heads = E. I suppose this is my "evidence". We have something called a posterior distribution and a highest posterior density interval. I didn't quite understand these. What are they? Are they probability distributions of the outcomes (heads/tails)? They look as though they've been done by a computer.
##P(X \in [0.45, 0.55]) = 0.95## would mean the probability that the random variable ##X##, the frequency of tails, is in the range ##[0.45, 0.55]## is ##95\%##. Correct?

Bayes' Theorem: ##P(H|E) = \frac{P(H) \times P(E|H)}{P(E)}##

I don't know how we could use ##P(X \in [0.45, 0.55]) = 0.95## in Bayes' formula. What are the values for ##P(E|H)## and ##P(E)##?

Setting these doubts aside for the moment, we then end up with a posterior probability for ##H_0##. Is there some kind of threshold probability here, like p-value?

This is also excellent: ##\frac{P(H_A|E)}{P(H_B|E)} = \frac{P(E|H_A) \times P(H_A)}{P(E|H_B) \times P(H_B)}##

I guess what I'm asking for is a walkthrough of a Bayes' theorem based hypothesis test (my hypothesis being that a coin is fair) which I can do at home. Please 😊
 
  • #54
FactChecker said:
No. It's too strong and/or vague. Do you mean hasn't changed AT ALL -- was 0.31000000 and still is 0.31000000? If the conservative and cautious assumption is that it hasn't changed, then you shouldn't assume otherwise unless you have solid evidence. Why would you?
PS. There are techniques for optimal decision making if that is what you are looking for. It is an entire subject in itself.

Why? If there are costs or dangers to balance, then you should define an example like that, where retaining an erroneous null hypothesis assumption is bad. Then you can compare the costs of Type I versus Type II errors.

High school students can use statistical hypothesis testing. (Please don't bring up yet another hypothetical scenario like mocking jays. Jumping around is just confusing.)
I really don't know how real-life hypothesis tests are done. Can you suggest some DIY statistical hypothesis-testing experiments? Off the top of my head, I can think of testing a coin for fairness/unfairness.

I would definitely like things to be as convenient as possible for me and everyone else involved, but I'm doing high school statistics (B) and stuff has been simplified for us.

Sorry for causing confusion by introducing a new area of statistical research (wildlife conservation). It's just that children are being taught that environmental conservation is important, and I thought you might have experience in that domain, making it easier for you to respond.

Thank you 🔰
 
  • #55
Suppose you want to show that a parameter, ##p##, (mean or variance, etc.) of a distribution is within ##\epsilon## of a value ##p_0## with a certain confidence, ##c## (=95%, 97.5%, etc.). You need the mean and variance for the sample parameter estimator, ##\hat p##. Then you need to collect a sample large enough that the ##c## confidence interval derived from that sample is within ##(p_0-\epsilon, p_0+\epsilon)##.
Confidence intervals are discussed in most introductory statistics books.
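For a proportion, the normal-approximation version of that sample-size calculation follows from requiring the half-width ##z \sqrt{\hat p (1 - \hat p)/n} \le \epsilon##; a sketch (the worst case ##\hat p = 0.5## is assumed, and in practice you need somewhat more, since ##\hat p## will not land exactly on ##p_0##):

```python
import math
from scipy.stats import norm

def required_n(eps, confidence=0.95, p_hat=0.5):
    """Smallest n whose normal-approximation CI half-width is <= eps.
    p_hat = 0.5 is the worst (widest) case for a proportion."""
    z = norm.ppf(1 - (1 - confidence) / 2)   # e.g. 1.96 for 95%
    return math.ceil(z**2 * p_hat * (1 - p_hat) / eps**2)

print(required_n(0.01))   # about 9604 observations for +/- 1% at 95%
```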
 
  • #56
FactChecker said:
Suppose you want to show that a parameter, ##p##, (mean or variance, etc.) of a distribution is within ##\epsilon## of a value ##p_0## with a certain confidence, ##c## (=95%, 97.5%, etc.). You need the mean and variance for the sample parameter estimator, ##\hat p##. Then you need to collect a sample large enough that the ##c## confidence interval derived from that sample is within ##(p_0-\epsilon, p_0+\epsilon)##.
Confidence intervals are discussed in most introductory statistics books.
Yes, all I can remember is ##\text{Parameter} = \text{Statistic} \pm z^* \frac{\sigma}{\sqrt n}##. I hope this is correct. I had asked @Dale a question, hope he'll reply soon. As for the confidence interval, if my ##p## is within it, I have reason to say that my ##p## hasn't changed, right? So if my ##p = 0.25## and my confidence interval is ##0.23 \pm 0.04##, I can accept my ##H_0##, yes?
 
  • #57
If you want to PROVE that your parameter ##p## hasn't changed, that means you have to prove that it has not changed from ##p=0.2500000000000000## to ##p=0.2500000000000001##. Good luck.
Instead, you should specify an amount of change, ##\epsilon##, that is acceptable and prove that the entire confidence interval is inside the acceptable range ##(p-\epsilon, p+\epsilon)##. That requirement would determine the required sample size. That is what I tried to indicate in post #55.
 
  • #58
@FactChecker So there is no way, given what you said, to check whether a parameter has not changed? I could take a sample, compute a statistic and then construct a confidence interval (like I did above), no? Dale mentioned ROPE, based on Bayes' theorem. I understand that the larger the sample, the lower our margin of error (this was taught to me). However, can I or can I not accept the null hypothesis (assuming everything's perfect for an inference to be made)? 🤓
 
  • #59
It's not clear to me whether you are expecting too much from a statistical experiment or are just not being careful in wording your question. Being precise in mathematical statements is a learned skill. You can show at a certain level of confidence, that the parameter has not changed significantly. You can not prove that it has not changed even the tiniest amount.
 
  • #60
Agent Smith said:
I don't know if my compliments are of any value, but it's such a well-written article. Thank you.
Thank you for that, I appreciate the feedback.

Agent Smith said:
We have something called a posterior distribution and a highest posterior density interval. I didn't quite understand these. What are they?
Fair warning, this is well beyond basic. A lot of that is covered in the previous articles in that series. But basically the key idea is as follows:

We are going to treat uncertain things, like population means and coin flip frequencies, directly as random variables. Thus they have their own associated probability density functions and so forth.

So, in this case, our uncertainty about the fairness is represented as a probability distribution on the frequency. If we were very certain that it is unfair then the probability distribution on the frequency might have a mean of 0.6 and a standard deviation of 0.02. If we thought it is fair, but we were not very certain about it, then maybe it would have a mean of 0.5 and a standard deviation of 0.2. In any case, the frequency is not just a number, but a probability distribution.

The posterior distribution is just the probability distribution after (posterior to) collecting the data. And the highest posterior density interval is the smallest interval you can make that contains 95% (or whatever level you choose) of the posterior.
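For a coin this is concrete: with a flat Beta(1, 1) prior, observing ##h## heads and ##t## tails gives a Beta(1 + h, 1 + t) posterior. A sketch (the counts are invented, loosely matching the 4000-flip example; the equal-tailed 95% interval is used below, which for a posterior this symmetric is essentially the same as the highest posterior density interval):

```python
from scipy.stats import beta

h, t = 2022, 1978                # invented counts from 4000 flips
posterior = beta(1 + h, 1 + t)   # Beta(1,1) prior -> Beta(1+h, 1+t)

# 95% equal-tailed credible interval for the heads frequency
low, high = posterior.ppf(0.025), posterior.ppf(0.975)
print(low, high)                 # roughly (0.490, 0.521)

# Posterior probability that the frequency lies in [0.45, 0.55]
print(posterior.cdf(0.55) - posterior.cdf(0.45))   # essentially 1.0
```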

Agent Smith said:
##P(X \in [0.45, 0.55]) = 0.95## would mean the probability that the random variable ##X##, the frequency of tails, is in the range ##[0.45, 0.55]## is ##95\%##. Correct?
Yes.

Agent Smith said:
Is there some kind of threshold probability here, like p-value?
Yes, you can do the same kinds of things. The Bayesian techniques are not as old as the usual ones, so they don’t have quite as many traditions, like p-values of 0.05, but you can use that one if you like; it often reduces arguments to use “traditional” values like that.

Agent Smith said:
I guess what I'm asking for is a walkthrough of a Bayes' theorem based hypothesis test (my hypothesis being a coin is fair) which I can do at home
So first, you have to choose your ROPE. What would you consider to be close enough to fair that you would call it practically fair?

Second, you need to decide how confident you want to be that it is practically fair.
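Continuing the Beta-posterior sketch above, the at-home version of the decision is then a single comparison: the posterior probability that the frequency is inside your ROPE against the confidence you decided you need (both numbers are your choices, not the data's):

```python
from scipy.stats import beta

h, t = 2022, 1978                 # same invented counts as above
posterior = beta(1 + h, 1 + t)

rope = (0.45, 0.55)               # step 1: practical-equivalence range
required = 0.95                   # step 2: how confident you want to be

p_in_rope = posterior.cdf(rope[1]) - posterior.cdf(rope[0])
print(p_in_rope >= required)      # True -> accept "practically fair"
```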
 
