B Choosing an appropriate hypothesis

  • Thread starter: Agent Smith
  • Tags: population
Summary
The discussion centers on formulating appropriate null and alternative hypotheses regarding the increase in the proportion of students whose mothers graduated from college, using data from the National Center for Education Statistics. The hypotheses for testing whether the proportion has changed from 31% to 32% are debated, with option e) (##H_0: p = 0.31##, ##H_a: p \ne 0.31##) favored for its neutrality. However, some participants argue that a one-tailed test (##H_0: p = 0.31##, ##H_a: p > 0.31##) would be more appropriate if the goal is to determine whether there was an increase. Concerns are raised about potential bias if the same sample overlaps between years, emphasizing the importance of proper sampling methods. Ultimately, the conversation highlights the complexities of hypothesis testing in statistical analysis.
  • #61
@Dale I want to know if a coin A I have is fair/unfair.
My hypothesis: ##H:## Coin A is unfair
I flip it 1000 times and record number of heads (H) and number of tails (T).
I now have my data, which (I think?) is the evidence ##E##.
I have Bayes' formula: ##P(H|E) = \frac{P(H) \times P(E|H)}{P(E)}##. How do I get the values for ##P(E|H)## and ##P(E)## from my experimental evidence? What I do know is that ##P(H)##, the prior probability, would depend on previous work/initial evidence (maybe I flipped my coin 50 times the previous day and noticed 39 heads).
 
  • #62
Agent Smith said:
I want to know if a coin A I have is fair/unfair.
Yes. The first thing you need to do is make two decisions. These are choices that you make. They do not come from statistics, and ideally they should be done before you collect the data.

The first is to choose what you would consider close enough to fair that you would call it practically fair.

Second, you need to decide how confident you want to be that the coin is practically fair.
 
  • #63
Dale said:
Yes. The first thing you need to do is make two decisions. These are choices that you make. They do not come from statistics, and ideally they should be done before you collect the data.

The first is to choose what you would consider close enough to fair that you would call it practically fair.

Second, you need to decide how confident you want to be that the coin is practically fair.
Fair would be 50/50 heads and tails. Proportion of heads ##p_H = 0.5 \pm 0.02## (that's the interval ##[0.48, 0.52]##) and proportion of tails ##= 1 - p_H##. That would be my first choice.

Second choice, I want to be ##95\%## confident that the coin is fair.

So I have my data; I flipped the coin ##1000## times. Suppose I get (##2## scenarios):
1. Number of heads 490 i.e. ##p_H = 0.49##
2. Number of heads 358 i.e. ##p_H = 0.358##

I have my formula (Bayes' theorem).

How do I use my data to compute ##P(E|H)## and ##P(E)##?
 
  • #64
Agent Smith said:
Fair would be 50/50 heads and tails. Proportion of heads ##p_H = 0.5 \pm 0.02## (that's the interval ##[0.48, 0.52]##) and proportion of tails ##= 1 - p_H##. That would be my first choice.

Second choice, I want to be ##95\%## confident that the coin is fair.
Perfect, with these we can proceed.

Agent Smith said:
How do I use my data to compute ##P(E|H)## and ##P(E)##?
##P(E)## is just a normalization that we use to scale the right-hand side of the equation so that the integral is 1 (because a probability density function has to integrate to 1 by definition).

Coin flips can be represented as a binomial distribution ##B(n,p)##. In this case you are doing ##n=1000## flips, and the probability of heads on each flip is ##p=H##. So $$P(E|H)=\binom{1000}{E} H^E (1-H)^{1000-E}$$
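As a quick check, this likelihood can be evaluated numerically. A minimal sketch in Python (assuming scipy is available; the numbers come from the scenarios above):

```python
from scipy.stats import binom

# P(E|H): probability of observing E heads in n flips,
# given a hypothesized per-flip heads frequency H
n = 1000
E = 490   # observed heads (scenario 1)
H = 0.50  # one hypothesized frequency

likelihood = binom.pmf(E, n, H)
print(likelihood)  # roughly 0.0207
```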

The next thing is to choose your prior. This is your belief in the fairness of the coin (including your uncertainty). For a binomial likelihood like we have here, there is a very convenient form of the prior called a conjugate prior. For the binomial likelihood the conjugate prior is the beta distribution. If our prior is a beta distribution ##\beta(a,b)## then our posterior will be ##\beta(a+E,b+(1000-E))##.
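Spelled out, this update is just Bayes' theorem with the normalization deferred: multiplying the ##\beta(a,b)## prior density by the binomial likelihood gives
$$\underbrace{H^{a-1}(1-H)^{b-1}}_{\beta(a,b)\text{ prior}} \times \underbrace{H^{E}(1-H)^{1000-E}}_{\text{likelihood}} \propto H^{(a+E)-1}(1-H)^{(b+1000-E)-1},$$
which is again a beta density, namely ##\beta(a+E,\, b+(1000-E))##.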

So let's say that we wanted to say that we had a completely uniform prior. In other words, before running the experiment we did not have any reason to believe that the coin would land heads 50% of the time versus 99% of the time. This is called a uniform or uninformative prior. So that would be a prior of ##\beta(1,1)##.

Scenario 1. After observing ##E=490## heads (and ##1000-E=510## tails), we have the posterior distribution ##\beta(491,511)##. This has a probability of ##0.708## of being within the ROPE (the region of practical equivalence, your interval ##[0.48, 0.52]##). So this is evidence that the coin is probably practically fair, but it is not strong enough to meet your confidence requirement. There is a non-negligible ~30% chance that the coin is not practically fair, given this data and the uninformative prior.

[Plot: the scenario 1 posterior ##\beta(491,511)##]

Scenario 2. After observing ##E=358## heads (and ##1000-E=642## tails), we have the posterior distribution ##\beta(359,643)##. This has a probability of ##3.55 \times 10^{-15}## of being within the ROPE. This is pretty strong evidence that the coin is not practically fair.
[Plot: the scenario 2 posterior ##\beta(359,643)##]
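For reference, the two ROPE probabilities above can be reproduced directly from the beta CDF. A minimal sketch in Python, assuming scipy:

```python
from scipy.stats import beta

def rope_prob(heads, tails, lo=0.48, hi=0.52):
    # posterior after a uniform beta(1,1) prior: beta(1+heads, 1+tails)
    post = beta(1 + heads, 1 + tails)
    # posterior mass inside the ROPE [lo, hi]
    return post.cdf(hi) - post.cdf(lo)

print(rope_prob(490, 510))  # scenario 1: ~0.708
print(rope_prob(358, 642))  # scenario 2: ~3.55e-15
```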
 
  • #65
@Dale Too advanced a topic for me, but I have a much better grasp of what's going on.
Here's where I trip up:
1. I don't know what a ##\beta## distribution is (you linked me to the Wiki page :thumbup:).
2. I didn't quite get this 👇
[Screenshot of the formula ##P(E|H)=\binom{1000}{E} H^E (1-H)^{1000-E}## from post #64]

My brain's telling me ##E## and ##H## aren't numbers, and so I don't get the right-hand side of the equality.

So, given a ##95\%## confidence level, does ##\beta(a, b)## mean that the interval ##[a, b]## gives (in my example) the proportions that can be considered equivalent to ##0.5## (the coin is fair)? If my experimental proportion falls outside this range, is the probability that the coin is fair ##< 0.05##? Correct?
 
  • #66
Agent Smith said:
Fair would be 50/50 heads and tails. Proportion of heads ##p_H = 0.5 \pm 0.02## (that's the interval ##[0.48, 0.52]##) and proportion of tails ##= 1 - p_H##. That would be my first choice.

Second choice, I want to be ##95\%## confident that the coin is fair.

So I have my data; I flipped the coin ##1000## times. Suppose I get (##2## scenarios):
1. Number of heads 490 i.e. ##p_H = 0.49##
2. Number of heads 358 i.e. ##p_H = 0.358##

I have my formula (Bayes' theorem).

How do I use my data to compute ##P(E|H)## and ##P(E)##?
If you are going to use the Bayesian approach, the third thing you should decide at the outset is your initial distribution of coin-bias probabilities. If you got the coin from Mother Teresa, you can assume a high probability of a reasonably fair coin. That would give you a limited normal distribution (limited between 0 and 1) with a mean of 0.5 and a small variance. If you are using a coin at an amusement park game, you might want to assume that the coin is more likely to be biased toward the park winning. That would give you a skewed distribution (again limited between 0 and 1) of coin-bias toward the park winning. This decision gives you a lot of freedom to tune your analysis to particular situations, but it is susceptible to your initial bias.
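To make those prior choices concrete, here is a sketch using the beta family from the earlier posts as a stand-in for the shapes described above (the parameter values are purely illustrative, not from the post):

```python
from scipy.stats import beta

# Trusted source: prior tightly peaked at 0.5 (mean 0.5, small variance)
trusted_prior = beta(200, 200)

# Amusement-park coin: prior skewed toward the park winning
# (here, biased below 0.5 if the player needs heads)
skeptical_prior = beta(4, 6)  # mean 0.4

for name, prior in [("trusted", trusted_prior), ("skeptical", skeptical_prior)]:
    print(name, prior.mean(), prior.interval(0.95))
```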
 
  • #67
Agent Smith said:
1. I don't know what a ##\beta## distribution is (you linked me to the Wiki page :thumbup:).
The beta distribution is often used to model frequencies because it has zero probability outside the range 0 to 1, so it has some nice mathematical properties for this application.

It is possible to do this without the beta distribution. I can show that later.

Agent Smith said:
My brain's telling me ##E## and ##H## aren't numbers, and so I don't get the right-hand side of the equality.
##E## is the evidence, so it is a number. Specifically, it is the number of heads out of 1000 coin flips. Of course, you could replace the 1000 with another number if you didn't want to do 1000 flips.

##H## is a variable. It is every possible hypothesis of the frequency of heads for the coin. So ##H## ranges from 0 to 1. ##H=0.5## is a coin that is exactly fair. ##0.48<H<0.52## is a coin that is practically fair.
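In other words, the likelihood from post #64 is a function of the variable ##H##. A small sketch evaluating it at a few hypothesized frequencies (assuming scipy):

```python
from scipy.stats import binom

E, n = 490, 1000  # the scenario 1 evidence
for H in [0.358, 0.48, 0.49, 0.50, 0.52]:
    # P(E|H) for each hypothesized per-flip heads frequency
    print(H, binom.pmf(E, n, H))
```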

Agent Smith said:
So, given a ##95\%## confidence level, does ##\beta(a,b)## mean that the interval ##[a,b]## gives (in my example) the proportions that can be considered equivalent to ##0.5## (the coin is fair)? If my experimental proportion falls outside this range, is the probability that the coin is fair ##<0.05##? Correct?
Not quite: in ##\beta(a,b)##, the values ##a## and ##b## are the distribution's parameters (roughly, counts of heads and tails), not the endpoints of an interval. With the posterior and the ROPE we can directly compute the probability that the coin is practically fair. All we do is integrate the posterior over the ROPE. So, zooming in on the first scenario, we integrate over the orange region to get the 0.708 probability that the coin is practically fair:
[Plot: the scenario 1 posterior, zoomed in, with the ROPE region shaded in orange]
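Written out, that integral for scenario 1 is
$$P(0.48 < H < 0.52 \mid E) = \int_{0.48}^{0.52} \beta(H;\, 491, 511)\, dH \approx 0.708.$$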
 
  • #68
Dale said:
It is possible to do this without the beta distribution. I can show that later.
Here is a version in an Excel spreadsheet where you can see the calculation directly. It doesn't use the beta distribution at all; instead I approximate the possible hypotheses discretely. Specifically, I allow only hypotheses with frequencies that are integer multiples of 1/100. So you could hypothesize a coin with a frequency of 0.43, but not a coin with a frequency of 0.432.

In the first column you put the evidence, in terms of a number of heads and a number of tails.

In the second column you put in your prior beliefs. Right now I have it set for the uniform prior.

The remaining columns show all of the calculations needed to use Bayes here.
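For readers without Excel, a minimal Python sketch of the same discrete calculation (this mirrors the spreadsheet's columns; it is not the attached file itself):

```python
import numpy as np
from scipy.stats import binom

heads, tails = 490, 510                # the evidence
H = np.arange(0, 1.01, 0.01)           # hypotheses at multiples of 1/100
prior = np.full_like(H, 1 / len(H))    # uniform prior over the grid

likelihood = binom.pmf(heads, heads + tails, H)  # P(E|H) per hypothesis
posterior = prior * likelihood
posterior /= posterior.sum()           # normalizing here plays the role of P(E)

# probability that the coin is practically fair (H inside the ROPE)
rope = (H >= 0.48) & (H <= 0.52)
print(posterior[rope].sum())
```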
 

  • #69
[Screenshot of the formula ##P(E|H)=\binom{1000}{E} H^E (1-H)^{1000-E}## from post #64]

This is the binomial distribution formula, I believe. So this is the probability of the evidence given the hypothesis. So I've flipped the coin ##1000## times and got ##490## heads; that's a heads proportion of ##0.49##.

You said ##H = 0.5##. I don't quite get that. Shouldn't it be ##H = 0.49##? Is the hypothesis that the coin is fair?

[Screenshot of a plot from post #64]

This is the probability distribution of frequencies of heads GIVEN the coin is fair?
 
  • #70
##P(H|E)## is usually evaluated as a probability distribution over all possible values of ##H##.
 
  • #71
@Dale and @FactChecker I think the discussion we've had is adequate for my level. Thank you.

One last question. For Bayes' theorem given as ##P(H|D) = \frac{P(H) \times P(D|H)}{P(D)}##, the Wiki page says that ##P(D) \ne 0## (to avoid division by ##0##). But since ##P(\neg A) = 1 - P(A)##, if ##P(D) = 0## we know that ##P(\neg D) = 1##. Can't we use this knowledge to get around the division-by-zero problem?

So we could do ##P(H|\neg D) = \frac{P(H) \times P(\neg D|H)}{P(\neg D)} = \frac{P(H) \times P(\neg D|H)}{1} = P(H) \times P(\neg D|H)##
 
  • #72
Agent Smith said:
Can't we use this knowledge to get around the division-by-zero problem?
How? I don’t see what knowing a different number would do for you here.
 
  • #73
Dale said:
How? I don’t see what knowing a different number would do for you here.
We can still assess the probability of the hypothesis, but it looks as though the posterior probability would be less than the prior probability, because(?) ##P(\neg D|H)## is going to be low.
 
