Does the statistical weight of data depend on the generating process?

The discussion centers on whether the statistical weight of identical data sets, generated by different processes, affects the evidence for or against a hypothesis. Two couples with the same child gender outcomes provide contrasting motivations for their family planning, leading to different interpretations of the data's implications regarding gender bias. The analysis highlights that frequentist and Bayesian approaches yield different insights; frequentists focus on the likelihood of observing the data under a specific hypothesis, while Bayesians consider the data as fixed and the hypothesis as variable. The conversation emphasizes the importance of understanding the underlying processes that generate data, as they can significantly influence the conclusions drawn about probabilities. Ultimately, the distinction in experimental design and assumptions is crucial for accurate statistical interpretation.
  • #91
PeterDonis said:
What is "the general case"? We are assuming for this discussion that there is no second parameter--p is the same for all couples.

If by "the general case" you mean ##p = 0.5## (or ##\lambda = 0.5## in @Dale's notation), then the actual evidence is that this is false; the global data seems to show a value of around ##0.525## to ##0.53##.

Yes, but does it question it to a different extent for couple #2 vs. couple #1? Does their different choice of process make a difference here?

Yes, I know it's not really ##0.5##. That just makes the calculations a bit harder and asymmetrical.

The main difference is that the distributions of family sizes are different in the two cases.

Case #1 has families all with seven children (i.e. families who set out with that policy always end up with seven children).

Case #2 has families with two children upwards.

This creates an asymmetry that gets picked up in the simple calculations I've done above. And if we did add another parameter, it may well be reflected there as well.

For example, my guess would be that the second family would be more likely to be one of the predisposed couples than the first. I could run an example tomorrow to check this, but I think I can see how the calculations will come out.
 
  • #92
Dale said:
The only difference between the couples was their intentions.

The intentions of the couples, not the researchers (us) who are evaluating the data. The p-value hacking issue is an issue about the intentions of the researchers.

However, I can see an argument here regarding the intentions of the couples: the gametes don't know at each conception what rule the parents were using to decide when to stop having children. So there is a straightforward argument from the biological facts of conception that the process the parents are using to decide when to stop having children should not affect the data.

This is still not quite the same as saying that the p-value we calculate should not matter, but I can see a further argument: saying that the p-value matters is equivalent to saying that the data from couple #2 is being drawn from a different underlying distribution of births than the data from couple #1. But these underlying distributions are theoretical constructs in the minds of the researchers; they don't correspond to anything in the real world that actually affects the data. The only thing in the real world that they correspond to is the couple's intentions, and we just saw above that the couple's intentions don't affect the data.
 
  • #93
PeterDonis said:
However, I can see an argument here regarding the intentions of the couples: the gametes don't know at each conception what rule the parents were using to decide when to stop having children. So there is a straightforward argument from the biological facts of conception that the process the parents are using to decide when to stop having children should not affect the data.

I think this is the sort of argument to avoid. You need to calculate what is implied by the assumptions in the problem and what is being compared to what.

In this case, certain things had to happen in order for a case #2 family to end up with seven children. That's the sort of detail that can trip you up.
 
  • #94
PeroK said:
my guess would be that the second family would be more likely to be one of the predisposed couples than the first

I have not done a Bayesian calculation with ##\lambda## treated as a function of the individual couple instead of an unknown single parameter, but it seems to me that such a calculation would still say that, since the data sets of both couples are the same, our posterior distribution over whatever parameters we are estimating will be the same. The key here is that the one difference we have information about for the two couples--the way they chose to decide when to stop having children--has no relationship, that I can see, to any difference between them that would be relevant to a difference in ##\lambda##.

In fact, even if we discount the subjective judgment I just expressed, and decide to test the hypothesis that "there is some difference between these two couples that affects ##\lambda##", the fact that the two data sets are identical is evidence against any such hypothesis!
 
  • #95
PeterDonis said:
I have not done a Bayesian calculation with ##\lambda## treated as a function of the individual couple instead of an unknown single parameter, but it seems to me that such a calculation would still say that, since the data sets of both couples are the same, our posterior distribution over whatever parameters we are estimating will be the same. The key here is that the one difference we have information about for the two couples--the way they chose to decide when to stop having children--has no relationship, that I can see, to any difference between them that would be relevant to a difference in ##\lambda##.

In fact, even if we discount the subjective judgment I just expressed, and decide to test the hypothesis that "there is some difference between these two couples that affects ##\lambda##", the fact that the two data sets are identical is evidence against any such hypothesis!

I'll do a calculation tomorrow! It's after midnight here.
 
  • #96
PeroK said:
certain things had to happen in order for a case #2 family to end up with seven children.

And the same is true of couple #1. The fact that they decided in advance to have seven children does not mean they were guaranteed to succeed. The wife could have died in childbirth, or one of them could have become infertile, or...

The point is that none of these things have any connection to the process they decided to use. Or, if you don't like such absolute language, then in Bayesian terms, hypotheses along the lines of "couples who choose the process that couple #2 chose are more likely to have the wife die in childbirth than couples who choose the process that couple #1 chose" have such tiny prior probabilities that it doesn't even make sense to consider them when there are hypotheses in view with prior probabilities many orders of magnitude larger.
 
  • #97
PeterDonis said:
The intentions of the couples, not the researchers (us) who are evaluating the data. The p-value hacking issue is an issue about the intentions of the researchers.
No, it is about the experimenters as well as the analysts. The couples are experimenters since they had an experiment with a stopping criterion and collected data. You really should read the paper.
 
  • #98
PeroK said:
You need to calculate what is implied by the assumptions in the problem and what is being compared to what.

What I said about gametes is just as much implied by the assumptions in the problem as speculating about mishaps that could prevent a couple from getting to seven children. So I don't see that this (valid) point helps us much either way.
 
  • #99
Dale said:
The couples are experimenters since they had an experiment with a stopping criterion and collected data.

Fair enough.
 
  • #100
PeroK said:
Unless we allow the second parameter, all we are doing is picking up unlikely events. We can calculate the probability of these events, but unless we allow the second parameter, that is all we can say.
...
In summary, to make this a meaningful problem I think you have to add another parameter.
Interestingly, there is an approach called hierarchical Bayesian modeling which does exactly that.

Here is a paper where they add this additional parameter (a Bayesian hierarchical model for binomial data) in the context of polling:

http://www.stat.cmu.edu/~brian/463-663/week10/Chapter 09.pdf

In this model each poll is considered to have some underlying probability of a win (analogous to a couple's probability of having a boy), governed by a "hyperparameter"; the respondents to the poll are then binomial draws from that poll's probability (analogous to each child being a draw from the couple's probability). The observed data then informs us both about the probability for each couple and about the distribution of probabilities for the population. The major difference is that there are a small number of polls, each with a relatively large number of samples, while there are a large number of couples, each with a relatively small number of children.
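As a rough sketch of that hierarchical idea (this is not the model from the linked chapter; the Beta hyperparameter values, couple counts, and function names here are made up purely for illustration), each couple gets its own ##\lambda_k## drawn from a population-level Beta distribution, and the couple-to-couple variation shows up as over-dispersion in the boy counts:

```python
import random

random.seed(0)

def simulate_population(a, b, n_couples, n_children):
    """Hierarchical (beta-binomial) sketch: each couple draws its own
    boy-probability lam_k from a Beta(a, b) population distribution,
    then each child is an independent Bernoulli(lam_k) draw."""
    boy_counts = []
    for _ in range(n_couples):
        lam_k = random.betavariate(a, b)  # couple-level parameter
        boy_counts.append(sum(random.random() < lam_k for _ in range(n_children)))
    return boy_counts

def sample_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# A tight hyperprior (a = b = 50) makes couples nearly identical; a loose
# one (a = b = 2) adds couple-to-couple spread on top of binomial noise.
tight = simulate_population(50, 50, 5000, 7)
loose = simulate_population(2, 2, 5000, 7)
print(sample_var(tight), sample_var(loose))  # loose spread is clearly larger
```

The variance comparison is the whole point: if every couple shared one ##\lambda##, the spread of boy counts would be purely binomial; extra spread is evidence for a couple-level parameter.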
 
  • #101
Dale said:
In this model each poll is considered to have some underlying probability of a win (analogous to a couple's probability of having a boy), governed by a "hyperparameter"; the respondents to the poll are then binomial draws from that poll's probability (analogous to each child being a draw from the couple's probability). The observed data then informs us both about the probability for each couple and about the distribution of probabilities for the population.

Hm, interesting! If I'm understanding this correctly, this methodology could provide a way of investigating questions like "does ##\lambda## depend on the criterion the couple uses to decide when to stop having children" by simply grouping the couples by that criterion--i.e., assuming that the same hyperparameter value applies to all couples in a group, but can vary between groups--and seeing whether the posterior distribution for the hyperparameter does in fact vary from group to group. And as I commented earlier, it would seem like the evidence described in the OP, where two couples are from different groups but produce the same outcome data, would be evidence against any hypothesis that the hyperparameter varied from group to group.
 
  • #102
PeterDonis said:
this methodology could provide a way of investigating questions like ...
Yes, you could do it that way. The details vary a little if you want to consider only these two stopping criteria or if you want to consider them as elements of a whole population of stopping criteria. The hierarchical model is more appropriate for the second case. Essentially this is the difference between a fixed effect and a random effect model.
PeterDonis said:
the evidence described in the OP ... would be evidence against any hypothesis that the hyperparameter varied from group to group
Yes
 
  • #103
PeterDonis said:
One way of rephrasing the question is whether and under what circumstances changing the stopping rule makes a difference. In particular, in the case under discussion we have two identical data sets that were collected under different stopping rules; the question is whether the different stopping rules should affect how we estimate the probability of having a boy given the data.

I won't weigh in on variance issues, but the long-run estimates for the probability of boy vs girl are the same with either strategy. (Mathematically it's via the Strong Law of Large Numbers, but in the real world we do have tons of demographic data spanning many years, which should give pretty good estimates.)

inspection paradox related items:

if you estimate/sample by children:
we should be able to see that our estimates are the same either way -- i.e., in all cases the model is a sequence of one child at a time (we can ignore zero-probability events of two births at exactly the same time, so there is a nice ordering here) and each birth is a Bernoulli trial -- a coin toss with probability of heads given by some parameter ##p##. Depending on the "strategy" taken, what may change is who is tossing the coin (which parents), but that doesn't change the fact that in this model we have a Bernoulli process where the tosser/parent is irrelevant for modelling purposes.

if you estimate/sample by parents/couples:
this one is a bit more subtle.
PeterDonis said:
This is not the correct stopping rule for couple #2. The correct stopping rule is "when there is at least one child of each gender". It just so happens that they had a boy first, so they went on until they had a girl. But if they had had a girl first, they would have gone on until they had a boy.
I evidently misread the original post. Given this structure I opted to view it as a baby Markov chain (pun intended?) and use renewal rewards.

for strategy #2 we have a sequence of iid random variables ##X_k##, where ##X_k## denotes the number of kids born to couple ##k##.

Part 1) give a reward of 1 for each girl couple ##k## has, where each child is a girl with probability ##p \in (0,1)##
direct calculation (using total expectation) gives
##E\big[R_k\big] = \frac{1-p + p^2}{1-p}##
Part 2) give a reward of 1 for each boy couple ##k## has, each child being a boy with probability ##1-p##
either mimicking the above calculation, or just changing variables we get
##E\big[R_k'\big] = \frac{(1-p)^2 + p}{p}##

and the total time (i.e. number of kids) per couple k is
##E\big[X_k\big] = E\big[R_k + R_k'\big] = E\big[R_k\big] + E\big[R_k'\big]##

with R(t) as the reward function (t = integer time by custom = total number of kids in our model)
##\frac{R(t)}{t} \to_{as} \frac{E\big[R_k\big]}{E\big[X_k\big]}= p##
##\frac{E[R(t)]}{t} \to \frac{E\big[R_k\big]}{E\big[X_k\big]} = p##
where wolfram did the simplifications
https://www.wolframalpha.com/input/?i=((++(1-p)+++p^2)/(1-p))/(+(++(1-p)+++p^2)/(1-p)+++((1-p)^2+++p)/p)

I suppose the result may seem obvious to some, but a lot of things that are 'obviously true' actually aren't true in probability, which is why there are so many so-called paradoxes in probability. (The 'paradox paradox' of course tells us that they aren't really paradoxes, just a mismatch between math and intuition.) E.g. in the above, taking the expectation of ##X## in the denominator can break things if we don't have justification -- this is why I used the renewal-reward theorem here.

We can apply the same argument to strategy one to see expected rewards of ##E\big[R_k\big] = 7\cdot p## and ##E\big[R_k'\big] = 7\cdot (1-p)##, so yes, this too tends to ##p##.
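A quick Monte Carlo sanity check of that renewal-reward limit (a sketch only; the girl probability ##p = 0.3## is an arbitrary test value, and the function names are mine):

```python
import random

random.seed(1)
P_GIRL = 0.3  # 'p' in the renewal argument above; arbitrary test value

def family_until_one_of_each():
    """Strategy #2: have children until there is at least one of each sex."""
    kids = [random.random() < P_GIRL]        # True = girl
    while all(kids) or not any(kids):        # all girls or all boys so far
        kids.append(random.random() < P_GIRL)
    return kids

def family_of_seven():
    """Strategy #1: exactly seven children regardless of sex."""
    return [random.random() < P_GIRL for _ in range(7)]

def long_run_girl_fraction(family_fn, n_families=100_000):
    girls = total = 0
    for _ in range(n_families):
        kids = family_fn()
        girls += sum(kids)
        total += len(kids)
    return girls / total                      # the ratio R(t)/t

print(long_run_girl_fraction(family_until_one_of_each))  # ~0.3
print(long_run_girl_fraction(family_of_seven))           # ~0.3
```

Both stopping rules give a long-run girl fraction near ##p##, matching the renewal-reward calculation.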

PeterDonis said:
Can you give examples of each of the two possibilities you describe? I.e, can you give an example of a question, arising from the scenario described in the OP, for which stopping rules don't matter? And can you give an example of a question for which they matter a lot?
I can try... it's an enormously complex and broad question in terms of math, and more so when trying to map these approximations to the real world. A classical formulation for martingales and random walks is in terms of gambling. The idea behind martingales is that in finite time a fair game stays fair, and a skewed game stays skewed, no matter what 'strategy' the bettor has in terms of bet sizing. Over an infinite horizon all kinds of things can happen and a lot of care is needed -- you can even have a formally fair game with finite first moments, but if you don't have finite variance (L2 convergence / access to the Central Limit Theorem) then extremely strange things can happen -- Feller vol 1 has a nice example of this (chapter 10, problem 15 in the 3rd edition).

With respect to your original post, I've shown that neither 'strategy' changes the long-run estimate of ##p##. The fact that both strategies not only have second moments but valid moment generating functions should allow for concentration inequalities around the mean, which can show that the convergence isn't 'too slow', but this is outside the scope I think.
- - - -
For an explicit example / model:
As far as simple models and examples go, I suggest considering the simple random walk, where we move to the left with probability ##q = 1-p## and to the right with probability ##p##. Suppose we start at zero and have a stopping rule of "stop when we're ahead", i.e. once the net score is +1. For ##p \in [0,\frac{1}{2})##, our random variable ##T## for the number of moves until stopping is defective (i.e. not finite with probability 1), which is problematic. For ##p=\frac{1}{2}## the process stops with probability 1, but ##E\big[T\big] = \infty##, which is also problematic (e.g. see the earlier comment on wanting a finite 2nd moment). Now for ##p \in (\frac{1}{2}, 1]##, from a modelling standpoint things are nice, but is this "ok"? Well, it depends on what we're looking into. This admittedly very simple model could be used as a construct for a (simplified) pharmaceutical trial -- say, if they used the stopping rule: stop when the experimental evidence looks good (i.e. when they're ahead). The result would be to publish only favorable results even if the drug's effect were basically a standard coin toss (possibly with significant negative side effects when they're behind). When things went badly, the results wouldn't be reported: the trial would be ongoing, or maybe they'd stop funding it and it would just show up as 'no valid trial, as terminated before proper finish (stopping rule)'.
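A small simulation sketch of that "stop when we're ahead" rule (the drift value ##p = 0.6## is arbitrary); for ##p > \frac{1}{2}##, Wald's identity gives ##E[T] = 1/(2p-1)##, which the sample mean should approximate:

```python
import random

random.seed(2)

def steps_until_ahead(p, max_steps=1_000_000):
    """Move +1 with probability p, -1 otherwise; stop at net score +1
    ('stop when we're ahead'). Returns the number of steps taken."""
    pos = steps = 0
    while pos < 1 and steps < max_steps:   # cap guards the p <= 1/2 case
        pos += 1 if random.random() < p else -1
        steps += 1
    return steps

p = 0.6
times = [steps_until_ahead(p) for _ in range(20_000)]
mean_t = sum(times) / len(times)
print(mean_t)  # should be near 1/(2*0.6 - 1) = 5
```

Every simulated run ends "ahead" by construction, which is exactly the selective-reporting effect described above: the stopped data always looks favorable, whatever the underlying drift.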

it reminds me a bit of this
https://www.statisticsdonewrong.com/regression.html
which has some nice discussion under 'truth inflation' that seems germane here
- - - -
edit: thanks to Fresh for resetting a LaTeX/server bug
 
  • #104
PeroK said:
What does a Bayesian analysis give numerically for the data in post #1?
So, the easiest way to do this analysis is using conjugate priors. As specified by @PeterDonis, we assume that both couples have the same ##\lambda##. Now, in Bayesian statistics you always start with a prior. A conjugate prior is a prior for which the posterior belongs to the same family of distributions as the prior; in this case the conjugate prior is the Beta distribution. If these were the first two couples that we had ever studied then we would start with an ignorant prior, like so:
[Image: LambdaIgnorantPrior.png]


After observing 12 boys and 2 girls we would update our beliefs about the distribution of ##\lambda## from the Beta(1,1) prior to a Beta(13,3) posterior distribution (the first parameter counting boys), like so:
[Image: LambdaIgnorantPosterior.png]


From that posterior we can calculate any quantity we want regarding ##\lambda##. For example, the mean is 0.81, with a 95% Bayesian confidence region from 0.60 to 0.96, a median of 0.83, and a mode of 0.86. This confidence region should be close to the frequentist confidence interval.

Now, suppose that we did not want to pretend that this is the first couple that we had ever seen. We can incorporate the knowledge we have from other couples in the prior. That is something that cannot be done in frequentist statistics. Remember, ##\lambda## is not the proportion of boys in the overall population; it is the probability of a given couple producing boys. While the overall proportion of boys in the population is close to 0.5, individual couples can be highly variable. I know several couples with >80% girls and several with >80% boys, though we don't know if they would have started having more of the other gender had they continued. So let's set our prior to be symmetric about 0.5 and have 90% of couples within the range ##0.25<\lambda<0.75##. This can be achieved with an informed Beta(5,5) prior.
[Image: LambdaInformedPrior.png]


Now, after collecting the data of 6 boys and 1 girl from each couple, we find the posterior distribution is Beta(17,7), which leads to a lower estimate of the mean ##\lambda## of 0.71, with a 95% confidence region from 0.52 to 0.87.
[Image: LambdaInformedPosterior.png]


Notice that the mean is substantially lower because we are informed by the fact that we have seen other couples before. When a couple has an unusual ratio we automatically suspect that random chance may be skewing the results a bit, but admit that there is some possibility that something is different with this couple, so that the results are not totally random. The informed posterior shows that balanced assessment.
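For the record, the conjugate bookkeeping in this post can be checked in a few lines (a sketch; I'm assuming the parameter order Beta(boys + a, girls + b), with ##\lambda## = probability of a boy as the first parameter):

```python
def beta_update(a, b, boys, girls):
    """Conjugate update: a Beta(a, b) prior plus binomial birth data gives
    a Beta(a + boys, b + girls) posterior (first parameter counts boys)."""
    return a + boys, b + girls

def beta_mean(a, b):
    return a / (a + b)

def beta_mode(a, b):
    return (a - 1) / (a + b - 2)  # valid for a, b > 1

# Ignorant Beta(1,1) prior, then 12 boys and 2 girls from the two couples:
a1, b1 = beta_update(1, 1, 12, 2)
print(a1, b1, round(beta_mean(a1, b1), 2), round(beta_mode(a1, b1), 2))
# -> 13 3 0.81 0.86

# Informed Beta(5,5) prior with the same data:
a2, b2 = beta_update(5, 5, 12, 2)
print(a2, b2, round(beta_mean(a2, b2), 2))
# -> 17 7 0.71
```

The closed-form mean and mode reproduce the summary numbers quoted above; the credible-interval endpoints would need the Beta CDF (e.g. from scipy), which I've left out to keep this self-contained.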
 
  • #105
Dale said:
we assume that both couples have the same ##\lambda##.

This doesn't seem to be quite what you're assuming. As you describe your analysis, you're not assuming that ##\lambda## is fixed for all couples; you're allowing for the possibility that different couples might have different unknown factors at work that could affect their respective probabilities of producing boys. But you are assuming that we have no reason to suppose that either couple #1 or couple #2 in our example is more or less likely to have unknown factors skewing them in one direction or the other, so we should use the same prior distribution (the "informed prior" Beta distribution) for both. I think that way of looking at it is fine.

Dale said:
When couples have a unusual ratio we automatically suspect random chance may be skewing the results a bit, but do admit that there is some possibility that there is something different with this couple so that the results are not totally random.

But, more importantly, the posterior distribution is the same for both couples, since they both have the same data. The different choice of stopping criterion does not affect the posterior distribution. In terms of the way of looking at it that I described above, we are assuming that a couple's choice of stopping criterion is independent of any unknown factors that might affect their propensity for favoring one gender over the other in births.
 
  • #106
PeterDonis said:
But, more importantly, the posterior distribution is the same for both couples, since they both have the same data. The different choice of stopping criterion does not affect the posterior distribution
Yes, the stopping criterion does not affect our retrospective belief about that couple's ##\lambda##, provided we use the same prior for both couples. Theoretically there could be reasons to use different priors for the two couples, but for this scenario all such reasons seem pretty far-fetched.
 
  • #107
PeterDonis said:
But, more importantly, the posterior distribution is the same for both couples, since they both have the same data. The different choice of stopping criterion does not affect the posterior distribution. In terms of the way of looking at it that I described above, we are assuming that a couple's choice of stopping criterion is independent of any unknown factors that might affect their propensity for favoring one gender over the other in births.

After some calculations, I agree with this. If we assume that there are some couples who are more likely to have girls than boys, say, then the conditional probability that each couple is in that category, given the data, is the same in both cases.

It appears that in general the stopping criteria are indeed irrelevant.
 
  • #108
PeroK said:
It appears that in general the stopping criteria are indeed irrelevant.
They are irrelevant for determining the estimate of ##\lambda##, but not for determining the p-value, as you calculated somewhere back on the first page.
 
  • #109
Dale said:
They are irrelevant for determining the estimate of ##\lambda##, but not for determining the p-value, as you calculated somewhere back on the first page.

I can patch that up! First, because of the asymmetry in the data, we should take the p-value as the probability of outcomes strictly more extreme than the data.

In case 1, we need the probability of either 7 boys or 7 girls. That's ##\frac{1}{64}##.

In case 2, I also misread the question and assumed they were waiting for a girl, rather than wanting at least one of each. The probability of having a family of more than 7 is ##\frac{1}{64}##.

The p-values match.

The mistake was that the exact observed data was less likely in the second case; but because I was measuring the number of boys against the size of the family, this created an asymmetry: there was no exact correspondence in what was observed. What I should really have calculated was the probability of getting up to six boys or girls against the probability of having a family of size up to 7, i.e. the complement of the strictly-more-extreme outcome, as above.

(This must be a general point to be aware of: if you can't match up the data exactly, you need to take the strictly more unlikely outcomes for the p-value.)

But, there has to be a twist! Suppose that the second family were, indeed, waiting for a girl. Now, the likelihood of a family of more than 7 is only ##\frac{1}{128}##. And, again there is a difference in p-values.

This may be a genuine case where the stopping criterion does make a difference (*).

(*) PS As Peter points out below, this is just a case of limiting it to a one-tailed scenario.
 
  • #110
PeroK said:
we should take the p-value as the probability of outcomes strictly more extreme than the data.

"Strictly more extreme" is ambiguous, though. Does it mean "one-tailed" or "two-tailed"? In this case, does it mean "at least that many boys" or "at least that many children of the same gender"?

This doesn't affect whether the p-values are the same or not, but it does affect their actual numerical values. I'll assume the "one-tailed" case in what follows.

PeroK said:
The p-values match.

I don't think they do.

For couple #1, the sample space is all possible combinations of 7 children, and the "at least as extreme" ones are those that have at least 6 boys. All combinations are equally probable so we can just take the ratio of the total numbers of each. There are ##2^7## of the former and 8 of the latter (one with 7 boys and 7 with 6 boys), so the p-value is ##8 / 2^7 = 1/16##.

For couple #2, the sample space is all possible combinations of 2 or more children that have at least one of each gender; but the combinations are not all equally probable so we would have to take that into account if we wanted to compute the p-value using a ratio as we did for couple #1 above. However, an easier way is to just compute the probability of getting 6 boys in a row, which is just ##1 / 2^6 = 1/64##. This covers all combinations at least as extreme as the one observed--half of that 1/64 probability is for the combination actually observed (6 boys and 1 girl), and the other half covers all the other possibilities that are at least as extreme, since all of them are just some portion of the combinations that start with 7 boys. So the p-value is ##1/64##.
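These two one-tailed values are easy to verify with exact rational arithmetic (a sketch; the enumeration just mirrors the counting above):

```python
from fractions import Fraction
from math import comb

half = Fraction(1, 2)

# Couple #1: seven children fixed in advance; one-tailed "at least 6 boys":
p1 = sum(comb(7, k) for k in (6, 7)) * half ** 7
print(p1)  # 1/16

# Couple #2: stop at one child of each sex; every outcome at least as
# extreme as "6 boys then a girl" lies inside "first six children all boys":
p2 = half ** 6
print(p2)  # 1/64
```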

PeroK said:
there has to be a twist! Suppose that the second family were, indeed, waiting for a girl.

Since they started out having a boy as their first child, they are waiting for a girl. Or are you considering a case where the stopping criterion is simply "stop when the first girl is born"? For that case, the p-value would be the same as the one I computed above for couple #2; the difference is that the underlying sample space is now "all combinations that end with a girl", which means that if you tried to compute the p-value using ratios, as I did for couple #1 above, you would end up computing a different set of combinations and a different set of associated probabilities.

The other twist in this case is that there is no "two-tailed" case, since the stopping criterion is not symmetric between genders. So you could say that the p-value for this case is different from both of the ones I computed above if you converted the ones I computed above to the two-tailed case (which means multiplying by 2).

PeroK said:
This may be a genuine case where the stopping criteria does make a difference.

It can make a difference in p-value, yes, as shown above.

However, it still doesn't make a difference in the posterior distribution for ##\lambda##, or, in your terms, the conditional probability of each couple being in a particular category as far as a propensity for having boys or girls.
 
  • #111
Just for grins I also did a Monte Carlo simulation of the original problem. I assumed ##\lambda## starting at 0.01 and going to 0.99 in increments of 0.01. For each value of ##\lambda## I simulated 10000 couples using each stopping criterion. I then counted the number of couples that had exactly 6 boys. The plots of the counts are as follows. For the case where they stop after exactly 7 children regardless:
[Image: MonteCarlo7Total.png]


For the case where they stop after they get one of each
[Image: MonteCarlo1ofEach.png]


Notice that the shape is the same for both strategies; this is why the fact that we get the same data leads to the same estimate of ##\lambda##. However, note that the vertical scale is quite different; this is why the probabilities differ in the two cases: it is simply much less likely to get 6 boys when trying for one of each than when simply having 7 children. This doesn't change the estimate, but it makes us more surprised to see the data.
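A sketch of one slice of this simulation (a single illustrative value ##\lambda = 0.8## rather than the full 0.01-0.99 grid, and my own function names):

```python
import random

random.seed(3)

def six_boys_fixed_seven(lam):
    """Strategy #1: seven children; exactly six of them boys?"""
    return sum(random.random() < lam for _ in range(7)) == 6

def six_boys_one_of_each(lam):
    """Strategy #2: children until one of each sex; under this rule the
    only family with exactly six boys is B,B,B,B,B,B,G."""
    kids = [random.random() < lam]           # True = boy
    while all(kids) or not any(kids):
        kids.append(random.random() < lam)
    return sum(kids) == 6

lam, n = 0.8, 100_000
c1 = sum(six_boys_fixed_seven(lam) for _ in range(n))
c2 = sum(six_boys_one_of_each(lam) for _ in range(n))
print(c1, c2)  # strategy #1 produces far more "6 boy" families
```

The large gap between the two counts is exactly the vertical-scale difference in the plots: the same data is much rarer under the one-of-each rule.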
 
  • #112
PeterDonis said:
"Strictly more extreme" is ambiguous, though. Does it mean "one-tailed" or "two-tailed"? In this case, does it mean "at least that many boys" or "at least that many children of the same gender"?

This doesn't affect whether the p-values are the same or not, but it does affect their actual numerical values.

I assumed two-tailed.

You can see that

##p(7-0, 0-7) = p(n \ge 8)##

Where that's the total probability of a unisex family of seven on the left, and of a family size of eight or more (needed to get at least one of each sex) on the right. But:

##p(6-1, 1-6) \ne p(n =7)##

That creates another interesting ambiguity: is it genuinely a difference in p-values, or just an asymmetry in the possible outcomes?
 
  • #113
PS if the p-values for two sets of data cannot be the same because of the discrete structure of the data, then having different p-values loses some of its significance!
 
  • #114
I did a few calculations for the cases of different sizes of families. There is a clear pattern. The "strict" p-value agrees in all cases. But, the "inclusive" p-value becomes more different as the size of the family increases. This is all two-tailed:

For a family of size ##N##, the strict p-value (the probability of the data being more extreme) is ##\frac{1}{2^{N-1}}## for both case 1 and case 2.

For the "inclusive" p-values (the data being as observed or more extreme), the p-values are:

##N = 4, \ p_1 = 5/8, \ p_2 = 1/4##
##N = 5, \ p_1 = 3/8, \ p_2 = 1/8##
##N = 6, \ p_1 = 7/32, \ p_2 = 1/16##
##N = 7, \ p_1 = 1/8, \ p_2 = 1/32##
##N = 8, \ p_1 = 9/128, \ p_2 = 1/64##

There's a clear pattern: ##p_2 = \frac{1}{2^{N-2}}## and ##p_1 = \frac{N+1}{2} p_2##.

This raises an interesting question about whether the p-value should be "strict" or "inclusive". In this problem, there is a case for choosing the strict version, which reflects the fact that, after all, the data is the same.

Alternatively, the fact that the (inclusive) p-value in case 2 is lower for larger ##N## might be telling us something statistically significant.
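That pattern can be confirmed by exact enumeration (a sketch using the two-tailed inclusive definitions above, with my own function names):

```python
from fractions import Fraction
from math import comb

def p1_inclusive(N):
    """Strategy #1 (exactly N children): two-tailed P(at least N-1 of one sex)."""
    count = 2 * (comb(N, N - 1) + comb(N, N))
    return Fraction(count, 2 ** N)

def p2_inclusive(N):
    """Strategy #2 (one of each): P(family size >= N), i.e. the first N-1
    children are all the same sex."""
    return Fraction(2, 2 ** (N - 1))

for N in range(4, 9):
    # check the pattern stated in the post
    assert p2_inclusive(N) == Fraction(1, 2 ** (N - 2))
    assert p1_inclusive(N) == Fraction(N + 1, 2) * p2_inclusive(N)
    print(N, p1_inclusive(N), p2_inclusive(N))
```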
 
  • #115
PeroK said:
This raises an interesting question about whether the p-value should be "strict" or "inclusive".

The "inclusive" p-value is different for case #1 vs. case #2 because the number of combinations that are equally extreme as the one actually observed is different for the two cases; whereas, in this particular case, the number of combinations which are more extreme happens to be the same for both case #1 and case #2. I don't think either of those generalizes well.

PeroK said:
the fact that the (inclusive) p-value in case 2 is lower for larger ##N## might be telling us something statistically significant

It's telling you that, as ##N## goes up, the number of combinations exactly as extreme as the one actually observed grows for case #1, whereas for case #2 it remains constant (it's always just 2 combinations: the one actually observed and its counterpart with boys and girls interchanged).

However, the more fundamental point is that, no matter how you slice and dice p-values, they answer a different question from the one I posed in this thread. They address how likely the observed data are given various hypotheses; my question is about how likely various hypotheses are given the observed data. In most real-world cases, the questions we are actually interested in are of the latter type, not the former, and for those the Bayesian viewpoint seems more appropriate.
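To make the Bayesian point concrete: for both couples, the probability of the exact observed sequence (six boys, then a girl) is ##p^6(1-p)##, where ##p## is the probability of a boy. The stopping rule only changes which *other* sequences were possible, not the likelihood of this one, so given a common prior the posterior over ##p## is identical for both couples. A minimal numerical sketch, assuming a flat prior on ##p## purely for illustration:

```python
import numpy as np

# Likelihood of the exact observed sequence as a function of p.
# It is the same for both couples, so the posterior below is too.
p = np.linspace(0.0, 1.0, 1001)
dp = p[1] - p[0]
likelihood = p**6 * (1 - p)

prior = np.ones_like(p)             # flat prior, purely illustrative
posterior = likelihood * prior
posterior /= posterior.sum() * dp   # normalise numerically

# Posterior probability of a bias towards boys, i.e. P(p > 1/2 | data)
prob = posterior[p > 0.5].sum() * dp
print(round(prob, 3))
```

With this prior the posterior is a Beta(7, 2) distribution, and the printed probability agrees with the exact value ##1 - 9/256 \approx 0.965## regardless of which couple produced the data.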
 
  • #116
PeterDonis said:
Summary:: If we have two identical data sets that were generated by different processes, will their statistical weight as evidence for or against a hypothesis be different?

The specific example I'm going to give is from a discussion I am having elsewhere, but the question itself, as given in the thread title and summary, is a general one.

We have two couples, each of which has seven children that, in order, are six boys and one girl (i.e., the girl is the youngest of the seven). We ask the two couples how they came to have this set of children, and they give the following responses:

Couple #1 says that they decided in advance to have seven children, regardless of their genders (they think seven is a lucky number).

Couple #2 says that they decided in advance to have children until they had at least one of each gender (they didn't want a family with all boys or all girls).

Suppose we are trying to determine whether there is a bias towards boys, i.e., whether the probability p of having a boy is greater than 1/2. Given the information above, is the data from couple #2 stronger evidence in favor of such a bias than the (identical) data from couple #1?
Sorry if this was brought up already, but isn't something similar done in medicine with likelihood ratios, using a database of priors and adjusting? Then you could decide, assuming equal priors I guess, whether the likelihood ratio is the same in both cases?

EDIT: e.g., given symptoms A, B, C, etc. and a given age, there is a certain prior attached, and then tests are given whose results each carry a likelihood ratio. I wonder if something similar can be done with your question, checking whether one case has a higher likelihood ratio than the other?
 
  • #117
PeterDonis said:
the question I posed is a question about how likely various hypotheses are given the observed data. In most real-world cases, the questions we are actually interested in are questions of the latter type, not the former
That was actually the first thing that drew my attention and interest in Bayesian statistics. The outcome of Bayesian tests are more aligned with how I personally think of science and scientific questions. Plus, it naturally and quantitatively incorporates some philosophy of science in a non-philosophical way, specifically Popper’s falsifiability and Ockham’s razor.
 
  • #118
Dale said:
That was actually the first thing that drew my attention and interest in Bayesian statistics. The outcome of Bayesian tests are more aligned with how I personally think of science and scientific questions. Plus, it naturally and quantitatively incorporates some philosophy of science in a non-philosophical way, specifically Popper’s falsifiability and Ockham’s razor.
Other than Bayes' theorem, do modern probability and mathematical statistics deal with Bayesian stats, or just frequentist? EDIT: I mean the kind you would study in most grad courses that are not explicitly labelled frequentist, which includes the CLT, LLN, etc.
 
  • #119
WWGD said:
isn't something similar done in medicine with likelihood ratios, using a database of priors and adjusting?

The results of medical tests for rare conditions are usually much better analyzed using Bayesian methods, yes, because those methods correctly take into account the rarity of the underlying condition, in relation to the accuracy of the test. Roughly speaking, if the condition you are testing for is rarer than a false positive on the test, any given positive result on the test is more likely to be a false positive than a true one. Frequentist methods don't give you the right tools for evaluating this.
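This base-rate effect is easy to quantify with Bayes' theorem. A short sketch with made-up illustrative numbers (the prevalence, sensitivity, and specificity below are not from the thread):

```python
def p_condition_given_positive(prevalence, sensitivity, specificity):
    """Posterior probability of having the condition given a positive test."""
    false_positive_rate = 1 - specificity
    # Total probability of a positive result, over sick and healthy people.
    p_positive = (prevalence * sensitivity
                  + (1 - prevalence) * false_positive_rate)
    # Bayes' theorem.
    return prevalence * sensitivity / p_positive

# A condition affecting 1 in 1000 people, with a test that is
# 99% sensitive and 95% specific (hypothetical numbers).
print(p_condition_given_positive(0.001, 0.99, 0.95))
```

With these numbers the posterior probability is only about 2%: because the 5% false-positive rate dwarfs the 0.1% prevalence, a positive result is far more likely to be a false positive than a true one, exactly as described above.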
 
  • #120
WWGD said:
The type you would study in most grad courses that are not explicitly called frequentist
My classes were all purely frequentist, but I am an engineer who likes statistics rather than a statistician, and school was more than a decade ago. (Significantly more, even with a small sample.)
 