Does the statistical weight of data depend on the generating process?

Summary: The discussion centers on whether the statistical weight of identical data sets depends on the processes that generated them, and thus on the evidence they provide for or against a hypothesis. Two couples with the same child-gender outcomes had contrasting motivations for their family planning, leading to different interpretations of what the data imply about gender bias. The analysis highlights that frequentist and Bayesian approaches yield different insights: frequentists focus on the likelihood of observing the data under a specific hypothesis, while Bayesians treat the data as fixed and the hypothesis as variable. The conversation emphasizes the importance of understanding the processes that generate data, since they can significantly influence the conclusions drawn about probabilities. Ultimately, the distinction in experimental design and assumptions is crucial for accurate statistical interpretation.
  • #61
PeroK said:
PS in any case, I was only describing the difference between probability and confidence; not trying to analyse the initial problem. See post #6.
ah ok. got it. I missed this.
PeroK said:
What's your opinion on post #6? I know you're the real expert on this!
I'm trying to avoid the statistical estimation stuff right now... too perilous.

What I'd like to do with respect to the original post is flesh out the problem, apply a sufficient condition so we can use the Optional Stopping Theorem, and be done with it. But depending on what exactly is being asked, stopping rules either don't matter, or they matter a lot. (And if you have a defective stopping rule you can get into a lot of trouble without realizing it.)
 
  • #62
StoneTemplePython said:
I wish Peter would restate the question in a clean probabilistic manner. Being a Frequentist or Bayesian has little to do with the essence of the problem. The original post is really about stopping rules.

Yes, it is. One way of rephrasing the question is whether and under what circumstances changing the stopping rule makes a difference. In particular, in the case under discussion we have two identical data sets that were collected under different stopping rules; the question is whether the different stopping rules should affect how we estimate the probability of having a boy given the data.
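To make this concrete, here is a minimal sketch (mine, not from the thread) of why the stopping rule drops out of a likelihood-based analysis. It assumes ##\lambda = P(\text{boy})##, takes the data to be six boys followed by a girl (as in the later posts), and simplifies couple #2's rule to "stop at the first girl": the two likelihood functions differ only by a constant factor in ##\lambda##, which is why any posterior built from them is identical.

```python
from math import comb

def lik_fixed_n(lam, boys=6, n=7):
    """Design 1: the number of children was fixed at n in advance (binomial count)."""
    return comb(n, boys) * lam**boys * (1 - lam)**(n - boys)

def lik_stop_at_girl(lam, boys=6):
    """Design 2, simplified to 'stop at the first girl': negative-binomial-style likelihood."""
    return lam**boys * (1 - lam)

for lam in (0.3, 0.5, 0.7):
    ratio = lik_fixed_n(lam) / lik_stop_at_girl(lam)
    print(f"lambda = {lam}: likelihood ratio between designs = {ratio:.3f}")
# The ratio is the constant comb(7, 6) = 7 for every lambda, so the two designs
# give proportional likelihoods and hence the same posterior under any prior.
```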
 
  • #63
Dale said:
It seems to me like a valid calculation, it is just a calculation of a different probability than what you would calculate with Bayesian methods.

Yes, so another way of stating the question I asked in the OP is, which of these different probabilities is the one that is relevant for estimating ##\lambda## given the data? You seem to be saying it's yours, but @PeroK seems to be saying it's his. You can't both be right.
 
  • #64
StoneTemplePython said:
What I'd like to do with respect to the original post is flesh out the problem, apply a sufficient condition so we can use the Optional Stopping Theorem, and be done with it. But depending on what exactly is being asked, stopping rules either don't matter, or they matter a lot.

Can you give examples of each of the two possibilities you describe? I.e., can you give an example of a question, arising from the scenario described in the OP, for which stopping rules don't matter? And can you give an example of a question for which they matter a lot?
 
  • #65
StoneTemplePython said:
stopping rule: when a girl is born

This is not the correct stopping rule for couple #2. The correct stopping rule is "when there is at least one child of each gender". It just so happens that they had a boy first, so they went on until they had a girl. But if they had had a girl first, they would have gone on until they had a boy.
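For illustration, here is a small Monte Carlo sketch (my own, with ##\lambda = 0.5## assumed) of the "at least one child of each gender" stopping rule, checking the chance that a couple following it ends up with exactly seven children, as in the data under discussion.

```python
import random

def simulate_family(lam=0.5, rng=random):
    """Have children until at least one boy and one girl have been born."""
    kids = []
    while len(set(kids)) < 2:                      # stop once both 'B' and 'G' appear
        kids.append('B' if rng.random() < lam else 'G')
    return kids

random.seed(0)
trials = 100_000
hits = sum(len(simulate_family()) == 7 for _ in range(trials))
print(f"P(exactly 7 children under this rule) ~ {hits / trials:.4f}")
# Exact value at lambda = 0.5: 2 * 0.5**7 = 0.015625 (six of one gender, then the other).
```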
 
  • #66
PeterDonis said:
This might be a matter of differing terminology. In Jaynes' Probability Theory, for example, he describes processes like estimating a distribution for ##\lambda## as "parameter estimation". (He doesn't appear to like the term "random variable" much at all, and discusses some of the confusions that using it can cause.)
Yes, some authors are not clear on this point. But since it has a probability density function it is in fact what is commonly called a “random variable.”
 
  • #67
PeroK said:
I was assuming the idealised case where we have a single probability in all cases.

That's the case I would like to discuss in this thread. Other possibilities introduce further complications that I don't want to get into here.
 
  • #68
PeterDonis said:
Yes, so another way of stating the question I asked in the OP is, which of these different probabilities is the one that is relevant for estimating ##\lambda## given the data? You seem to be saying it's yours, but @PeroK seems to be saying it's his. You can't both be right.
How can you get an estimate of ##\lambda## by calculating ##p(X|\lambda=0.5)## at all? Even frequentist statistics don’t estimate ##\lambda## that way.
 
  • #69
Dale said:
How can you get an estimate of ##\lambda## by calculating ##p(X|\lambda=0.5)## at all? Even frequentist statistics don’t estimate ##\lambda## that way.

We're not estimating ##\lambda##, we're testing a hypothesis. If all the data we've ever seen is, say, ##BBBBBBG##, then there is no way to "estimate" ##B## and ##G## as equally likely.
 
  • #70
Dale said:
Even frequentist statistics don’t estimate ##\lambda## that way.

@PeroK is saying that the second data set should make us less confident in the hypothesis that ##\lambda = 0.5## than the first data set, based on the p-value being lower. So frequentist statistics certainly seem to believe that ##p(X|\lambda = 0.5)## has some relevance.

"Estimating ##\lambda##" might not be the right way to express what I'm asking. Bayesian arguments such as you have made would seem to say that our confidence in the hypothesis that ##\lambda = 0.5## should be the same for both data sets, since the posterior distribution on ##\lambda## is the same. (More precisely, it's the same as long as the prior in both cases is the same. You gave an example of how the priors could be different; I'll respond to that in a separate post. For now, I'm focusing on the case where the priors are the same, since the p-values are still different for that case.) If that is the case, then the frequentist claim @PeroK is making is wrong.

OTOH, if the frequentist claim @PeroK is making is right, then there ought to be some way of reflecting the difference in the Bayesian calculation as well. But I can't come up with one.
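For reference, here is a hedged sketch of the p-value difference being referred to. The assumptions are mine: ##H_0## is ##\lambda = 0.5## with ##\lambda = P(\text{boy})##, the data are six boys and one girl, the test is one-sided ("at least this many boys"), and couple #2's rule is simplified to "stop at the first girl".

```python
from math import comb

lam0, boys, n = 0.5, 6, 7

# Design 1: n = 7 children fixed in advance.
# p-value = P(at least 6 boys out of 7 | lambda = 0.5).
p_fixed = sum(comb(n, k) * lam0**k * (1 - lam0)**(n - k) for k in range(boys, n + 1))

# Design 2 (simplified): stop at the first girl.
# p-value = P(the first girl arrives at child 7 or later | lambda = 0.5)
#         = P(the first 6 children are all boys).
p_stop = lam0**boys

print(f"fixed-n design:   p = {p_fixed:.4f}")   # 0.0625
print(f"stopping design:  p = {p_stop:.4f}")    # 0.0156
```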
 
  • #71
PeroK said:
We're not estimating ##\lambda##,
Why not? Since that is the specific question of interest, that is exactly what we should do.
 
  • #72
Dale said:
If you had previous studies that showed, for example, that couples who decided on a fixed number of children in advance had different ##\lambda## than other couples.

For this case, I'm not sure exactly what frequentists would say. They might say that you would need to test the two cases against different hypotheses, so you can't really compare them at all.

I think this gets into complications that I said I didn't want to get into in this thread. As I noted in post #70, the case where the priors are the same still has different p-values for the two data sets, so it's enough to bring out the difference between the frequentist and Bayesian approaches.
 
  • #73
PeterDonis said:
I think this gets into complications that I said I didn't want to get into in this thread.
I agree. I certainly would assume equal priors, but in principle they could be unequal.
 
  • #74
Dale said:
Why not? Since that is the specific question of interest, that is exactly what we should do.
If you gave me some data that read ##XXXXXXY## and you asked me to estimate the probability of getting ##X## or ##Y##, then (if forced to give an answer) I would say ##6/7## for ##X##.

But, that is not the case here. The question is about children being born, where we have a prior hypothesis that they are (approximately) equally likely. We are testing that hypothesis.
 
  • #75
PeroK said:
If you gave me some data that read ##XXXXXXY## and you asked me to estimate the probability of getting ##X## or ##Y##, then (if forced to give an answer) I would say ##6/7## for ##X##.
Yes. This is roughly the way that frequentist statistics would do it. I think the “official” process would be a maximum likelihood estimator, but that is probably close.
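As a quick check (my sketch, not part of the thread), the maximum likelihood estimate for data like ##XXXXXXY## can be found by evaluating the binomial likelihood over a grid of candidate probabilities; the maximizer agrees with the closed form ##k/n = 6/7##.

```python
# Data: six X's and one Y out of seven observations.
k, n = 6, 7

def likelihood(p):
    """Binomial-style likelihood of k successes in n trials (constant factor dropped)."""
    return p**k * (1 - p)**(n - k)

grid = [i / 1000 for i in range(1, 1000)]          # candidate values of P(X)
p_hat = max(grid, key=likelihood)
print(p_hat, k / n)                                # ~0.857 both ways; the closed form is k/n
```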
 
  • #76
PeterDonis said:
OTOH, if the frequentist claim @PeroK is making is right, then there ought to be some way of reflecting the difference in the Bayesian calculation as well. But I can't come up with one.
Well, the calculation that he is making is not an estimate of ##\lambda##. I think that the frequentist estimate of ##\lambda## would be the same for both couples. What would differ is the p value.

Since the p value isn’t part of Bayesian statistics the fact that it distinguishes between the two couples may not have a Bayesian analog. I am pretty sure that both Bayesian and frequentist methods would treat both couples identically for a point estimate of ##\lambda##.
 
  • #77
Dale said:
This is roughly the way that frequentist statistics would do it.

It is also the way that Bayesian statistics would do it, is it not, in the (extreme) case @PeroK describes where there is literally no prior data? In that case, a Bayesian would use a maximum entropy prior, which basically means that your posterior after the first set of data is whatever the distribution of that data set is.
 
  • #78
Dale said:
I think that the frequentist estimate of ##\lambda## would be the same for both couples. What would differ is the p value.

But the p-value affects our confidence level in the estimate, correct? So the confidence levels would be different for the two couples.

Dale said:
Since the p value isn’t part of Bayesian statistics the fact that it distinguishes between the two couples may not have a Bayesian analog.

If it is correct that our confidence level in the estimate should be different for the two couples, I would certainly expect there to be some way to reflect that in a Bayesian calculation.
 
  • #79
Dale said:
the calculation that he is making is not an estimate of ##\lambda##.

Again, "estimate ##\lambda##" might not be the right way to express what I was asking in the OP. I did not intend the OP to be interpreted narrowly, but broadly.

Perhaps a better way to broadly express the OP question would be: there is obviously a difference between the two couples, namely, that they used different processes to decide when to stop having children. Given that the two data sets they produced are the same, are there any other differences that arise from the difference in their processes, and if so, what are they? (We are assuming, as I have said, that there are no other differences between the couples themselves--in particular, we are assuming that ##\lambda## is the same for both.)

So far I have only one difference that has been described: the p-values are different. Are there others? And what, if any, other implications does the difference in p-values have? Does it mean we should have different posterior beliefs about ##\lambda##?
 
  • #80
PeterDonis said:
In that case, a Bayesian would use a maximum entropy prior, which basically means that your posterior after the first set of data is whatever the distribution of that data set is.
Most treatments of this type of problem that I have seen would use a Beta distribution since it is a conjugate prior. So you would get ##\lambda \sim Beta(2,7)## for the posterior for both cases separately or ##\lambda \sim Beta(4,14)## if you were pooling the data for an overall estimate.

https://www.physicsforums.com/threa...from-bayesian-statistics.973377/#post-6193429
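For concreteness, here is a sketch of the conjugate update being described, under my assumptions that the prior is the uniform ##Beta(1,1)## and that ##\lambda## is parameterized as the probability of a girl, which matches the ##Beta(2,7)## posterior quoted above (scipy is used only for the quantiles).

```python
from scipy.stats import beta

prior_a, prior_b = 1, 1                 # uniform Beta(1, 1) prior on lambda = P(girl)
girls, boys = 1, 6                      # the data set discussed in the thread
posterior = beta(prior_a + girls, prior_b + boys)   # Beta(2, 7)

print("posterior mean:", posterior.mean())              # 2/9 ~ 0.222
print("95% credible interval:", posterior.interval(0.95))
# Both couples produce the same Beta(2, 7) posterior, because the sequence
# BBBBBBG has likelihood proportional to lambda * (1 - lambda)**6 under either
# stopping rule; the rule only rescales the likelihood by a constant.
```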
 
  • #81
PeterDonis said:
But the p-value affects our confidence level in the estimate, correct? So the confidence levels would be different for the two couples.
Frequentist confidence intervals will be different between the two couples, and Bayesian credible intervals will be different from either of those. But as far as I know Bayesian credible intervals will be the same for both couples. That is precisely the advantage of Bayesian methods highlighted in the paper I cited earlier. This is, in fact, a fundamental difference between the methods.
PeterDonis said:
Again, "estimate ##\lambda##" might not be the right way to express what I was asking in the OP. I did not intend the OP to be interpreted narrowly, but broadly
Well, the narrow question is clear and can be answered. I am not sure that the broad question is sufficiently well defined to be answerable.
 
  • #82
Dale said:
That is precisely the advantage of Bayesian methods highlighted in the paper I cited earlier.

Why is it an advantage? Why are Bayesian credible intervals right and frequentist confidence intervals wrong?
 
  • #83
PeterDonis said:
So far I have only one difference that has been described: the p-values are different. Are there others? And what, if any, other implications does the difference in p-values have? Does it mean we should have different posterior beliefs about ##\lambda##?
I do not think that the fact that there are different p-values does or should mean that our posteriors should be different.
 
  • #84
Dale said:
I do not think that the fact that there are different p-values does or should mean that our posteriors should be different.

Why not? (This is basically the same question I asked in post #82.)
 
  • #85
PeterDonis said:
Why is it an advantage? Why are Bayesian credible intervals right and frequentist confidence intervals wrong?
(this is not really on topic for the thread, but you asked and it is a topic that I am somewhat passionate about, so ...)

It isn’t about right or wrong. It is about economics and professional ethics.

Because p-values depend on your intentions, if you take previously studied data and run more tests on that data, then you alter the previously reported p-values. Such analyses reduce the significance of previous results. This means that, in principle, you can always make any result non-significant simply by intending to study the data more.

The result of this statistical fact is that scientists need to avoid analyzing previously reported data. In some fields, using previously reported data is considered grounds for rejecting a paper. This basically makes scientific data “disposable”: you use it once and then throw it away.

There is no need to treat data this way any more. This “disposable-ness” is not inherent to data or to science; it is purely a result of the widely used frequentist statistical tools.

Frankly, for publicly funded research this is a travesty. The taxpayers paid good money to purchase that data, and scientists use it once and then throw it into the trash simply because they have not informed themselves about Bayesian statistics. If they had, then future researchers could reuse the data, making the tax money go further.

It seems like the ethically responsible way to handle the public treasury is to study any collected data as thoroughly as possible, but this intention makes any frequentist test non-significant. That is why this specific feature of Bayesian statistics is an advantage.

You will notice that very large collaborations with very expensive data are turning more and more to Bayesian methods. So I think there is a growing awareness of this issue.
 
  • #86
PeterDonis said:
Again, "estimate ##\lambda##" might not be the right way to express what I was asking in the OP. I did not intend the OP to be interpreted narrowly, but broadly.

Perhaps a better way to broadly express the OP question would be: there is obviously a difference between the two couples, namely, that they used different processes in their child-bearing process. Given that the two data sets they produced are the same, are there any other differences that arise from the difference in their processes, and if so, what are they? (We are assuming, as I have said, that there are no other differences between the couples themselves--in particular, we are assuming that ##\lambda## is the same for both.)

So far I have only one difference that has been described: the p-values are different. Are there others? And what, if any, other implications does the difference in p-values have? Does it mean we should have different posterior beliefs about ##\lambda##?

This probably only makes sense if we allow a second parameter - for example, that some couples have a predisposition for children of one sex. Otherwise, there is no reason to doubt the general case.

Unless we allow the second parameter, all we are doing is picking up unlikely events. We can calculate the probability of those events, but that is all we can say.

My calculations show that the second family is less likely (more of an anomaly) than the first, but this has no effect on the overall average, assuming we have enough prior data - which we do.

What this data does question is the hypothesis that no couples have a predisposition to one sex or the other in their children.

In other words, if a family has ten children, all girls say, then I don't think this influences the overall mean for girls in general. In fact, even if you adjusted the mean to ##0.6## (which still leaves 10 girls in a row very unlikely), you've created the hypothesis that 60% of children should be girls, which is absurd. You can't shift the mean from ##0.5## (or whatever it is - I believe it's not quite that) on the basis of one family.

What it does is raise the question about a predisposition to girls in that family. In the extreme case of, say, 50 girls in a row, then

1) That does not affect the overall mean to any extent.

2) It implies that it is almost certain that the data itself could not have come from the assumed distribution. I.e. that family is not producing children on a 50-50 basis.

In summary, to make this a meaningful problem I think you have to add another parameter. Then it reduces to the standard problem where you count the false positives (couples who do produce children 50-50, but who happen to have a lot of one sex) and count the true positives (couples who are genetically more likely to have one sex). Then, you can calculate ##p(A|B)## and ##p(B|A)## etc. (*)

As it stands, to clarify all my posts hitherto, all we can do is calculate how unlikely each of these families is under the hypothesis that in general ##\lambda = 0.5##. Nothing more. Confidence interval calculations cannot be done because of the assumed overwhelming prior data.

(*) PS although we still have to be aware of the sampling pitfalls.

PPS Maybe the Bayesians can do better.
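To illustrate the two-parameter version sketched above, here is a short numerical example with made-up inputs: a 1% prevalence of "predisposed" couples, ##P(\text{girl}) = 0.9## for them and ##0.5## otherwise, and a family with ten girls in a row. All three numbers are hypothetical and chosen only to show the ##p(A|B)## versus ##p(B|A)## calculation.

```python
# Hypothetical inputs, for illustration only.
prior_pred = 0.01                       # assumed share of "predisposed" couples
p_data_pred = 0.9**10                   # P(10 girls | predisposed, with P(girl) = 0.9 assumed)
p_data_base = 0.5**10                   # P(10 girls | lambda = 0.5)

posterior_pred = (prior_pred * p_data_pred) / (
    prior_pred * p_data_pred + (1 - prior_pred) * p_data_base
)
print(f"P(predisposed | 10 girls) ~ {posterior_pred:.2f}")   # ~0.78 with these inputs
# Strong evidence about that particular family, while a population-wide estimate
# of lambda would barely move, which is the point being made above.
```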
 
  • #87
Dale said:
p-values depend on your intentions

This might be an issue in general, but it is not in the particular scenario we are talking about here. The p value depends on the process used to generate the data, but that process is an objective fact about each couple; it is not a matter of the intentions of third parties studying the data.
 
  • #88
PeterDonis said:
This might be an issue in general, but it is not in the particular scenario we are talking about here.
Yes, in fact it is the key issue. The only difference between the couples was their intentions. Frequentist methods are sensitive to the intentions of the experimenters as well as the analysts. Did you read the paper? It covers both.
 
  • #89
PeroK said:
This probably only makes sense if we allow a second parameter - for example, that some couples have a predisposition for children of one sex. Otherwise, there is no reason to doubt the general case.

What is "the general case"? We are assuming for this discussion that there is no second parameter--p is the same for all couples.

If by "the general case" you mean ##p = 0.5## (or ##\lambda = 0.5## in @Dale's notation), then the actual evidence is that this is false; the global data seems to show a value of around ##0.525## to ##0.53##.

https://en.wikipedia.org/wiki/Human_sex_ratio

PeroK said:
What this data does question is the hypothesis that no couples have a predisposition to one sex or the other in their children.

Yes, but does it question it to a different extent for couple #2 vs. couple #1? Does their different choice of process make a difference here?
 
  • #90
PeroK said:
all we can do is calculate how unlikely each of these families is under the hypothesis that in general ##\lambda = 0.5##. Nothing more

This seems way too pessimistic. We can calculate probabilities and p-values and likelihood ratios for any value of ##\lambda## we like. The math might be more difficult, but that's what computers are for. :wink:
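As a small example of the kind of calculation meant here (my sketch, with ##\lambda = P(\text{boy})## as the convention), the likelihood of six boys and one girl can be evaluated at any candidate ##\lambda##, along with its ratio against ##\lambda = 0.5##; since the sequence likelihood is the same for both couples, this table does not depend on the stopping rule.

```python
boys, girls = 6, 1

def likelihood(lam):
    """Probability of the observed sequence, with lam = P(boy)."""
    return lam**boys * (1 - lam)**girls

for lam in (0.5, 0.6, 0.75, 6 / 7):
    print(f"lambda = {lam:.3f}   likelihood = {likelihood(lam):.5f}   "
          f"ratio vs 0.5 = {likelihood(lam) / likelihood(0.5):.2f}")
# The ratio against lambda = 0.5 peaks at the maximum likelihood value 6/7 (~7.25 here).
```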
 
