Does the statistical weight of data depend on the generating process?

Summary
The discussion centers on whether identical data sets, generated by different processes, carry the same statistical weight as evidence for or against a hypothesis. Two couples with the same child-gender outcomes but contrasting family-planning rules lead to different interpretations of what the data imply about gender bias. The analysis highlights that frequentist and Bayesian approaches yield different insights: frequentists focus on the likelihood of observing the data under a specific hypothesis, while Bayesians consider the data as fixed and the hypothesis as variable. The conversation emphasizes the importance of understanding the processes that generate data, as they can significantly influence the conclusions drawn about probabilities. Ultimately, the distinction in experimental design and assumptions is crucial for accurate statistical interpretation.
  • #31
Dale said:
It is a fundamentally different approach. In the frequentist approach the hypothesis (usually p=0.5) is taken to be certain and the data is considered to be a random variable from some sample space. That is the issue, the two sample spaces are different. For the Bayesian approach the data is considered certain and the hypothesis is a random variable.
Sorry for my stubbornness, but I have difficulty figuring out the difference.

Let's say I test a coin and the null hypothesis is ##p=0.5##. Is it true that in the frequentist model, if I flip the coin in many different tests with different setups, I only measure how reliable my data are under the assumption of an ideal coin, whereas in the Bayesian model I measure the bias of my coin under the assumption that my data will tell me?

Seems a bit linguistic to me.
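
For concreteness, here is a minimal sketch of the two calculations in Python (the flip counts are invented, and only `scipy` is assumed):

```python
from scipy import stats

n, k = 100, 60  # hypothetical data: 60 heads in 100 flips

# Frequentist: hold the hypothesis p = 0.5 fixed and ask how probable
# data like these are under it.
print(stats.binom.pmf(k, n, 0.5))       # P(60 heads | p = 0.5), about 0.011

# Bayesian: hold the data fixed and put a distribution over p.
# With a flat Beta(1, 1) prior, the posterior is Beta(k + 1, n - k + 1).
posterior = stats.beta(k + 1, n - k + 1)
print(posterior.sf(0.5))                # P(p > 0.5 | data), about 0.98
```

The two printed numbers answer different questions, which is the non-linguistic part of the difference.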
 
  • #32
Dale said:
your prior could conceivably be different

How might the prior for couple #2 be different from the prior for couple #1?
 
  • #33
fresh_42 said:
have difficulties to figure out the difference

See the question I asked @PeroK in the last part of post #30.
 
  • #34
PeterDonis said:
The only relevant difference between the two couples is the process they used. What would your answer be then?
Given a reasonable sample size, we shouldn't be able to tell a difference. However, I don't think such an ideal case can be realized. In the ideal case, boy to girl is ##p## to ##(1-p)## regardless of the measurement; in reality this is not the case, IMO.
 
  • #35
fresh_42 said:
Given a reasonable sample size, we shouldn't be able to tell a difference.

I already specified what the two samples are: the (identical) data from couples #1 and #2. So are you saying that, if the only difference between the couples is the process they used, the two data sets have the same statistical weight when estimating ##p##?

fresh_42 said:
I don't think such an ideal case can be realized.

I agree--no two couples are ever exactly the same except for just the process they used--but idealized cases are often useful for investigating questions even when they can't be realized.
 
  • #36
PeterDonis said:
Yes, but that's not the question I asked. The question I asked was whether ##\lambda = 0.5## is less likely given the second case vs. the first.
How does "the second data set is less likely given the hypothesis that ##\lambda = 0.5##" get transformed to
"the hypothesis that ##\lambda = 0.5## is less likely given the second data set"? That is not a valid deductive syllogism; in fact it's a common error people make (assuming that if A then B is equivalent to if B then A).

I'm working within standard hypothesis testing. In particular, there is a single, unknown value ##\lambda##. It's not a random variable.

We can test ##\lambda = 0.5## (or any other value) against a random data set ##X## and compute ##p(X|\lambda)## for that data set.

The data in case #2 is less likely, given the hypothesis ##\lambda = 0.5##.

Eventually, with enough data, we would have to abandon the hypothesis ##\lambda = 0.5##. That is a thornier issue. In reality, it is more about an accumulation of data than one test.

Here the data in case #2 gives us less confidence in our hypothesis. That is the sense in which ##\lambda = 0.5## is "less likely".
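
A quick numerical check of this, on one reading of the thread's example (six boys followed by one girl; the exact counts are an assumption here):

```python
from scipy import stats

lam = 0.5  # hypothesis under test: P(boy) = 0.5

# Case 1: family size fixed at 7, so the number of girls is Binomial(7, 1 - lam).
p_case1 = stats.binom.pmf(1, 7, 1 - lam)  # P(exactly 1 girl in 7), about 0.055

# Case 2: stop at the first girl, so P(6 boys then a girl) = lam^6 * (1 - lam).
p_case2 = lam**6 * (1 - lam)              # about 0.0078

print(p_case1, p_case2)
```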
 
  • #37
PeroK said:
Here the data in case #2 gives us less confidence in our hypothesis.

Why? As I've already said, there is no valid deductive reasoning that gets you from "the second data set is less likely given the hypothesis that ##\lambda = 0.5##" to "the hypothesis that ##\lambda = 0.5## is less likely given the second data set". So since you can't be using valid deductive reasoning, what reasoning are you using?

PeroK said:
I'm working to standard hypothesis testing.

I'm not sure that standard hypothesis testing (aka frequentist statistics) has a good answer to the question I just posed above. But if there is one, I would like to know it.
 
  • #38
PeterDonis said:
See the question I asked @PeroK in the last part of post #30.
If this is the difference between the two, then the Bayesian model doesn't make much sense to me for real-life situations. You cannot set up different experiments such that the outcome only depends on the random variable.
 
  • #39
fresh_42 said:
You cannot set up different experiments such that the outcome only depends on the random variable.

I don't see how this is relevant. The two cases don't differ in their outcomes; the outcomes are the same. They only differ in the process used to generate the outcomes, and that process, itself, does not depend on the variable (p, or ##\lambda## in @Dale's notation) whose value we are trying to estimate.
 
  • #40
PeterDonis said:
I already specified what the two samples are: the (identical) data from couples #1 and #2. So are you saying that, if the only difference between the couples is the process they used, the two data sets have the same statistical weight when estimating ##p##?
I don't see how we can estimate anything from two tests. By sample size I meant enough tests of either setup. If we measure an effect a million times at CERN and a thousand times at Fermi, and have the same results, why should there be a different significance? The million tops the thousand, but given the identical outcome, I don't see a different weight.
 
  • #41
fresh_42 said:
given the identical outcome, I don't see a different weight.

Ok.
 
  • #42
PeterDonis said:
I don't see how this is relevant.
I think there is a major difference between theory and real life. Given the same outcome, we cannot decide which experiment is closer to the real distribution. The quality of the processes cannot be distinguished. I just say that there are always unknowns which don't find their way into the calculation, such as the father's age in the first example.
 
  • #43
fresh_42 said:
Given the same outcome, we cannot decide which experiment is closer to the real distribution.

Again, I'm confused by this, because the two different "experiments" (the different processes the couples are using) have nothing to do with the distribution. They have nothing to do with what the value of ##\lambda## is. So asking "which experiment is closer to the real distribution" seems like nonsense to me.
 
  • #44
PeterDonis said:
I'm not sure that standard hypothesis testing (aka frequentist statistics) has a good answer to the question I just posed above. But if there is one, I would like to know it.

I wouldn't discount it quite so readily. Let's follow your line of logic through. Suppose you did a large survey of births in the USA in the last year. You want to measure the probability that a boy is born, as opposed to a girl. Call this ##\lambda##. What you cannot do is give a probability distribution for ##\lambda##. Something like:

##p(\lambda = 0.47) = 0.05##
##p(\lambda = 0.48) = 0.10##
##p(\lambda = 0.49) = 0.20##
##p(\lambda = 0.50) = 0.30##
##p(\lambda = 0.51) = 0.20##
##p(\lambda = 0.52) = 0.10##
##p(\lambda = 0.53) = 0.05##

That is not valid because ##\lambda## was not a random variable in the data you analysed.

Instead, you can say something like:

##\lambda## is in the range ##0.47 - 0.52## with ##99\%## confidence.
##\lambda## is in the range ##0.48 - 0.51## with ##90\%## confidence.
##\lambda## is in the range ##0.49 - 0.50## with ##80\%## confidence.

That's the difference between "confidence" and "probabilities". Parameters associated with a distribution have confidence levels, not probabilities. The random data has probabilities.
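
As a rough sketch of where such confidence statements come from (survey numbers invented, normal approximation assumed):

```python
from scipy import stats

k, n = 51_000, 100_000  # invented survey: 51,000 boys in 100,000 births
lam_hat = k / n         # point estimate of lambda

# Wald (normal-approximation) confidence intervals at several levels.
for conf in (0.99, 0.90, 0.80):
    z = stats.norm.ppf((1 + conf) / 2)
    half = z * (lam_hat * (1 - lam_hat) / n) ** 0.5
    print(f"{conf:.0%}: lambda in [{lam_hat - half:.4f}, {lam_hat + half:.4f}]")
```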
 
  • #45
With a single sample in either trial the ex post odds are the same: one success in seven trials. Continuing with the coin-flipping analogy, if you had additional samples, the distributions would differ: one sample set would be the number of heads in seven coin flips and the other the number of flips before the first head appeared.

The boy/girl example is confusing because it's not clear whether the problem (a) assumes an equal p=boy between the two couples, which biologically would not be true, (b) is attempting to measure p=boy for each couple separately, which, while biologically realistic, precludes any additional information from further samples, or (c) uses the two couples to estimate p=boy for the overall population, in which case one can simply disregard the two couples as outliers.
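
A simulation of the first point, using only the standard library (parameters illustrative):

```python
import random

random.seed(0)
p = 0.5
N = 10_000

# Scheme A: always flip 7 times; record the number of heads (Binomial).
heads = [sum(random.random() < p for _ in range(7)) for _ in range(N)]

# Scheme B: flip until the first head; record how many flips it took (Geometric).
def flips_to_first_head() -> int:
    flips = 1
    while random.random() >= p:  # tails: keep flipping
        flips += 1
    return flips

waits = [flips_to_first_head() for _ in range(N)]

print(sum(heads) / N)  # about 3.5, the Binomial(7, 0.5) mean
print(sum(waits) / N)  # about 2.0, the Geometric(0.5) mean
```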
 
  • #46
PeroK said:
What you cannot do is give a probability distribution for ##\lambda##. Something like:

##p(\lambda = 0.47) = 0.05##
##p(\lambda = 0.48) = 0.10##
##p(\lambda = 0.49) = 0.20##
##p(\lambda = 0.50) = 0.30##
##p(\lambda = 0.51) = 0.20##
##p(\lambda = 0.52) = 0.10##
##p(\lambda = 0.53) = 0.05##

That is not valid because ##\lambda## was not a random variable in the data you analysed.
That is exactly what Bayesian statistics do. They do treat ##\lambda## as a random variable and determine its probability distribution.
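
For instance, a grid version of exactly that kind of table follows from Bayes' theorem; the flat prior and the data (six boys, one girl) are assumptions here:

```python
from scipy import stats

lams = [0.47, 0.48, 0.49, 0.50, 0.51, 0.52, 0.53]
prior = [1 / len(lams)] * len(lams)                  # flat prior over the grid
like = [stats.binom.pmf(6, 7, lam) for lam in lams]  # P(6 boys in 7 | lambda)
unnorm = [pr * li for pr, li in zip(prior, like)]
post = [u / sum(unnorm) for u in unnorm]             # posterior p(lambda | data)

for lam, p in zip(lams, post):
    print(f"p(lambda = {lam}) = {p:.3f}")
```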
 
  • #47
BWV said:
t’s not clear whether the problem assumes an equal p=boy between the two couples

In my discussion with @fresh_42 I clarified that I intended to include this assumption, yes. I agree, as I said in that discussion, that the assumption is an idealization.

We could go into how one would analyze the data if that assumption were dropped, but that's a further complication that I don't really want to get into in this thread.
 
  • #48
PeterDonis said:
Again, I'm confused by this, because the two different "experiments" (the different processes the couples are using) have nothing to do with the distribution. They have nothing to do with what the value of ##\lambda## is. So asking "which experiment is closer to the real distribution" seems like nonsense to me.
I believe that each real-life test has different random variables and different conditional probabilities, and thus different distributions. The assumption that they are the same is already a hypothesis, one I would work with as long as the outcomes remain stable. This adds up to the confidence in the hypothesis. If by statistical weight you mean confidence, then the number of tests and the setup do play a role.
 
  • #49
Dale said:
That is exactly what Bayesian statistics do. They do treat ##\lambda## as a random variable and determine its probability distribution.

This might be a matter of differing terminology. In Jaynes' Probability Theory, for example, he describes processes like estimating a distribution for ##\lambda## as "parameter estimation". (He doesn't appear to like the term "random variable" much at all, and discusses some of the confusions that using it can cause.)
 
  • #50
Dale said:
That is exactly what Bayesian statistics do. They do treat ##\lambda## as a random variable and determine its probability distribution.

What does a Bayesian analysis give numerically for the data in post #1?
 
  • #51
PeterDonis said:
How might the prior for couple #2 be different from the prior for couple #1?
If you had previous studies that showed, for example, that couples who decided on a fixed number of children in advance had a different ##\lambda## than other couples.
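
Sketching that with conjugate Beta priors (all numbers, including the data, are invented):

```python
from scipy import stats

boys, n = 6, 7  # invented data, identical for both couples

# Couple 1: flat Beta(1, 1) prior on lambda.
post1 = stats.beta(1 + boys, 1 + (n - boys))

# Couple 2: hypothetical informed prior Beta(52, 48), as if earlier studies
# of fixed-family-size couples had centered lambda slightly above 0.5.
post2 = stats.beta(52 + boys, 48 + (n - boys))

print(post1.mean(), post2.mean())  # ~0.78 vs ~0.54: same data, different posteriors
```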
 
  • #52
PeroK said:
I wouldn't discount it quite so readily. Let's follow your line of logic through. Suppose you did a large survey of births in the USA in the last year. You want to measure the probability that a boy is born, as opposed to a girl. Call this ##\lambda##. What you cannot do is give a probability distribution for ##\lambda##...
This appears to be falling victim to the Inspection Paradox. Whether you sample based on children or parents matters. The original post discussed sampling by parents (I think) and you are now sampling by children.

- - - -
I wish Peter would restate the question in a clean probabilistic manner. Being a frequentist or Bayesian has little to do with the essence of the problem. The original post is really about stopping rules, something pioneered by Wald (who, yes, did some Bayesian stats too). And yes, subsequent to Wald, stopping rules were extended in a big way by Doob via martingales.
 
  • #53
I vaguely remember similar discussions at my institute. I like Hendrik's approach in QFT: sit down and calculate. Interpretations are another game.
 
  • #54
StoneTemplePython said:
This appears to be falling victim to the Inspection Paradox. Whether you sample based on children or parents matters. The original post discussed sampling by parents (I think) and you are now sampling by children.

Are you talking about the case where some parents have a genetic disposition to one sex for their children?

I was assuming the idealised case where we have a single probability in all cases.
 
  • #55
fresh_42 said:
Seems a bit linguistic to me.
In general, the difference between ##p(X|\lambda)## and ##p(\lambda|X)## is not merely linguistic. They are different numbers. In addition, there is a difference in the space over which the probabilities are measured: one is a measure over the space of all possible experimental outcomes ##X##, and the other is a measure over the space of all possible boy-birth probabilities ##\lambda##.
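
One way to see the "different spaces" point numerically (flat prior and six-boys/one-girl data assumed):

```python
from scipy import stats

lam, n = 0.5, 7

# p(X | lambda) lives on the data space: summing the binomial pmf over
# every possible outcome X = 0..7 boys gives 1.
print(sum(stats.binom.pmf(k, n, lam) for k in range(n + 1)))  # 1.0

# p(lambda | X) lives on the hypothesis space: the Beta posterior
# integrates to 1 over lambda in [0, 1].
posterior = stats.beta(1 + 6, 1 + 1)
print(posterior.cdf(1.0) - posterior.cdf(0.0))                # 1.0
```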
 
  • #56
StoneTemplePython said:
This appears to be falling victim to the Inspection Paradox. Whether you sample based on children or parents matters. The original post discussed sampling by parents (I think) and you are now sampling by children.

PS: In any case, I was only describing the difference between probability and confidence, not trying to analyse the initial problem. See post #6.
 
  • #58
PeroK said:
Are you talking about the case where some parents have a genetic disposition to one sex for their children?

I was assuming the idealised case where we have a single probability in all cases.

My read on the original post was a question with two 'types' (or iid representatives for classes) of families: one having n kids (stopping rule: n, so the random variable equals n with probability one for our purposes), and the other having a geometrically distributed number of kids (stopping rule: stop when a girl is born).

The underlying idea of how you sample is closely related to what Dale is saying, but the way people get tripped up happens so often that it goes under the name of the "Inspection Paradox" (originally a renewal-theory idea, but pretty general). We need to be very careful about whether we are doing our estimates by sampling the kids or sampling the parents/couples.
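
A small simulation of that trap under the stop-at-first-girl rule, with a true ##P(\text{boy}) = 0.5## assumed:

```python
import random

random.seed(1)
p_boy = 0.5
families = []
for _ in range(100_000):
    boys = 0
    while random.random() < p_boy:      # a boy is born; keep going until a girl
        boys += 1
    families.append((boys, boys + 1))   # (boys, total children incl. the girl)

# Sampling kids: pool all children, then take the boy fraction. Consistent.
print(sum(b for b, _ in families) / sum(t for _, t in families))  # ~0.50

# Sampling parents: average each family's own boy fraction. Biased:
# E[B / (B + 1)] = 1 - ln 2 ~ 0.307 when B is Geometric(1/2).
print(sum(b / t for b, t in families) / len(families))            # ~0.31
```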
 
  • #59
StoneTemplePython said:
My read on the original post was a question with two 'types' of families: one having n kids (stopping rule: n, so the random variable equals n with probability one for our purposes), and the other having a geometrically distributed number of kids (stopping rule: stop when a girl is born).

The underlying idea of how you sample is closely related to what Dale is saying, but the way people get tripped up happens so often that it goes under the name of the "Inspection Paradox" (originally a renewal-theory idea, but pretty general). We need to be very careful about whether we are doing our estimates by sampling the kids or sampling the parents/couples.

What's your opinion on post #6? I know you're the real expert on this!
 
  • #60
PeroK said:
What's your opinion on post #6? I know you're the real expert on this!
I worry that you think I was criticizing your calculation in #6. I am not. It seems to me like a valid calculation; it is just a calculation of a different probability than what you would calculate with Bayesian methods. Nothing wrong with that, just different.
 
