Does the statistical weight of data depend on the generating process?

The discussion centers on whether the statistical weight of identical data sets, generated by different processes, affects the evidence for or against a hypothesis. Two couples with the same child gender outcomes provide contrasting motivations for their family planning, leading to different interpretations of the data's implications regarding gender bias. The analysis highlights that frequentist and Bayesian approaches yield different insights; frequentists focus on the likelihood of observing the data under a specific hypothesis, while Bayesians consider the data as fixed and the hypothesis as variable. The conversation emphasizes the importance of understanding the underlying processes that generate data, as they can significantly influence the conclusions drawn about probabilities. Ultimately, the distinction in experimental design and assumptions is crucial for accurate statistical interpretation.
  • #121
PeterDonis said:
The results of medical tests for rare conditions are usually much better analyzed using Bayesian methods, yes, because those methods correctly take into account the rarity of the underlying condition, in relation to the accuracy of the test. Roughly speaking, if the condition you are testing for is rarer than a false positive on the test, any given positive result on the test is more likely to be a false positive than a true one. Frequentist methods don't give you the right tools for evaluating this.

Peter, you are fairly harsh in the physics forums when nonsense is posted, so there is no reason not to point out that this is nonsense. The vast majority of medical research has used standard statistical analysis, which is based on frequentist methods.

If what you say were true there would have been a mass conversion to Bayesian methods.

I'd like to see a statistical journal where your claims about standard statistical methods being inadequate - simply because a test can yield more false positives than true positives - are substantiated.
 
  • #122
WWGD said:
Sorry if this was brought up already, but isn't something similar done in medicine with likelihood ratios, using a database of priors and adjusting? Then you can decide, assuming equal priors I guess, if the likelihood ratio is the same in both cases?
Yes, this is becoming more and more standard practice in medicine. There are not only journals but even undergraduate medical textbooks which directly address such issues as part of the core clinical theory of medicine. It has been this way for at least 20 years and is steadily developing.

However, from my experience of polling undergraduates and graduates, the emphasis on the utility of Bayesian methods is so marginal - both educationally and clinically - that it is practically forgotten by the time rounds begin; older physicians who are not in academia and/or are not educators tend to be wholly unfamiliar with these relatively novel methods, so they outright ignore them.
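To make the likelihood-ratio adjustment concrete, here is a minimal sketch of the odds form of Bayes' theorem that clinical LR analysis rests on (post-test odds = pre-test odds × likelihood ratio); the numbers are hypothetical, not drawn from any real database:

```python
# A minimal sketch of the odds form of Bayes' theorem behind clinical
# likelihood-ratio analysis. All numbers are hypothetical.

def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Post-test probability via post-test odds = pre-test odds * LR."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)   # probability -> odds
    post_odds = pre_odds * likelihood_ratio            # apply the test's LR+
    return post_odds / (1.0 + post_odds)               # odds -> probability

# Hypothetical case: 10% pre-test probability and a test with a positive
# likelihood ratio of 8 (LR+ = sensitivity / (1 - specificity)).
print(round(post_test_probability(0.10, 8.0), 2))  # 0.47
```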
PeroK said:
Peter, you are fairly harsh in the physics forums when nonsense is posted, so there is no reason not to point out that this is nonsense. The vast majority of medical research has used standard statistical analysis, which is based on frequentist methods.

If what you say were true there would have been a mass conversion to Bayesian methods.

I'd like to see a statistical journal where your claims about standard statistical methods being inadequate - simply because a test can yield more false positives than true positives - are substantiated.
In medicine, frequentist statistics is only utilized for academic research, i.e. generalizing from samples to entire populations, while Bayesian statistics is used in clinical practice, i.e. reasoning from generalities to particular cases. Medicine as clinical practice is purely concerned with the latter, which is why quantitative operationalizations of certain aspects of the medical process, such as likelihood ratio analyses, have been invented; such purely clinical quantitative methods tend to be Bayesian, i.e. the clinical application of knowledge gained using frequentist statistical methods is Bayesian.

While I get your sentiment, you are simply wrong here, and your misunderstanding is a widespread one in medicine as well. Moreover, you have misconstrued the actual issue by not qualifying your statement: it is the vast majority of medical research aimed at comparing treatments and demonstrating treatment effectiveness that has relied on standard statistical analysis. To use the actual terminology, most medical research is quantitative research.

This terminology is extremely misleading because it pretends that standard statistical analysis is the only kind of quantitative research - something which some medical researchers will actually tell you, and which is obviously wrong. See e.g. the difference in mathematical sophistication and background required between 'quantitative finance' and 'finance'; in fact, recognizing this early on is what made me realize I had to take a degree in either applied mathematics or physics in order to learn alternative quantitative and mathematical methods for medical research which are completely unknown within medicine itself.

In any case, the fact that most research in medicine has focused only on the type of question 'does A work/is A better than B' is because practically these are the easiest types of questions to research and answer with little to no uncertainty: in fact, the path is so straightforward that, with statistical packages already available, all that is practically left to do is collect data and feed it correctly into the computer. This has transformed both the standard MD/PhD programme as well as the typical PhD programme in medicine into a very straightforward path which can be reduced to mastering standard statistical analysis, but I digress.

Apart from the obviously different kinds of research which require different methods - e.g. laboratory work and sociological analysis - there are of course also other types of quantitative questions that are of direct interest in medicine, both in the scientific and the clinical context. The problem for medicine with such quantitative questions is that they do not fit the existing mold, i.e. they require alternative quantitative methods that simply aren't taught in the standard medical curriculum; Bayesian likelihood ratio analysis is an exception that is taught.

It is generally recognized by clinicians, however, that alternative quantitative methods are to some extent taught in other sciences. Because of this, many of these alternative quantitative questions are simply deferred directly to other sciences (biomedical sciences, pharmacology, physiology and so on). The problem then remains that the purely clinical questions cannot be deferred to other sciences, because they are purely practical medical issues and belong to the domain of the clinical physician. How do clinicians deal with this? They simply ignore it and/or leave it as an issue for the next generation to solve.
 
  • #123
Auto-Didact said:
While I get your sentiment you are simply wrong here and your misunderstanding is a widespread one in medicine as well.

Okay, I'm willing to believe this. But, I would like to see some evidence.

I can see the potential for the Bayesian approach. What I don't see is how the standard approach can ultimately fail in general.

Why has everyone (who uses standard statistical analysis) been wrong all along? And how many people know this?
 
  • #124
PeroK said:
Okay, I'm willing to believe this. But, I would like to see some evidence.

I can see the potential for the Bayesian approach. What I don't see is how the standard approach can ultimately fail in general.

Why has everyone (who uses standard statistical analysis) been wrong all along? And how many people know this?
I've been trying to answer this for over a decade now. If you could answer that convincingly, you'd probably get the Nobel Prize in Medicine.
 
  • #125
Auto-Didact said:
I've been trying to answer this for over a decade now. If you could answer that convincingly, you'd probably get the Nobel Prize in Medicine.

Well, I'm not after a Nobel Prize. As far as I can see, it's the traditional camp that is concerned about the reliability of Bayesian methods. Not the other way round.
 
  • #126
Deciding when to stop data collection is an important part of an experimental design to prevent the introduction of bias. My preference is to design experiments from the outset that stop either with a fixed, pre-determined number of data points, or that run for a fixed, pre-determined duration of time. It is hard for a human decision to stop data collection, made once collection has begun, to be free of bias, especially if the human decision maker(s) are aware of the results so far.
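To illustrate why, here is a small stdlib-only simulation (a hypothetical sketch, not a rendering of any particular study design) of an experimenter who peeks after every data point and stops as soon as a nominal 5% test looks significant; under a true null the rejection rate ends up far above 5%:

```python
# Simulate "optional stopping": flip a fair coin (so the null hypothesis is
# true by construction), test after every flip, and stop early as soon as a
# nominal |z| > 1.96 is observed. All design choices here are illustrative.
import math
import random

def z_stat(heads: int, n: int) -> float:
    """Normal-approximation z statistic for testing Pr(heads) = 1/2."""
    return (heads - n / 2) / math.sqrt(n / 4)

def rejects_with_peeking(n_max: int = 200, z_crit: float = 1.96) -> bool:
    heads = 0
    for n in range(1, n_max + 1):
        heads += random.random() < 0.5          # fair coin: the null is true
        if n >= 10 and abs(z_stat(heads, n)) > z_crit:
            return True                         # early stop: "significant"
    return False                                # never looked significant

random.seed(0)
trials = 10_000
rate = sum(rejects_with_peeking() for _ in range(trials)) / trials
print(f"rejection rate under a true null: {rate:.3f}")  # far above the nominal 0.05
```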
 
  • Like
Likes jim mcnamara, Jimster41, Auto-Didact and 1 other person
  • #127
PeroK said:
Well, I'm not after a Nobel Prize. As far as I can see, it's the traditional camp that is concerned about the reliability of Bayesian methods. Not the other way round.
You're of course correct. Apart from the Nobel Prize, it is likely that a solution would go a long way toward solving the replication crisis and the problem of p-hacking, as these all seem to be symptoms of the same disease, which is precisely why solving it is Prize-worthy in the first place.

I actually have an explanation, but the question is whether or not that explanation is going to be convincing to the traditional camp. In summary, medicine is an extremely traditional discipline: an unspoken principle is 'don't fix what ain't broken'. If one doesn't conform to the traditions of medicine, one is quickly ostracized and cast out; this applies almost instantly once one suggests going beyond the traditional boundaries. Having to go against the foundational traditions of the medical establishment to prove one's point - even when one can demonstrate that what one is doing is in fact correct - is simply not a path that many people are willing to take.

Notice the striking resemblance between this issue and the arguments regarding the problems in the foundations of QM, which is also split into two camps: those who take the issues seriously as unjustifiable loose ends in physics - i.e. foundationalists - and those arguing that those problems aren't actually real problems and can just be straightforwardly ignored for whatever instrumental or practical reasons, such as personal convenience - i.e. pragmatists.
 
  • #128
Dr. Courtney said:
Deciding when to stop data collection is an important part of an experimental design to prevent the introduction of bias. My preference is to design experiments from the outset that stop either with a fixed, pre-determined number of data points, or run for a fixed, pre-determined duration of time. It is hard to introduce a human decision to stop data collection once it has begun that is free of bias, especially if the human decision maker(s) are aware of the results so far.
This sounds like the conventional methodology used in standard statistical clinical research: deciding the necessary sample size a priori based on a power analysis.
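For concreteness, a hedged sketch of such an a-priori sample-size calculation, using the textbook normal approximation for a one-sample test of a proportion; the effect size, alpha and power below are illustrative assumptions, not recommendations:

```python
# A sketch of an a-priori sample-size calculation via the usual normal
# approximation for a one-sample test of a proportion.
from math import ceil, sqrt

def sample_size(p0: float, p1: float, z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """n for a two-sided test of H0: p = p0 against the alternative p = p1
    (the default z values give alpha = 0.05 and power = 0.80)."""
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))
    return ceil((numerator / (p1 - p0)) ** 2)

print(sample_size(0.5, 0.6))  # ~194 subjects to detect a shift from 0.5 to 0.6
```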

On the other hand, in the practice of clinical medicine among experienced practitioners, we have a non-explanatory term for limiting data collection to the bare minimum necessary to make a clinical decision: correct practice. By contrast, collecting data which cannot directly be considered relevant for the problem at hand is seen as 'incorrect practice'.

Engaging in incorrect practice too frequently, either deliberately or by mistake, is a punishable offense; I reckon implementing something like this would be effective in deterring such behavior in scientific practice as well.
 
  • #129
PeterDonis said:
Suppose we are trying to determine whether there is a bias towards boys, i.e., whether the probability p of having a boy is greater than 1/2. Given the information above, is the data from couple #2 stronger evidence in favor of such a bias than the (identical) data from couple #1?

To get a mathematical answer, we would have to define what "evidence" for p > 1/2 means and what procedure will be used to determine that evidence_A is stronger than evidence_B.

In frequentist statistics, the common language notion of "strength of evidence" suggests comparing "power curves" for statistical tests. To do that, you must pick a particular statistic and define the rejection region for each test. (The number of boys in the data is but one example of a statistic that can be defined as a function of the data.)

In Bayesian statistics, one can compute the probability that p > 1/2 given a prior distribution for p and the data. Suppose the two experiments A and B produce respective data sets ##D_A## and ##D_B##. For particular data sets, it might turn out that ##Pr(p > 1/2 | D_A) > Pr(p > 1/2 | D_B)##. However, for different particular data sets, the inequality might be reversed. So how shall we phrase your question in order to consider in general whether experiment A or experiment B provides more evidence?

I suppose one way is to consider the expected value of ##Pr(p > 1/2 | D)##, where the expectation is taken over the joint distribution of possible data sets and values of ##p## - do this for each experiment and compare answers. This is a suspicious procedure from the viewpoint of experimental design. It seems to be asking "Which experiment should I pick to give the strongest evidence that p > 1/2?". However, that seems to be the content of your question.

From the point of view of experimental design, a nobler question is "Which experiment gives a better estimate of p?". To translate that into mathematics requires defining what estimators will be used.
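As a concrete instance of the Bayesian computation described above, here is a minimal sketch; the uniform Beta(1, 1) prior and the particular counts are illustrative assumptions, not anything fixed by the thread:

```python
# With a Beta(1, 1) (uniform) prior on p = Pr(boy), observing b boys and
# g girls gives a Beta(1 + b, 1 + g) posterior; Pr(p > 1/2 | D) is its
# upper tail.
from scipy.stats import beta

def prob_bias_toward_boys(boys: int, girls: int, a: float = 1.0, b: float = 1.0) -> float:
    """Posterior Pr(p > 1/2) under a Beta(a, b) prior."""
    return beta(a + boys, b + girls).sf(0.5)   # sf(x) = 1 - CDF(x)

# The counts enter only through the likelihood p**boys * (1 - p)**girls, so
# the answer is the same whichever stopping rule produced the identical data.
print(round(prob_bias_toward_boys(3, 1), 3))   # 0.812 for 3 boys, 1 girl
```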
 
  • Like
Likes Auto-Didact
  • #130
PeroK said:
Okay, I'm willing to believe this. But, I would like to see some evidence.

I can see the potential for the Bayesian approach. What I don't see is how the standard approach can ultimately fail in general.

Why has everyone (who uses standard statistical analysis) been wrong all along? And how many people know this?
Coincidentally, Sabine Hossenfelder just uploaded a video which gives a (simplified) explanation of an aspect of this same topic, applying to the sciences broadly rather than just to how statistical methodology is used by scientists in medicine:

An important general lesson to take away from the video is that biases which have not been quantified - perhaps simply because the type of bias was discovered after the statistical methodology was established - are often ignored by scientists; this weakens the efficacy of statistical analysis, regardless of how careful the scientists were.
 
  • #131
PeroK said:
The vast majority of medical research has used standard statistical analysis, which is based on frequentist methods.

Yes, and much of that medical research fails to be replicated. The "replication crisis" that was making headlines some time back was not limited to medical research, but it included medical research. One of the key criticisms of research that failed to be replicated, on investigation, was inappropriate use of p-values. That criticism was basically saying the same thing that @Dale and I are saying in this thread: the p-value is the answer to a different question than the question you actually want the answer to.

PeroK said:
standard statistical methods being inadequate simply because a test can yield more false positives than true positives

My point was that the p-value, which is the standard statistical method for hypothesis testing, can't answer this question for you. The p-value tells you the probability that the positive test result would have happened by chance, if you don't have the disease. But the probability you are interested in is the probability that you have the disease, given the positive test result. It's easy to find actual tests and actual rare conditions where the p-value after a positive test result can be well below the 5% "significance" threshold, which under standard statistical methods means you reject the null hypothesis (i.e., you tell the patient they most likely have the disease), but the actual chance that the patient has the disease given a positive test result is small.
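To put illustrative numbers on this (the prevalence and test characteristics below are hypothetical, chosen only to make the arithmetic vivid):

```python
# Hypothetical numbers for a rare condition and a fairly accurate test.
prevalence = 0.001     # 1 in 1000 has the condition
sensitivity = 0.99     # Pr(positive | disease)
specificity = 0.96     # Pr(negative | no disease)

# p-value-like quantity: chance of this positive result given no disease.
# It is below the 5% threshold, so the null ("no disease") gets rejected.
print(1 - specificity)  # 0.04

# What the patient actually wants: Pr(disease | positive), via Bayes' rule.
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
posterior = sensitivity * prevalence / p_positive
print(round(posterior, 3))  # 0.024: only about a 2.4% chance of disease
```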
 
  • Like
Likes Auto-Didact
  • #133
PeroK said:
Peter, you are fairly harsh in the physics forums when nonsense is posted, so there is no reason not to point out that this is nonsense.
Actually, what he described is pretty standard introductory material for Bayesian probability.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4585185/
 
  • Like
Likes Auto-Didact
  • #134
PeterDonis said:
That criticism was basically saying the same thing that @Dale and I are saying in this thread: the p-value is the answer to a different question than the question you actually want the answer to.
This, as well as basically the entire thread, reminds me of a quote by Cantor:
To ask the right question is harder than to answer it.

This essentially is why science in general (and physics in particular) is difficult; i.e. not because solving technical (mathematical) questions can be somewhat difficult, but because the right question has to be identified and asked first. This means that in any open-ended scientific inquiry one should postpone naively mathematizing what can easily be mathematized if it isn't clear what is essential; prematurely turning a conceptual issue into a technical issue is a waste of time which should be avoided!

It took me quite a long while to learn this lesson because it goes against both my instincts and my training. Moreover, the realization that this lesson is actually useful is a recurring theme when doing applied mathematics in the service of some science; it typically comes when one repeatedly tries to generalize from some particular idealization towards a more realistic description, which then generally turns out to be unreachable in any obvious way.
 
  • #135
Stephen Tashi said:
To get a mathematical answer, we would have to define what "evidence" for p > 1/2 means and what procedure will be used to determine that evidence_A is stronger than evidence_B.
In Bayesian statistics this is well defined and straightforward.

https://en.m.wikipedia.org/wiki/Bayes_factor

Of course, there are limitations to any technique.
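For the thread's boy/girl setup, a minimal sketch of such a Bayes factor: H0 fixes p = 1/2, H1 puts a uniform prior on p, and the particular data below are an illustrative assumption:

```python
# Bayes factor for H1 (uniform Beta(1, 1) prior on p) over H0 (p = 1/2),
# for one observed sequence of births. Data are illustrative.
from math import factorial

def bayes_factor_10(boys: int, girls: int) -> float:
    """BF of H1 over H0 for one observed sequence of boys/girls."""
    # Marginal likelihood under H1: integral of p^b (1-p)^g dp = b! g! / (b+g+1)!
    m1 = factorial(boys) * factorial(girls) / factorial(boys + girls + 1)
    m0 = 0.5 ** (boys + girls)     # likelihood of the same sequence under H0
    return m1 / m0

print(bayes_factor_10(3, 1))  # 0.8: four children barely move the needle
```

A design note: both marginal likelihoods are computed for the same observed sequence, so the Bayes factor is unchanged by whether the couple used a fixed family size or a 'stop at the first boy' rule; that likelihood-principle point is exactly what is at issue in this thread.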
 
  • #136
Auto-Didact said:
medicine is an extremely traditional discipline: an unspoken principle is 'don't fix what ain't broken'
I think there is a growing recognition of the parts of medical science that are broken. I am optimistic in the long term and even in the short term the changes are at least interesting.
 
  • #137
Dale said:
Actually, what he described is pretty standard introductory material for Bayesian probability.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4585185/

@PeterDonis I apologise as I spoke too harshly. I really don't want to get involved in a debate on medical statistics and how they are used. I didn't realize that was what was at the root of all this.

That article seems to me more about the politics of communicating with patients than about actual statistical methods themselves.

If you are all telling me that traditional statistical methods are widely misunderstood and misused in medical science, then I have no grounds to challenge that.
 
  • #138
PeroK said:
That article seems to me more about the politics of communicating with patients than about actual statistical methods themselves.
Yes, the communication with patients is particularly important since they cannot be expected to understand the statistical issues themselves. The article did talk about the fact that for rare diseases the likelihood of having the disease after receiving a positive test result is low. I.e. for rare diseases most positives are false positives.
 
  • #139
Dale said:
Yes, the communication with patients is particularly important since they cannot be expected to understand the statistical issues themselves. The article did talk about the fact that for rare diseases the likelihood of having the disease after receiving a positive test result is low. I.e. for rare diseases most positives are false positives.
Yes, but it doesn't take Bayesian methods to come to that conclusion.
 
  • Like
Likes Dale
  • #140
PeterDonis said:
The results of medical tests for rare conditions are usually much better analyzed using Bayesian methods, yes, because those methods correctly take into account the rarity of the underlying condition, in relation to the accuracy of the test. Roughly speaking, if the condition you are testing for is rarer than a false positive on the test, any given positive result on the test is more likely to be a false positive than a true one. Frequentist methods don't give you the right tools for evaluating this.

As @PeroK has pointed out, this is wrong. You are getting Bayes's rule confused with Bayesian methods. Bayes's rule is part of both Frequentist and Bayesian methods. Frequentist methods and Bayes's rule are perfectly fine for analyzing rare conditions.
 
  • Like
Likes PeroK
  • #141
atyy said:
As @PeroK has pointed out, this is wrong. You are getting Bayes's rule confused with Bayesian methods. Bayes's rule is part of both Frequentist and Bayesian methods. Frequentist methods and Bayes's rule are perfectly fine for analyzing rare conditions.
Bayes' theorem is explicitly not part of the formalism of frequentist probability theory. Any importation of Bayes' theorem into statistical practice using frequentist methods is a transition to statistical practice using Bayesian methods.
 
  • Skeptical
Likes Dale
  • #142
Auto-Didact said:
Bayes' theorem is explicitly not part of the formalism of frequentist probability theory. Any importation of Bayes' theorem into statistical practice using frequentist methods is a transition to statistical practice using Bayesian methods.

Bayes' theorem can be proved with a simple use of a Venn diagram. It must be true. It also falls out of the "probability tree" approach.

You are confusing statistical methods with probability theory. Bayes' theorem is a fundamental part of probability theory that underpins any set of statistical methods.

The Wikipedia page gives the two Bayesian and frequentist interpretations of the theorem:

https://en.wikipedia.org/wiki/Bayes'_theorem#Bayesian_interpretation
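For reference, the theorem also drops out in one line from the definition of conditional probability:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B \mid A)\,P(A)}{P(B)}$$

which is interpretation-neutral; the Bayesian/frequentist split is over what the ##P##'s mean, not over the identity itself.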
 
  • Like
Likes Dale
  • #143
I agree that Bayes' theorem is generally valid, as part of mathematics. It is instead the interpretation of probability theory based on the idea that probabilities are objective relative frequencies which specifically doesn't acknowledge the general validity of Bayes' theorem w.r.t. probabilities. Standard statistical methodology is based on this frequentist interpretation of probability theory.
 
  • Skeptical
Likes Dale
  • #144
Here, Andrew Gelman, a noted Bayesian, explicitly says that one does not need to be a Bayesian to apply Bayes's rule.

http://www.stat.columbia.edu/~gelman/research/published/badbayesmain.pdf
Bayesian statisticians are those who would apply Bayesian methods to all problems. (Everyone would apply Bayesian inference in situations where prior distributions have a physical basis or a plausible scientific model, as in genetics.)

Of course, one should not need Gelman's authority to say this. Bayes's rule is just a basic part of probability.
 
  • Like
Likes Dale
  • #145
Auto-Didact said:
It is instead the interpretation of probability theory based on the idea that probabilities are objective relative frequencies which specifically doesn't acknowledge the general validity of Bayes' theorem w.r.t. probabilities.

That is simply a fundamental misunderstanding on your part.
 
  • #146
PeroK said:
That is simply a fundamental misunderstanding on your part.
This seems to fly in the face of the literature, as well as how statistical methodology is actually practiced.

What do you mean by the term Bayesian methods? It seems that you aren't referring to any statistical methods based on Bayesian probability theory as invented by Laplace, but instead to something else much more limited in scope.
 
  • #147
Auto-Didact said:
This seems to fly in the face of the literature, as well as how statistical methodology is actually practiced.

What do you mean by the term Bayesian methods? It seems that you aren't referring to any statistical methods based on Bayesian probability theory as invented by Laplace, but instead to something else much more limited in scope.

Technically a "statistic" is, by definition, something used to estimate a population parameter. The simplest example is the mean. One of the first things you have to do is decide whether the mean is relevant. If you have some data, no one argues (within reason) over the value of the mean. The debate would be on the relevance of the mean as an appropriate statistic.

Overuse of the mean could be seen as a questionable statistical method. E.g. taking average salary, where perhaps the median is more important. Average house price, likewise.

Testing the null hypothesis and using the p-value is a statistical method. Again, there is probably no argument over the p-value itself, but over its relevance.

These are examples of traditional (aka frequentist) statistical methods.

Examples of Bayesian methods have been given by @Dale in this thread.

The example that started this thread perhaps illustrates the issues. I'll do a variation:

We start, let's say, with a family of six girls and no boys.

1) You could argue that there is no medical evidence or hypothesis that some couples have a predisposition to girls, hence there is no point in looking at this data. Instead you must look at many families and record the distribution in terms of size and sex mixture. This is simply a family with six girls - so what? - that happens.

2) You could suggest a hypothesis that this couple is more likely to have girls than boys and test that. But, with only six children, standard statistical methods are unlikely to tell you anything - even supposing you consider such an undertaking to have any purpose.

3) You could analyse the data using Bayesian methods and calculate a posterior mean for that particular couple. Again, you have to decide whether this calculation is of any relevance.

Here a general theme emerges. Bayesians are able to say something about data where traditionalists are silent. That could be good or bad. What's said could be an insight that traditional methods miss; or, it could be a misplaced conclusion.
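To make option 3) concrete, a minimal sketch of the posterior-mean calculation, assuming (purely for illustration) a uniform Beta(1, 1) prior on p = Pr(girl):

```python
# Posterior mean for p = Pr(girl) after six girls and no boys, under a
# uniform Beta(1, 1) prior; the prior is an illustrative assumption.
girls, boys = 6, 0
a, b = 1 + girls, 1 + boys        # posterior is Beta(7, 1)
posterior_mean = a / (a + b)      # mean of a Beta(a, b) distribution
print(posterior_mean)             # 0.875, vs the raw sample proportion of 1.0
```

The contrast with the raw proportion (6/6 = 1.0) is the sense in which the Bayesian "says something" here: the prior shrinks the estimate toward 1/2, for better or worse, exactly as described above.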
 
  • #148
Auto-Didact said:
This seems to fly in the face of the literature, as well as how statistical methodology is actually practiced.

What do you mean by the term Bayesian methods? It seems that you aren't referring to any statistical methods based on Bayesian probability theory as invented by Laplace, but instead to something else much more limited in scope.

I found this. It looks good to me:

https://www.probabilisticworld.com/frequentist-bayesian-approaches-inferential-statistics/
 
  • #149
Auto-Didact said:
Bayes' theorem is explicitly not part of the formalism of frequentist probability theory. Any importation of Bayes' theorem into statistical practice using frequentist methods is a transition to statistical practice using Bayesian methods.
I don’t think Rev Bayes signed an exclusive licensing agreement with the Bayesianists for the use of his theorem. Frequentists can still use it.
 
  • Like
Likes PeterDonis and PeroK
  • #150
PeroK said:
The Wikipedia page gives the two Bayesian and frequentist interpretations of the theorem:

https://en.wikipedia.org/wiki/Bayes'_theorem#Bayesian_interpretation
I hope you agree that there is a huge difference between Bayes' theorem appearing as an extratheoretical, purely mathematical consequence of set-theoretic intersections and (the functions in) Bayes' theorem serving as the definition of probability; only the latter is Bayesian probability theory.
PeroK said:
Technically a "statistic" is, by definition, something used to estimate a population parameter. The simplest example is the mean. One of the first things you have to do is decide whether the mean is relevant. If you have some data, no one argues (within reason) over the value of the mean. The debate would be on the relevance of the mean as an appropriate statistic.

Overuse of the mean could be seen as a questionable statistical method. E.g. taking average salary, where perhaps the median is more important. Average house price, likewise.

Testing the null hypothesis and using the p-value is a statistical method. Again, there is probably no argument over the p-value itself, but over its relevance.

These are examples of traditional (aka frequentist) statistical methods.

Examples of Bayesian methods have been given by @Dale in this thread.

The example that started this thread perhaps illustrates the issues. I'll do a variation:

We start, let's say, with a family of six girls and no boys.

1) You could argue that there is no medical evidence or hypothesis that some couples have a predisposition to girls, hence there is no point in looking at this data. Instead you must look at many families and record the distribution in terms of size and sex mixture. This is simply a family with six girls - so what? - that happens.

2) You could suggest a hypothesis that this couple is more likely to have girls than boys and test that. But, with only six children, standard statistical methods are unlikely to tell you anything - even supposing you consider such an undertaking to have any purpose.

3) You could analyse the data using Bayesian methods and calculate a posterior mean for that particular couple. Again, you have to decide whether this calculation is of any relevance.

Here a general theme emerges. Bayesians are able to say something about data where traditionalists are silent. That could be good or bad. What's said could be an insight that traditional methods miss; or, it could be a misplaced conclusion.
I basically agree with all of this, but the question is why Bayesians are able to say something when frequentists must be silent: the answer is that they have another definition of probability.
Again, a certain formula appearing as an application when doing mathematics and a certain formula being the central definition of the theory are clearly two different things.
Dale said:
I don’t think Rev Bayes signed an exclusive licensing agreement with the Bayesianists for the use of his theorem. Frequentists can still use it.
Of course frequentists can use it, in the same sense that curved space can be imported into QFT by engaging in semi-classical physics. If they use it as a piece of applied mathematics on intersecting sets, then there is no foul play; but if they use it for statistical inference in such a manner that Bayes' theorem replaces the frequentist definition of probability, then they are de facto doing Bayesian statistics while merely pretending not to.

The key question is therefore whether the given theorem has a fundamental status within their theory as the central definition or principle; clearly, for frequentist probability theory and any statistical method of inference based thereon, the answer is no.
 
  • Skeptical
Likes Dale
