• #1
Dale
Mentor
Insights Author
2020 Award
30,696
7,288
Confessions of a moderate Bayesian, part 2
Read Part 1: Confessions of a moderate Bayesian, part 1
Bayesian statistics by and for non-statisticians
Background
One of the continuing and occasionally contentious debates surrounding Bayesian statistics is the interpretation of probability. Anyone who is familiar with my posts on this forum knows that I am not generally a big fan of interpretation debates. This one is no exception. So I am going to present both interpretations as factually as I can, and then conclude with my personal take on the issue and my approach.
Probability axioms
Probability is a mathematical concept that is applied to various domains. I think that it is worthwhile to point out the mathematical underpinnings in at least a brief and...
 

Answers and Replies

  • #2
WWGD
Science Advisor
Gold Member
5,419
3,680
If I may offer a suggestion, or maybe you can reply here, on the two different interpretations of probabilistic statements such as: "There is a 60% chance of rain for (e.g.) Thursday." From a frequentist perspective, I believe this means that in previous times with a similar combination of conditions as the ones before Thursday, it rained 60% of the time. I have trouble finding a Bayesian interpretation for this claim. You may have a prior, but I can't see what data you would use to update it to a posterior probability.
 
  • #3
14,169
11,470
Well, a bit biased against frequentists if you ask me. I do not have a strong opinion on either side, all the more so since I studied decision theory and subjective probabilities along the way. However, I remember some heated discussions about the issue, and I'm not sure whether Bayesians have many friends among people working in stochastics.
 
  • #4
Dale
Well, a bit biased against frequentists if you ask me.
Well, I am a moderate Bayesian, so I do lean towards Bayes in my preferences. But being moderate I also use the frequentist interpretation and frequentist methods whenever convenient or useful.

I just don’t think that my preference is “right” or that someone else’s preference is “wrong”. I use both and even find cases where using both together is helpful.
 
  • #5
Dale
"There is a 60% chance of rain for (e.g.) Thursday." From a frequentist perspective, I believe this means that in previous times with a similar combination of conditions as the ones before Thursday, it rained 60% of the time. I have trouble finding a Bayesian interpretation for this claim.
The Bayesian interpretation is straightforward. It just means that I am not certain that it is going to rain on Thursday, but I think it is likely. More operationally, if I had to bet a dollar either that it would rain on Thursday or that I would get heads on a single flip of a fair coin, then I would rather take the bet on the rain.

You may have a prior, but I can't see what data you would use to update it to a posterior probability.
To update your probability you need to have a model.

For a concrete example, suppose that the only condition you were looking at is barometric pressure. A typical model might be that the log of the odds of rain is a linear function of the barometric pressure. Then the previous data would be used to estimate the slope and the intercept of that model.
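
To make the barometric-pressure example concrete, here is a minimal Python sketch that is not taken from the original post. The pressure readings and rain outcomes are invented for illustration, and the names `fit_logistic` and `prob_rain` are hypothetical. It estimates the intercept and slope by maximum likelihood using plain gradient ascent; a fully Bayesian fit would additionally place priors on the two coefficients and report a posterior rather than point estimates.

```python
import math

# Hypothetical data: barometric pressure (hPa) and whether it rained (1) or not (0).
pressure = [990, 995, 1000, 1003, 1005, 1008, 1010, 1013, 1015, 1018, 1020, 1025]
rained = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0]


def sigmoid(z):
    """Map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, steps=20000, lr=0.05):
    """Fit log-odds(rain) = intercept + slope * x by gradient ascent."""
    mean_x = sum(x) / len(x)
    scale = max(abs(v - mean_x) for v in x)
    xs = [(v - mean_x) / scale for v in x]  # rescale so the steps are stable
    a, b = 0.0, 0.0  # intercept and slope in rescaled units
    for _ in range(steps):
        resid = [yi - sigmoid(a + b * xi) for xi, yi in zip(xs, y)]
        a += lr * sum(resid) / len(x)
        b += lr * sum(r * xi for r, xi in zip(resid, xs)) / len(x)
    slope = b / scale                  # convert back to per-hPa units
    intercept = a - slope * mean_x
    return intercept, slope


def prob_rain(p, intercept, slope):
    """Modeled probability of rain at pressure p."""
    return sigmoid(intercept + slope * p)


intercept, slope = fit_logistic(pressure, rained)
```

With the estimates in hand, `prob_rain(1000, intercept, slope)` gives the modeled chance of rain at 1000 hPa under these made-up data.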
 
  • #6
anorlunda
Staff Emeritus
Insights Author
9,031
5,952
Will this be a 3 part series? 4? Will you give numeric examples? A preview would be nice.
 
  • #7
Dale
Will this be a 3 part series? 4? Will you give numeric examples? A preview would be nice.
I will have numerical examples for most of them. This one was just philosophical, so it didn’t really lend itself to examples.

I think that I will have at least two more. The one I am working on now is about Bayesian inference in science. It will include how the Bayesian approach naturally includes Occam’s razor and Popper’s falsifiability. The fourth will be a deeper dive into the posterior distribution and the posterior predictive distribution.

After that, I don’t know.
 
  • #8
Stephen Tashi
Science Advisor
7,545
1,456
Now, we need a way to determine the measure ##P(H)##. For frequentist probabilities the way to determine ##P(H)## is to repeat the experiment a large number of times and calculate the frequency that the event ##H## happens. In other words, if you do ##N## trials and get ##n_h## heads then
##P(H) = \lim_{N \rightarrow \infty} \frac{n_h}{N}##
So a frequentist probability is simply the “long run” frequency of some event.
It should be emphasized that the notation "##P(H) = \lim_{N \rightarrow \infty} \frac{n_h}{N}##" conveys an intuitive belief, not a statement that has a precise mathematical definition in terms of the concept in calculus denoted by the similar-looking notation ##L = \lim_{N \rightarrow \infty} f(N)##.


In applications of statistics we typically assume that "in the long run" observed frequencies of events will approximately be equal to their probability of occurrence. (In applying probability theory to a real life situation, would a Bayesian disagree with that intuitive notion?) But probability theory itself does not make this assumption. The nearest thing to it is the "Law of Large Numbers", but that law, like most theorems of probability, tells us about the probability of something happening, not about an absolute guarantee that it will.
 
  • #9
Dale
"in the long run" observed frequencies of events will approximately be equal to their probability of ocurrence. ( In applying probability theory to a real life situation, would a Bayesian disagree with that intuitive notion? )
There are theorems demonstrating that in the long run the Bayesian probability converges to the frequentist probability for any suitable prior (e.g., one that is non-zero at the frequentist probability).
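
As an illustration of this convergence (a sketch under stated assumptions, not a proof), the Python below simulates coin tosses and tracks the posterior mean under three different Beta priors. The Beta family, the three prior settings, and the simulated long-run frequency of 0.3 are all choices made for this example. All three priors are non-zero at 0.3, so their posterior means end up near the observed frequency even though they start far apart.

```python
import random


def posterior_mean(heads, tails, alpha, beta):
    """Mean of the Beta(alpha + heads, beta + tails) posterior."""
    return (alpha + heads) / (alpha + beta + heads + tails)


random.seed(0)
true_p = 0.3  # assumed long-run frequency used only to generate the data
priors = {"uniform": (1, 1), "leans high": (20, 5), "leans low": (2, 30)}

heads = tails = 0
checkpoints = {}
for n in range(1, 100001):
    if random.random() < true_p:
        heads += 1
    else:
        tails += 1
    if n in (10, 1000, 100000):
        # Record the observed frequency and each prior's posterior mean.
        snapshot = {name: posterior_mean(heads, tails, a, b)
                    for name, (a, b) in priors.items()}
        snapshot["frequency"] = heads / n
        checkpoints[n] = snapshot

for n, snapshot in checkpoints.items():
    print(n, {k: round(v, 4) for k, v in snapshot.items()})
```

After 10 tosses the three posterior means still disagree strongly; after 100,000 they agree with each other and with the observed frequency to about three decimal places.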

It should be emphasized that the notation "##P(H) = \lim_{N \rightarrow \infty} \frac{n_h}{N}##" conveys an intuitive belief, not a statement that has a precise mathematical definition
What do you mean here?
 
  • #10
Stephen Tashi
What do you mean here?
The interpretation of "##\lim_{N \rightarrow \infty} \frac{ n_h}{N} = P(H)##" in the sense used in calculus would say that for each ##\epsilon > 0 ## there exists an ##M > 0## such that if ##N > M## then ##P(H) - \epsilon < \frac{n_h}{N} < P(H) + \epsilon ##. However, there is no guarantee that this will happen. To assert that it must happen contradicts the concept of a probabilistic experiment. The quantity ##\frac{n_h}{N}## is not a deterministic function of ##N##, so the notation used in calculus for limits of functions does not apply.

For independent trials, the calculus type of limit that does exist, for a given ##\epsilon > 0##, is ##\lim_{N \rightarrow \infty} Pr( P(H) - \epsilon < S(N) < P(H) + \epsilon) = 1## where ##S## is a deterministic function of ##N##. To compute ##S## we use the probability distribution for ##N## replications of the experiment to compute the probability that there is a number of occurrences ##n_h## that makes ##P(H) -\epsilon < \frac{n_h}{N} < P(H) + \epsilon##. The notation "##n_h##" denotes an index variable for a summation of probabilities. We sum over all ##n_h## that satisfy the above inequality. So ##S## is a function of ##N##, not of ##n_h##.
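
The summation described above can be carried out exactly for a coin. The Python sketch below reads ##S(N)## as the probability that ##\frac{n_h}{N}## falls strictly within ##\epsilon## of ##P(H)##, summing binomial probabilities over the qualifying ##n_h##. The values ##P(H) = 0.5## and ##\epsilon = 0.05## are assumptions chosen only for illustration.

```python
from math import comb


def S(N, p=0.5, eps=0.05):
    """Probability that n_h / N lies strictly within eps of p in N independent trials."""
    total = 0.0
    for n_h in range(N + 1):  # n_h is only a summation index here
        if p - eps < n_h / N < p + eps:
            total += comb(N, n_h) * p**n_h * (1 - p)**(N - n_h)
    return total


# S depends on N alone, and it climbs toward 1 as N grows.
values = {N: S(N) for N in (10, 100, 1000)}
for N, s in values.items():
    print(N, round(s, 4))
```

Note that the result is a statement about a probability approaching 1, not a guarantee about any particular run of tosses, which is the distinction drawn in the post.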

There is no disagreement between Bayesians and frequentists about how such a limit is interpreted.
 
  • #11
Dale
For independent trials, the calculus type of limit that does exist, for a given ##\epsilon > 0##, is ##\lim_{N \rightarrow \infty} Pr( P(H) - \epsilon < S(N) < P(H) + \epsilon) = 1## where ##S## is a deterministic function of ##N##.
Nice.

Is that considered problematic by frequentist purists? It seems to define probability in terms of probability.
 
  • #12
Stephen Tashi
Is that considered problematic by frequentist purists? It seems to define probability in terms of probability.
Such a limit is used in the technical content of the Law of Large Numbers, and frequentists don't disagree with that theorem.

To me, the essential distinction between the frequentist approach and the Bayesian approach boils down to whether certain variables are assumed to represent a "definite but unknown" quantity versus a quantity that is the outcome of some stochastic process. For example, a frequentist might model a situation as a sequence of Bernoulli trials with definite but unknown probability ##p##. In that case, a question like "Given there are 5 successes in 10 Bernoulli trials, what is the probability that ##.4 < p < .6##?" is almost meaningless because ##p## is not something that has a nontrivial probability distribution. So we can only say that ##Pr(.4 < p < .6)## is either 1 or zero, and we don't know which. By contrast, a Bayesian might model the situation as a sequence of Bernoulli trials performed after Nature or something else uses a stochastic process to determine ##p## and be bold enough to assume a probability distribution for ##p##. In that scenario, the above question has a meaningful answer.
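
To show what a "meaningful answer" can look like, the Python sketch below computes ##Pr(.4 < p < .6)## after 5 successes in 10 trials. It assumes a uniform prior for ##p## on [0, 1]; the post does not specify a prior, so that choice (and the helper name `posterior_prob`) is an assumption made here. With a uniform prior the posterior density is proportional to ##p^5 (1-p)^5##, which is integrated numerically.

```python
def posterior_prob(successes, trials, lo, hi, steps=100000):
    """Pr(lo < p < hi | data) under a uniform prior, by midpoint integration."""

    def unnormalized(p):
        # Likelihood times the (constant) uniform prior density.
        return p**successes * (1 - p)**(trials - successes)

    def integrate(a, b):
        h = (b - a) / steps
        return h * sum(unnormalized(a + (i + 0.5) * h) for i in range(steps))

    return integrate(lo, hi) / integrate(0.0, 1.0)


prob = posterior_prob(5, 10, 0.4, 0.6)
print(round(prob, 3))
```

Under this assumed prior the answer comes out close to one half. A different prior would give a different number, which is exactly the modeling choice a frequentist declines to make.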

A frequentist criticism of the Bayesian approach is: "Suppose ##p## was indeed the result of some stochastic process. The value of ##p## has already been selected by that process. Are we to base our analysis only on taking a single sample of ##p## from the process?"

A Bayesian criticism of the frequentist approach is: "You aren't setting up a mathematical problem that answers questions that people want to ask. People want answers to questions of the form 'What is the probability that <some property of the situation> is true given we have observed the data?' The way you model the problem, you can only answer questions of the form 'Assuming <some property of the situation> is true, what is the probability of the observed data?'"
 
  • #13
Dale
Such a limit is used in the technical content of the Law of Large Numbers, and frequentists don't disagree with that theorem.
No, of course not. But I don’t think that you can use the limit you posted above as a definition for frequency-based probability non-circularly.

To me, the essential distinction between the frequentist approach and the Bayesian approach boils down to whether certain variables are assumed to represent a "definite but unknown" quantity versus a quantity that is the outcome of some stochastic process.
I agree more or less. I would say that the issue is not exactly whether a quantity is definite but unknown, but rather whether or not to use probability to represent such a quantity.

E.g. I think that both Bayesians and frequentists would classify ##G## as definite but unknown, but Bayesians would happily assign it a PDF and frequentists would not.

I think that is only slightly different from your take.
 
  • #14
Stephen Tashi
No, of course not. But I don’t think that you can use the limit you posted above as a definition for frequency-based probability non-circularly.
I agree. And, as far as I can see, no formal definition of any kind of limit defines the concept of a probability.

As you mentioned in the insight, the mathematical approach to probability defines it via a "measure", which is a certain type of function whose domain is a collection of sets. This theory does not formalize the idea that it is possible to take samples of a random variable nor does it define probability in the context that there is one outcome that "actually" happens in an experiment where there are many "possible" outcomes. So the mathematical theory bypasses the complicated metaphysical concepts of "actuality" and "possibility". It does not formally define those concepts and hence says nothing about them.

Also, as you said, both Frequentists and Bayesians accept the mathematical theory of probability. So any difference in how the two schools formally define probability would have to be based on some method of creating a mathematical system that defines new things that underlie the concept of probability and shows how these new things can be used to define a measure. I recall seeing examples where a formal mathematical model of "degree of belief" or "amount of information" is developed and probability is defined in terms of the mathematical objects in such models. Richard von Mises had the view that probability can be defined as a "limiting frequency" (http://www.statlit.org/pdf/2008SchieldBurnhamASA.pdf), but the consensus view of mathematicians is that his approach doesn't pass muster as formal mathematics.

However, I think most practicing statisticians don't think in terms of a precisely defined mathematical structure that underlies probability. The way that typical Frequentists differ from typical Bayesians is in how their imprecise and intuitive notions differ, i.e., in their metaphysical opinions.
 
  • #15
Dale
So any difference in how the two schools formally define probability would have to be based on some method of creating a mathematical system that defines new things that underlie the concept of probability and shows how these new things can be used to define a measure.
I think we are running into a miscommunication here. I agree with the point you are making, but it isn’t what I am asking about.

In physics we have the mathematical concept of a vector and the application of a velocity. In order to use velocity vectors you need more than just the axioms and theorems of vectors, you also need an operational definition of how to determine velocity. Here, communication is hampered because we use the word probability to refer to both the mathematical structure and the thing represented by the structure. There need to be operational definitions of frequentist and Bayesian probability. That is what I am talking about.

I think that Bayesians have a good operational definition of probability. The valid limit you described above would be a circular operational definition for frequentist probability, but unfortunately I don’t know a better one. The one I wrote isn’t circular, but as you correctly pointed out it isn’t a real limit.
 
  • #16
Dale
@Stephen Tashi FYI, I modified the Insight to get rid of the limit and make it a little less rigorous while hopefully still conveying the basic idea of what frequentists operationally mean.
 
  • #17
Stephen Tashi
There need to be operational definitions of frequentist and Bayesian probability. That is what I am talking about.
Ideally, there is a need for such definitions, but it will be hard to say anything precise. People make subjective decisions without having a coherent system of ideas to justify them. You can look at what prominent Bayesians say versus what prominent Frequentists say. Prominent people usually feel obligated to portray their opinions as clear and systematic. But prominent people can also be individualistic, so you might not find any consensus views.

Judging from other articles I have read about Frequentist vs Bayesian approaches to statistics, their authors have definite opinions about the differences. However, is there really a consensus view of probability among Frequentists or among Bayesians? Are the authors of this type of article just copying what previous authors of this type of article have written, namely that Bayesians view probability as "subjective" and Frequentists view it as "objective"?

I can't see a Bayesian (of any sort) defending an estimate of a probability that is contradicted by a big batch of data. So is it correct to say that Bayesians don't accept the intuitive idea that a probability is revealed as a limiting frequency?

If a Frequentist decides to model a population by a particular family of probability distributions, will he claim that he has made an objective decision?
 
  • #18
atyy
Science Advisor
14,213
2,471
People make subjective decisions without having a coherent system of ideas to justify them.
I know you mean "coherent" in a different sense, but Bayesian probability is coherent, where "coherent" is a technical term.

I can't see a Bayesian (of any sort) defending an estimate of a probability that is contradicted by a big batch of data. So is it correct to say that Bayesians don't accept the intuitive idea that a probability is revealed as a limiting frequency?

If a Frequentist decides to model a population by a particular family of probability distributions, will he claim that he has made an objective decision?
Although Bayesians and Frequentists start from different assumptions, Bayesians can use many Frequentist procedures when there is exchangeability and the de Finetti representation theorem applies.
http://www.stats.ox.ac.uk/~steffen/teaching/grad/definetti.pdf
 
  • #19
Stephen Tashi
I know you mean "coherent" in a different sense, but Bayesian probability is coherent, where "coherent" is a technical term.
How are you defining a "Bayesian probability"?

Are you referring to a system of mathematics that postulates some underlying structure for probability and then defines a probability measure in terms of objects defined in that underlying structure?

Although Bayesians and Frequentists start from different assumptions, Bayesians can use many Frequentist procedures when there is exchangeability and the de Finetti representation theorem applies.
http://www.stats.ox.ac.uk/~steffen/teaching/grad/definetti.pdf
Those notes show an example of where a Frequentist assumes the existence of a "fixed but unknown" distribution ##Q## and a Bayesian assumes a distribution ##P##, and it is proven that "In ##P## the distribution ##Q## exists as a random object". Apparently both ##P## and ##Q## are parameterized by a single parameter called "the limiting frequency".

Isn't the general pattern for the Bayesian approach to take a parameter ##k## of a distribution ##Q_k## that a Frequentist would assume is "fixed but unknown" and model ##k## as the outcome of a random variable ##P##? That approach makes ##k## and ##Q_k## random objects generated by ##P##.

I don't see how the example in those notes gives a Bayesian any special liberty to turn a Frequentist variable into a Bayesian random variable that a Bayesian would not ordinarily take.

The notes say they demonstrate a "bridge" between the two approaches. I don't know how to interpret that. One guess is that if a Bayesian models a situation by assuming ##P## then he finds that a random distribution ##Q_k## "pops out" that can be interpreted as giving possible choices for the "fixed but unknown" distribution ##Q_k## that a Frequentist would use. Whereas the typical Bayesian approach would be to start with ##Q_k## and turn ##Q_k## into a random distribution by turning ##k## into a random variable.
 
  • #20
Dale
You can look at what prominent Bayesians say versus what prominent Frequentists say. Prominent people usually feel obligated to portray their opinions as clear and systematic. But prominent people can also be individualistic, so you might not find any consensus views.
Aren’t prominent people in a field considered prominent precisely because the consensus in that field is to adopt their view?

If a Frequentist decides to model a population by a particular family of probability distributions, will he claim that he has made an objective decision?
This is a good point. But they can certainly objectively test if that decision is supported by the data. (It almost never is for large data sets).

Anyway, your responses here have left me thinking that the standard frequentist operational definition is circular. I had originally thought that the limit I wrote was valid, but you are correct that it is not a legitimate limit. But the replacement you offered uses probability to define probability, so that is circular. Circularity is not necessarily an unresolvable problem, but it at least bears scrutiny.
 
  • #21
Stephen Tashi
Aren’t prominent people in a field considered prominent precisely because the consensus in that field is to adopt their view?
Yes - with the caveat that adopting the views of a prominent person by citing a mild summary of them is different than understanding their details! It can be embarrassing to find yourself using a method when a well known proponent of the method has extreme views. As a moderate Bayesian, would you associate yourself with de Finetti's:

My thesis, paradoxically, and a little provocatively, but nonetheless genuinely, is simply this:
PROBABILITY DOES NOT EXIST
The abandonment of superstitious beliefs about the existence of the Phlogiston, the Cosmic Ether, Absolute Space and Time, ... or Fairies and Witches was an essential step along the road to scientific thinking. Probability, too, if regarded as something endowed with some kind of objective existence, is no less a misleading misconception, an illusory attempt to exteriorize or materialize our true probabilistic beliefs.
as quoted in the paper by Nau https://faculty.fuqua.duke.edu/~rnau/definettiwasright.pdf

An interpretation of de Finetti's position is that we cannot implement probability as an (objective) property of a physical system. So we can't (objectively) toss a fair coin or throw a fair dice? Or even an unfair coin or unfair dice with some objective physical properties that measure the unfairness?
 
  • #22
Dale
An interpretation of de Finetti's position is that we cannot implement probability as an (objective) property of a physical system.
Isn’t that essentially what you proved above? I don’t understand your point.

If the frequentist definition of probability is circular as you showed then it does seem like it isn’t an objective property of a physical system.

I am not sure what point you are trying to make with your posts. Can you clarify?

So we can't (objectively) toss a fair coin or throw a fair dice?
Don’t you mean “So we can’t (objectively) assign a probability to the toss of a fair coin or the throw of a fair dice?”
 
  • #23
Stephen Tashi
(For some reason, the Reply function of the forums page isn't quoting @Dale 's previous post for me.)

I am not sure what point you are trying to make with your posts. Can you clarify?
Besides being a mere critic of other posts, I'll make the (perhaps self-evident) points:

Bayesian vs Frequentist can be described in practical terms as a style of choosing probability models for real life problems. People who pick a particular style do not necessarily accept or understand the philosophical views of prominent Bayesians and Frequentists.

The Bayesian style of probability modeling is to use a probability model that answers questions of the form that people most commonly ask. E.g. Given the data, what is the probability that the population has such-and-such properties?

The Frequentist style of probability modeling is to use the minimum number of parameters and assumptions - even if this results in only being able to answer questions of the form: Given I assume the population has such-and-such properties, what is the probability of the data?

Understanding the distinction between the Bayesian and Frequentist styles is made difficult by the fact that Frequentists use a vocabulary that strongly suggests that they are answering the questions that the Bayesian method is obligated to answer. For example, "There is 90% confidence that the observed mean will be within plus or minus .23 of the population mean" suggests (but does not actually imply) that "The observed mean is 6.00, therefore there is a .90 probability that the population mean is in the interval [6.00 - 0.23, 6.00 + 0.23]". Similar misinterpretations of terms like "statistical significance" and "p-value" suggest to laymen, and even students of introductory statistics, that Frequentist methods are telling them something about the probability of some fact given the observed data. But instead Frequentism generally deals with probabilities where the condition is changed to be "Given these facts are assumed, the probability of the observed data is ....".
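
The Python sketch below illustrates what the 90% figure refers to in the Frequentist reading. It assumes a fixed population mean of 6.0, a known standard deviation of 1.0, normally distributed data, and samples of size 50; all of these are choices made for this example, picked so that the interval half-width comes out near the .23 used above. The population mean never changes; what varies from trial to trial is the sample, and the 90% is the fraction of samples whose interval happens to contain the fixed mean.

```python
import random


def coverage(pop_mean=6.0, sigma=1.0, n=50, trials=2000, z=1.645, seed=1):
    """Fraction of repeated samples whose 90% z-interval contains pop_mean."""
    rng = random.Random(seed)
    half_width = z * sigma / n**0.5  # about 0.23 for these settings
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(pop_mean, sigma) for _ in range(n)]
        observed_mean = sum(sample) / n
        if observed_mean - half_width <= pop_mean <= observed_mean + half_width:
            hits += 1
    return hits / trials


rate = coverage()
print(round(rate, 3))
```

The printed rate should land near 0.90. It is a property of the procedure given the assumed population, not a probability assigned to the population mean after seeing one particular sample.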

The biggest obstacle to explaining the practical difference between Bayesian statistics and Frequentist statistics is explaining that the methods answer different questions. The biggest obstacle to explaining that the methods answer different questions is negotiating the treacherous vocabulary of Frequentist statistics to clarify the type of question that Frequentist statistics actually answers. Explaining the Bayesian versus Frequentist distinction in terms of a difference in "subjective" and "objective" probability does not, by itself, explain the practical distinction. A reader might keep the misconception that Frequentist methods and Bayesian methods solve the same problems, and conclude that the difference in the styles only has to do with the different philosophical thoughts swimming about in the minds of two people who are doing the same mathematics.

---------

As to an interpretation of probability in terms of observed frequencies, mathematically it can only remain an intuitive notion. The attempt to use probability to say something definite about an observed frequency is self-contradictory except in the trivial case where you assign a particular frequency a probability of 1, or of zero. For example, it would be satisfying to say "In 100 tosses of a fair coin, at least 3 tosses will be heads". That type of statement is an absolutely guaranteed connection between a probability and an observed frequency. However, the theorems of probability theory provide no such guaranteed connections. The theorems of probability tell us about the probability of frequencies. The best we can get in absolute guarantees are theorems with conclusions like ##\lim_{n \rightarrow \infty} Pr( E(n)) = 1 ##. Then we must interpret what such a limit means. Poetically, we can say "At infinity the event ##E(\infty)## is guaranteed to happen". But such a verbal interpretation is mathematically imprecise and, in applications, the concept of an event "at infinity" may or may not make sense.

As a question in physics, we can ask whether there exists a property of situations called probability that is independent of different observers - to the extent that if different people perform the same experiment to test a situation, they (probably) will get (approximately) the same estimate for the probability in question if they collect enough data. If we take the view that we live in a universe where scientists have at least average luck, we can replace the qualifying adjective "probably" with "certainly" and if we idealize "enough data" to be "an infinite amount of data", we can change "approximately" to "exactly". Such thinking is permitted in physics. I think the concept is called "physical probability".

My guess is that most people who do quantum physics believe in physical probability. Prominent Bayesians like de Finetti explicitly reject the existence of such objective probabilities. I haven't researched prominent Frequentists. I don't even know who they are yet, so I don't know if any of them assert physical probabilities are real. The point of mentioning this is that, yes, there is detail involved in explaining the difference between "objective" and "subjective" probability. However, as pointed out above, explaining all this detail does not, by itself, explain the practical distinction between the styles of Bayesian vs Frequentist probability modeling.

In fact, the cause-and-effect relation between a person's metaphysical opinions and their style of probability modeling is, to me, unclear. Historically, how did the connection between the metaphysics of Bayesians and the probability modeling style of Bayesians evolve? Did one precede the other? Were there people who held Frequentist philosophical beliefs but began using the Bayesian style of probability modeling?

[Just found this: The article https://projecteuclid.org/download/pdf_1/euclid.ba/1340371071 indicates that a Bayesian style of probability modeling existed before the philosophical elaboration of subjective probability. It was called using "inverse probability".]
 
  • #24
9,563
2,645
Good one Dale. I am a frequentist myself. However, as you pointed out, its real basis is the Kolmogorov axioms. The frequentist view is 'intuitive', based on the strong law of large numbers, but has 'logical' issues. The Bayesian view has no logical issues, but is not what is usually used in many applied areas. It's a bit like calculus: real analysis is its correct basis, but in many applied areas you think of dx and dy as so small that they are, for many practical purposes, zero, and certainly (dx)^2 and (dy)^2 can be neglected. Once you look at it that way, it is simply a matter of choosing how you view it, depending on what the problem is and how you attack solving it.

As a bit of further reading, people might like to look into the Cox axioms:
https://en.wikipedia.org/wiki/Cox's_theorem

Thanks
Bill
 
  • #25
Stephen Tashi
(After looking at the paper by Fienberg https://projecteuclid.org/download/pdf_1/euclid.ba/1340371071 ) here is a simple way to define the practical difference between Frequentist and Bayesian styles of probability models.

Begin with a concise definition (from https://en.wikipedia.org/wiki/Inverse_probability, which references the Fienberg paper):

In probability theory, inverse probability is an obsolete term for the probability distribution of an unobserved variable.
For example, suppose we model 10 tosses of a possibly unfair coin as a random variable with binomial distribution with probability ##p## of the coin landing heads. Then the observed data is the 10 results of tossing the coin. The parameter ##p## is not observed. (We can say the effects of ##p## are observed, but the value of ##p## itself is not directly observed.) If we assume a probability model where ##p## is assumed to have a uniform distribution on the interval [0,1] then we have assigned a probability distribution to an unobserved variable, so we are using inverse probability.
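
For this coin example the inverse-probability calculation can be written out in a few lines. The sketch below uses ##h## for the number of heads observed in the 10 tosses and ##f## for a density; both symbols are introduced here and do not appear in the post. It is Bayes' rule applied to the uniform prior and the binomial likelihood.

```latex
% h = number of heads in the 10 tosses (symbol introduced for this sketch)
\begin{align*}
f(p) &= 1, \qquad 0 \le p \le 1
  && \text{(uniform prior on the unobserved } p\text{)} \\
\Pr(h \mid p) &= \binom{10}{h}\, p^{h} (1-p)^{10-h}
  && \text{(binomial model for the data)} \\
f(p \mid h) &= \frac{\Pr(h \mid p)\, f(p)}{\int_0^1 \Pr(h \mid q)\, f(q)\, dq}
  = \frac{p^{h} (1-p)^{10-h}}{B(h+1,\; 11-h)}
  && \text{(distribution of } p \text{ given the data)}
\end{align*}
```

The result is a Beta distribution with parameters ##h+1## and ##11-h##, so the uniform prior turns the observed counts directly into a distribution for the unobserved ##p##.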

Using "inverse probability" is now what we would call assigning a prior distribution to a parameter. The modern terminology "prior distribution" does not emphasize the fact that it is a distribution for a quantity that is not directly observed in the data.

The practical distinction between Frequentists and Bayesians is: Frequentists reject the use of inverse probability and Bayesians employ it.

The correct description of the history of probability and statistics is not that the earliest methods were Frequentist methods and that Bayesian methods were an innovation that came later. Instead, the earliest methods included using "inverse probability".

Frequentism developed in the 1920s when prominent statisticians rejected the use of "inverse probability". I haven't researched why they rejected using inverse probability, whether their reasons were metaphysical or practical, or unique to each individual Frequentist.

The Frequentist style of statistics became the dominant style for decades. (It's an interesting question why this happened, perhaps because Frequentist probability models have a simpler structure. They minimize the number of probability distributions involved.)

Bayesian methods were recognized as a distinct style of probability modeling when statisticians began to revive the use of "inverse probability".

Describing the practical difference between Bayesian and Frequentist styles in terms of "inverse probability" is a correct explanation, but it does not delve into the consequences of the decision to use or not to use "inverse probability".

The consequences of rejecting "inverse probability" are usually that we get a probability model that can only be used to answer questions of the form "Assuming such-and-such, what is the probability of the data?". Allowing the use of inverse probability can create probability models that answer questions of the form "Given the data, what is the probability of such-and-such?"

Explaining the consequences of using or not using "inverse probability" is a technical matter and requires a technical article. Explaining the practical difference between Bayesian and Frequentist styles in terms of the definition of "inverse probability" can be done without many technical details and starts the reader off on the right foot.
 
