#### Dale

Mentor
Confessions of a moderate Bayesian, part 2
Read Part 1: Confessions of a moderate Bayesian, part 1
Bayesian statistics by and for non-statisticians
Background
One of the continuous and occasionally contentious debates surrounding Bayesian statistics is the interpretation of probability. For anyone who is familiar with my posts on this forum, I am not generally a big fan of interpretation debates. This one is no exception. So I am going to present both interpretations as factually as I can, and then conclude with my personal take on the issue and my approach.
Probability axioms
Probability is a mathematical concept that is applied to various domains. I think that it is worthwhile to point out the mathematical underpinnings in at least a brief and...

If I may offer a suggestion (or maybe you can reply here): consider the two different interpretations of probabilistic statements such as "There is a 60% chance of rain for (e.g.) Thursday." From a frequentist perspective, I believe this means that on previous occasions with a similar combination of conditions to the ones before Thursday, it rained 60% of the time. I have trouble finding a Bayesian interpretation for this claim. You may have a prior, but I can't see what data you would use to update it to a posterior probability.

Well, a bit biased against frequentists if you ask me. I do not have a strong opinion on either side, all the more so since I studied decision theory and subjective probabilities along the way. However, I remember some heated discussions about the issue, and I'm not sure whether Bayesians have many friends among stochasticians.

Well, a bit biased against frequentists if you ask me.
Well, I am a moderate Bayesian, so I do lean towards Bayes in my preferences. But being moderate I also use the frequentist interpretation and frequentist methods whenever convenient or useful.

I just don’t think that my preference is “right” or that someone else’s preference is “wrong”. I use both and even find cases where using both together is helpful.

"There is a 60% chance of rain for (e.g.) Thursday." From a frequentist perspective, I believe this means that on previous occasions with a similar combination of conditions to the ones before Thursday, it rained 60% of the time. I have trouble finding a Bayesian interpretation for this claim.
The Bayesian interpretation is straightforward. It just means that I am not certain that it is going to rain on Thursday, but I think it is likely. More operationally, if I had to bet a dollar either that it would rain on Thursday or that I would get heads on a single flip of a fair coin, then I would rather take the bet on the rain.

You may have a prior, but I can't see what data you would use to update it to a posterior probability.
To update your probability you need to have a model.

For a concrete example, suppose that the only condition you were looking at is barometric pressure. A typical model might be that the log of the odds of rain is a linear function of the barometric pressure. Then the previous data would be used to estimate the slope and the intercept of that model.
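
A minimal sketch of the barometric-pressure model described above, assuming made-up (pressure, rain) records and a plain maximum-likelihood fit by Newton-Raphson. A Bayesian would additionally put priors on the intercept and slope; that step is omitted here. The names `DATA`, `fit_logistic`, and `rain_probability` are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical (pressure in hPa, rained: 1/0) records, made up for illustration only.
DATA = [
    (995.0, 1), (998.0, 1), (1000.0, 1), (1002.0, 0), (1003.0, 1),
    (1005.0, 1), (1007.0, 0), (1009.0, 1), (1011.0, 0), (1013.0, 0),
    (1015.0, 1), (1017.0, 0), (1020.0, 0), (1023.0, 0), (1026.0, 0),
]


def fit_logistic(data, iters=25):
    """Maximum-likelihood fit of log-odds(rain) = a + b * (pressure - center).

    Uses Newton-Raphson on the log-likelihood; pressures are centered for numerical stability.
    """
    center = sum(x for x, _ in data) / len(data)
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in data:
            z = x - center
            p = 1.0 / (1.0 + math.exp(-(a + b * z)))
            w = p * (1.0 - p)
            # gradient of the log-likelihood
            g0 += y - p
            g1 += (y - p) * z
            # negative Hessian (Fisher information), a symmetric 2x2 matrix
            h00 += w
            h01 += w * z
            h11 += w * z * z
        det = h00 * h11 - h01 * h01
        # Newton step: solve [[h00, h01], [h01, h11]] @ (da, db) = (g0, g1)
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b, center


def rain_probability(pressure, a, b, center):
    """Inverse logit of the fitted linear log-odds model."""
    return 1.0 / (1.0 + math.exp(-(a + b * (pressure - center))))


a, b, center = fit_logistic(DATA)
print(f"log-odds at {center:.1f} hPa = {a:.3f}, slope = {b:.3f} per hPa")
print(f"P(rain | 1000 hPa) = {rain_probability(1000.0, a, b, center):.2f}")
```

With these invented records the fitted slope comes out negative (lower pressure, higher chance of rain); real weather data would of course give its own estimates.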

Will this be a 3 part series? 4? Will you give numeric examples? A preview would be nice.

Will this be a 3 part series? 4? Will you give numeric examples? A preview would be nice.
I will have numerical examples for most of them. This one was just philosophical, so it didn’t really lend itself to examples.

I think that I will have at least two more. The one I am working on now is about Bayesian inference in science. It will include how the Bayesian approach naturally includes Occam’s razor and Popper’s falsifiability. The fourth will be a deeper dive into the posterior distribution and the posterior predictive distribution.

After that, I don’t know.

Now, we need a way to determine the measure ##P(H)##. For frequentist probabilities the way to determine ##P(H)## is to repeat the experiment a large number of times and calculate the frequency that the event ##H## happens. In other words, if you do ##N## trials and get ##n_H## heads then
##P(H) = \lim_{N \rightarrow \infty} \frac{n_H}{N}##
So a frequentist probability is simply the “long run” frequency of some event.

It should be emphasized that the notation "##P(H) = \lim_{N \rightarrow \infty} \frac{n_H}{N}##" conveys an intuitive belief, not a statement that has a precise mathematical definition in terms of the concept in calculus denoted by the similar-looking notation ##L = \lim_{N \rightarrow \infty} f(N)##.

In applications of statistics we typically assume that "in the long run" observed frequencies of events will approximately equal their probability of occurrence. (In applying probability theory to a real-life situation, would a Bayesian disagree with that intuitive notion?) But probability theory itself does not make this assumption. The nearest thing to it is the "Law of Large Numbers", but that law, like most theorems of probability, tells us about the probability of something happening, not about an absolute guarantee that it will.

"in the long run" observed frequencies of events will approximately equal their probability of occurrence. (In applying probability theory to a real-life situation, would a Bayesian disagree with that intuitive notion?)
There are theorems demonstrating that, in the long run, the Bayesian probability converges to the frequentist probability for any suitable prior (e.g., one that is non-zero at the frequentist probability).

It should be emphasized that the notation "##P(H) = \lim_{N \rightarrow \infty} \frac{n_H}{N}##" conveys an intuitive belief, not a statement that has a precise mathematical definition
What do you mean here?

What do you mean here?

The interpretation of "##\lim_{N \rightarrow \infty} \frac{n_H}{N} = P(H)##" in the sense used in calculus would say that for each ##\epsilon > 0## there exists an ##M > 0## such that if ##N > M## then ##P(H) - \epsilon < \frac{n_H}{N} < P(H) + \epsilon##. However, there is no guarantee that this will happen. To assert that it must happen contradicts the concept of a probabilistic experiment. The quantity ##\frac{n_H}{N}## is not a deterministic function of ##N##, so the notation used in calculus for limits of functions does not apply.

For independent trials, the calculus type of limit that does exist, for a given ##\epsilon > 0##, is ##\lim_{N \rightarrow \infty} S(N) = 1##, where ##S(N) = Pr(P(H) - \epsilon < \frac{n_H}{N} < P(H) + \epsilon)## is a deterministic function of ##N##. To compute ##S(N)## we use the probability distribution for ##N## replications of the experiment to compute the probability that there is a number of occurrences ##n_H## that makes ##P(H) - \epsilon < \frac{n_H}{N} < P(H) + \epsilon##. The notation "##n_H##" denotes an index variable for a summation of probabilities: we sum over all ##n_H## that satisfy the above inequality. So ##S## is a function of ##N##, not of ##n_H##.
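
The deterministic function ##S(N)## can be computed directly for a coin with ##P(H) = p##. The sketch below is illustrative: the function name `s_of_n` and the choices ##p = 0.5##, ##\epsilon = 0.05## are assumptions. It sums the binomial probabilities of every count ##n_H## satisfying the inequality, working in log space so that large ##N## does not overflow.

```python
import math


def s_of_n(n_trials, p, eps):
    """S(N) = Pr(p - eps < n_H / N < p + eps) for N independent trials with success probability p.

    A deterministic function of N: sum the binomial probabilities of every count n_H that
    satisfies the inequality (log-space pmf avoids overflow/underflow for large N).
    """
    total = 0.0
    for n_h in range(n_trials + 1):
        if p - eps < n_h / n_trials < p + eps:
            log_pmf = (math.lgamma(n_trials + 1) - math.lgamma(n_h + 1)
                       - math.lgamma(n_trials - n_h + 1)
                       + n_h * math.log(p) + (n_trials - n_h) * math.log(1 - p))
            total += math.exp(log_pmf)
    return total


for n in (10, 100, 1000, 10000):
    print(n, round(s_of_n(n, 0.5, 0.05), 4))
```

For these values ##S(10) = 252/1024 \approx 0.246##, because only ##n_H = 5## satisfies the inequality, and ##S(N)## climbs toward 1 as ##N## grows. Note that nothing random happens inside the function: it is an ordinary sequence of numbers, so the calculus limit applies to it.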

There is no disagreement between Bayesians and frequentists about how such a limit is interpreted.

For independent trials, the calculus type of limit that does exist, for a given ##\epsilon > 0##, is ##\lim_{N \rightarrow \infty} S(N) = 1##, where ##S(N) = Pr(P(H) - \epsilon < \frac{n_H}{N} < P(H) + \epsilon)## is a deterministic function of ##N##.
Nice.

Is that considered problematic by frequentist purists? It seems to define probability in terms of probability.

Is that considered problematic by frequentist purists? It seems to define probability in terms of probability.

Such a limit is used in the technical statement of the Law of Large Numbers, and frequentists don't disagree with that theorem.

To me, the essential distinction between the frequentist approach and the Bayesian approach boils down to whether certain variables are assumed to represent a "definite but unknown" quantity versus a quantity that is the outcome of some stochastic process. For example, a frequentist might model a situation as a sequence of Bernoulli trials with a definite but unknown probability ##p##. In that case, a question like "Given there are 5 successes in 10 Bernoulli trials, what is the probability that ##.4 < p < .6##?" is almost meaningless, because ##p## is not something that has a nontrivial probability distribution. So we can only say that ##Pr(.4 < p < .6)## is either 1 or 0, and we don't know which. By contrast, a Bayesian might model the situation as a sequence of Bernoulli trials performed after Nature (or something else) uses a stochastic process to determine ##p##, and be bold enough to assume a probability distribution for ##p##. In that scenario, the above question has a meaningful answer.
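
Under the Bayesian model the question has a computable answer. A sketch, assuming a uniform Beta(1, 1) prior on ##p## (the prior is an assumption; other priors give other answers): with 5 successes in 10 trials the posterior is Beta(6, 6), and for integer parameters its CDF reduces to a binomial tail sum. The helper name `beta_cdf_integer` is hypothetical.

```python
from math import comb


def beta_cdf_integer(x, a, b):
    """CDF of a Beta(a, b) distribution for positive integers a and b.

    Uses the identity I_x(a, b) = Pr(Binomial(a + b - 1, x) >= a).
    """
    m = a + b - 1
    return sum(comb(m, j) * x**j * (1 - x)**(m - j) for j in range(a, m + 1))


# Uniform Beta(1, 1) prior on p, then 5 successes in 10 trials -> posterior Beta(6, 6).
successes, trials = 5, 10
a_post, b_post = 1 + successes, 1 + trials - successes
prob = beta_cdf_integer(0.6, a_post, b_post) - beta_cdf_integer(0.4, a_post, b_post)
print(f"Posterior Pr(0.4 < p < 0.6) = {prob:.3f}")  # 0.507
```

With this prior the posterior probability is about 0.507. The frequentist model, by design, has no counterpart to this number.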

A frequentist criticism of the Bayesian approach is: "Suppose ##p## was indeed the result of some stochastic process. The value of ##p## has already been selected by that process. Are we to base our analysis on only a single sample of ##p## from that process?"

A Bayesian criticism of the frequentist approach is: "You aren't setting up a mathematical problem that answers the questions people want to ask. People want answers to questions of the form 'What is the probability that <some property of the situation> is true, given that we have observed the data?' The way you model the problem, you can only answer questions of the form 'Assuming <some property of the situation> is true, what is the probability of the observed data?'"

Such a limit is used in technical content of The Law Of Large Numbers and frequentists don't disagree with that theorem
No, of course not. But I don’t think that you can use the limit you posted above as a definition for frequency-based probability non-circularly.

To me, the essential distinction between the frequentist approach and the Bayesian approach boils down to whether certain variables are assumed to represent a "definite but unknown" quantity versus a quantity that is the outcome of some stochastic process.
I agree more or less. I would say that the issue is not exactly whether a quantity is definite but unknown, but rather whether or not to use probability to represent such a quantity.

E.g. I think that both Bayesians and frequentists would classify the gravitational constant ##G## as definite but unknown, but Bayesians would happily assign it a PDF and frequentists would not.

I think that is only slightly different from your take.

No, of course not. But I don’t think that you can use the limit you posted above as a definition for frequency-based probability non-circularly.

I agree. And, as far as I can see, no formal definition of any kind of limit defines the concept of a probability.

As you mentioned in the insight, the mathematical approach to probability defines it via a "measure", which is a certain type of function whose domain is a collection of sets. This theory does not formalize the idea that it is possible to take samples of a random variable nor does it define probability in the context that there is one outcome that "actually" happens in an experiment where there are many "possible" outcomes. So the mathematical theory bypasses the complicated metaphysical concepts of "actuality" and "possibility". It does not formally define those concepts and hence says nothing about them.

Also, as you said, both Frequentists and Bayesians accept the mathematical theory of probability. So any difference in how the two schools formally define probability would have to be based on some method of creating a mathematical system that defines new things that underlie the concept of probability and shows how these new things can be used to define a measure. I recall seeing examples where a formal mathematical model of "degree of belief" or "amount of information" is developed and probability is defined in terms of the mathematical objects in such models. Richard von Mises had the view that probability can be defined as a "limiting frequency" http://www.statlit.org/pdf/2008SchieldBurnhamASA.pdf but the consensus view of mathematicians is that his approach doesn't pass muster as formal mathematics.

However, I think most practicing statisticians don't think in terms of a precisely defined mathematical structure that underlies probability. The way that typical Frequentists differ from typical Bayesians is in how their imprecise and intuitive notions differ (i.e., in their metaphysical opinions).

So any difference in how the two schools formally define probability would have to be based on some method of creating a mathematical system that defines new things that underlie the concept of probability and shows how these new things can be used to define a measure.
I think we are running into a miscommunication here. I agree with the point you are making, but it isn’t what I am asking about.

In physics we have the mathematical concept of a vector and its application to velocity. In order to use velocity vectors you need more than just the axioms and theorems of vectors; you also need an operational definition of how to determine velocity. Here, communication is hampered because we use the word probability to refer to both the mathematical structure and the thing represented by the structure. There need to be operational definitions of frequentist and Bayesian probability. That is what I am talking about.

I think that Bayesians have a good operational definition of probability. The valid limit you described above would be a circular operational definition for frequentist probability, but unfortunately I don’t know a better one. The one I wrote isn’t circular, but as you correctly pointed out it isn’t a real limit.

@Stephen Tashi FYI, I modified the Insight to get rid of the limit and make it a little less rigorous while hopefully still conveying the basic idea of what frequentists operationally mean.

There need to be operational definitions of frequentist and Bayesian probability. That is what I am talking about.

Ideally, there is a need for such definitions, but it will be hard to say anything precise. People make subjective decisions without having a coherent system of ideas to justify them. You can look at what prominent Bayesians say versus what prominent Frequentists say. Prominent people usually feel obligated to portray their opinions as clear and systematic. But prominent people can also be individualistic, so you might not find any consensus views.

Other articles about Frequentist vs Bayesian approaches to statistics state definite opinions about the differences. However, is there really a consensus view of probability among Frequentists or among Bayesians? Are the authors of this type of article just copying what previous authors of this type of article have written, namely that Bayesians view probability as "subjective" and Frequentists view it as "objective"?

I can't see a Bayesian (of any sort) defending an estimate of a probability that is contradicted by a big batch of data. So is it correct to say that Bayesians don't accept the intuitive idea that a probability is revealed as a limiting frequency?

If a Frequentist decides to model a population by a particular family of probability distributions, will he claim that he has made an objective decision?

People make subjective decisions without having a coherent system of ideas to justify them.

I know you mean "coherent" in a different sense, but Bayesian probability is coherent, where "coherent" is a technical term.

I can't see a Bayesian (of any sort) defending an estimate of a probability that is contradicted by a big batch of data. So is it correct to say that Bayesians don't accept the intuitive idea that a probability is revealed as a limiting frequency?

If a Frequentist decides to model a population by a particular family of probability distributions, will he claim that he has made an objective decision?

Although Bayesians and Frequentists start from different assumptions, Bayesians can use many Frequentist procedures when there is exchangeability and the de Finetti representation theorem applies.

I know you mean "coherent" in a different sense, but Bayesian probability is coherent, where "coherent" is a technical term.

How are you defining a "Bayesian probability"?

Are you referring to a system of mathematics that postulates some underlying structure for probability and then defines a probability measure in terms of objects defined in that underlying structure?

Although Bayesians and Frequentists start from different assumptions, Bayesians can use many Frequentist procedures when there is exchangeability and the de Finetti representation theorem applies.

Those notes show an example of where a Frequentist assumes the existence of a "fixed but unknown" distribution ##Q## and a Bayesian assumes a distribution ##P##, and it is proven that "In ##P## the distribution ##Q## exists as a random object". Apparently both ##P## and ##Q## are parameterized by a single parameter called "the limiting frequency".

Isn't the general pattern for the Bayesian approach to take a parameter ##k## of a distribution ##Q_k## that a Frequentist would assume is "fixed but unknown" and model ##k## as the outcome of a random variable ##P##? That approach makes ##k## and ##Q_k## random objects generated by ##P##.

I don't see how the example in those notes gives a Bayesian any special liberty to turn a Frequentist variable into a Bayesian random variable that a Bayesian would not ordinarily take.

The notes say they demonstrate a "bridge" between the two approaches. I don't know how to interpret that. One guess is that if a Bayesian models a situation by assuming ##P##, then he finds that a random distribution ##Q_k## "pops out" that can be interpreted as giving possible choices for the "fixed but unknown" distribution ##Q_k## that a Frequentist would use. By contrast, the typical Bayesian approach would be to start with ##Q_k## and turn ##Q_k## into a random distribution by turning ##k## into a random variable.

You can look at what prominent Bayesians say versus prominent Frequentists say. Prominent people usually feel obligated to portray their opinions as clear and systematic. But prominent people can also be individualistic, so you might not find any consensus views.
Aren’t prominent people in a field considered prominent precisely because the consensus in that field is to adopt their view?

If a Frequentist decides to model a population by a particular family of probability distributions, will he claim that he has made an objective decision?
This is a good point. But they can certainly objectively test if that decision is supported by the data. (It almost never is for large data sets).

Anyway, your responses here have left me thinking that the standard frequentist operational definition is circular. I had originally thought that the limit I wrote was valid, but you are correct that it is not a legitimate limit. But the replacement you offered uses probability to define probability, so that is circular. Circularity is not necessarily an unresolvable problem, but it at least bears scrutiny.

Aren’t prominent people in a field considered prominent precisely because the consensus in that field is to adopt their view?

Yes, with the caveat that adopting the views of a prominent person by citing a mild summary of them is different from understanding their details! It can be embarrassing to find yourself using a method when a well-known proponent of the method has extreme views. As a moderate Bayesian, would you associate yourself with de Finetti's:

My thesis, paradoxically, and a little provocatively, but nonetheless genuinely, is simply this:
PROBABILITY DOES NOT EXIST
The abandonment of superstitious beliefs about the existence of the Phlogiston, the Cosmic Ether, Absolute Space and Time,...or Fairies and Witches was an essential step along the road to scientific thinking. Probability, too, if regarded as something endowed with some kind of objective existence, is no less a mis-leading misconception, an illusory attempt to exteriorize or materialize our true probabilistic beliefs.
as quoted in the paper by Nau https://faculty.fuqua.duke.edu/~rnau/definettiwasright.pdf

An interpretation of de Finetti's position is that we cannot implement probability as an (objective) property of a physical system. So we can't (objectively) toss a fair coin or throw a fair die? Or even an unfair coin or unfair die with some objective physical properties that measure the unfairness?

An interpretation of de Finetti's position is that we cannot implement probability as an (objective) property of a physical system.
Isn’t that essentially what you proved above? I don’t understand your point.

If the frequentist definition of probability is circular as you showed then it does seem like it isn’t an objective property of a physical system.

I am not sure what point you are trying to make with your posts. Can you clarify?

So we can't (objectively) toss a fair coin or throw a fair die?
Don’t you mean “So we can’t (objectively) assign a probability to the toss of a fair coin or the throw of a fair die?”

(For some reason, the Reply function of the forums page isn't quoting @Dale 's previous post for me.)

I am not sure what point you are trying to make with your posts. Can you clarify?
Rather than being merely a critic of other posts, I'll make the following (perhaps self-evident) points:

Bayesian vs Frequentist can be described in practical terms as a style of choosing probability models for real life problems. People who pick a particular style do not necessarily accept or understand the philosophical views of prominent Bayesians and Frequentists.

The Bayesian style of probability modeling is to use a probability model that answers questions of the form that people most commonly ask. E.g. Given the data, what is the probability that the population has such-and-such properties?

The Frequentist style of probability modeling is to use the minimum number of parameters and assumptions - even if this results in only being able to answer questions of the form: Given I assume the population has such-and-such properties, what is the probability of the data?

Understanding the distinction between the Bayesian and Frequentist styles is made difficult by the fact that Frequentists use a vocabulary that strongly suggests that they are answering the questions that the Bayesian method is obligated to answer. For example, "There is 90% confidence that the observed mean will be within plus or minus .23 of the population mean" suggests (but does not actually imply) that "The observed mean is 6.00, therefore there is a .90 probability that the population mean is in the interval [6.00 - 0.23, 6.00 + 0.23]." Similar misinterpretations of terms like "statistical significance" and "p-value" suggest to laymen, and even to students of introductory statistics, that Frequentist methods are telling them something about the probability of some fact given the observed data. Instead, Frequentism generally deals with probabilities where the condition is reversed: "Given that these facts are assumed, the probability of the observed data is ...".
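
The 90% in such a statement is a property of the interval-generating procedure over repeated samples, not of any single computed interval. A simulation sketch, assuming a normal population with known standard deviation; the values ##\mu = 6##, ##\sigma = 0.7##, ##n = 25## and the function name `coverage` are made up so that the half-width comes out near 0.23.

```python
import random
from statistics import NormalDist


def coverage(mu=6.0, sigma=0.7, n=25, level=0.90, trials=20_000, seed=1):
    """Fraction of repeated samples whose interval xbar +/- z * sigma / sqrt(n) contains mu.

    Assumes a normal population with known sigma (a textbook simplification).
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.645 for a 90% interval
    half_width = z * sigma / n ** 0.5          # about 0.23 for these made-up values
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials


print(coverage())  # close to 0.90
```

Roughly 90% of the simulated intervals contain ##\mu##, but any one computed interval either contains it or does not; the frequentist procedure does not assign a .90 probability to the population mean lying in a particular interval.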

The biggest obstacle to explaining the practical difference between Bayesian statistics and Frequentist statistics is explaining that the methods answer different questions. The biggest obstacle to explaining that the methods answer different questions is negotiating the treacherous vocabulary of Frequentist statistics to clarify the type of question that Frequentist statistics actually answers. Explaining the difference between Bayesian and Frequentist distinctions in terms of a difference in "subjective" and "objective" probability does not, by itself, explain the practical distinction. A reader might keep the misconception that Frequentist methods and Bayesian methods solve the same problems, and conclude that the difference in the styles only has to do with the different philosophical thoughts swimming about in the minds of two people who are doing the same mathematics.

---------

As to an interpretation of probability in terms of observed frequencies, mathematically it can only remain an intuitive notion. The attempt to use probability to say something definite about an observed frequency is self-contradictory except in the trivial case where you assign a particular frequency a probability of 1 or of 0. For example, it would be satisfying to say "In 100 tosses of a fair coin, at least 3 tosses will be heads". That type of statement is an absolute, guaranteed connection between a probability and an observed frequency. However, the theorems of probability theory provide no such guaranteed connections. The theorems of probability tell us about the probability of frequencies. The best we can get in absolute guarantees are theorems with conclusions like ##\lim_{n \rightarrow \infty} Pr(E(n)) = 1##. Then we must interpret what such a limit means. Poetically, we can say "At infinity the event ##E(\infty)## is guaranteed to happen". But such a verbal interpretation is mathematically imprecise and, in applications, the concept of an event "at infinity" may or may not make sense.
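
The coin example can be checked exactly. This short sketch uses exact rational arithmetic to show that the probability of fewer than 3 heads in 100 fair tosses is tiny but not zero, so probability theory alone cannot guarantee "at least 3 heads".

```python
from fractions import Fraction
from math import comb

# Probability that 100 tosses of a fair coin give fewer than 3 heads (0, 1, or 2 heads).
p_fewer_than_3 = sum(Fraction(comb(100, k), 2**100) for k in range(3))
print(p_fewer_than_3)         # 5051/1267650600228229401496703205376
print(float(p_fewer_than_3))  # on the order of 1e-27, but not 0
```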

As a question in physics, we can ask whether there exists a property of situations called probability that is independent of different observers, to the extent that if different people perform the same experiment to test a situation, they (probably) will get (approximately) the same estimate for the probability in question if they collect enough data. If we take the view that we live in a universe where scientists have at least average luck, we can replace the qualifying adjective "probably" with "certainly", and if we idealize "enough data" to be "an infinite amount of data", we can change "approximately" to "exactly". Such thinking is permitted in physics. I think the concept is called "physical probability".

My guess is that most people who do quantum physics believe in physical probability. Prominent Bayesians like de Finetti explicitly reject the existence of such objective probabilities. I haven't researched prominent Frequentists. I don't even know who they are yet, so I don't know if any of them assert physical probabilities are real. The point of mentioning this is that, yes, there is detail involved in explaining the difference between "objective" and "subjective" probability. However, as pointed out above, explaining all this detail does not, by itself, explain the practical distinction between the styles of Bayesian vs Frequentist probability modeling.

In fact, the cause-and-effect relation between a person's metaphysical opinions and their style of probability modeling is, to me, unclear. Historically, how did the connection between the metaphysics of Bayesians and the probability modeling style of Bayesians evolve? Did one precede the other? Were there people who held Frequentist philosophical beliefs but began using the Bayesian style of probability modeling?

[Just found this: The article https://projecteuclid.org/download/pdf_1/euclid.ba/1340371071 indicates that a Bayesian style of probability modeling existed before the philosophical elaboration of subjective probability. It was called using "inverse probability".]

Good one Dale. I am a frequentist myself. However, as you pointed out, its real basis is the Kolmogorov axioms. The frequentist view is 'intuitive', based on the strong law of large numbers, but has 'logical' issues. The Bayesian view has no logical issues, but is not what is usually used in many applied areas. It's a bit like calculus: real analysis is its correct basis, but in many applied areas you think of dx and dy as so small that they are, for many practical purposes, zero, and certainly (dx)^2 and (dy)^2 can be neglected. Once you look at it that way, it is simply a matter of choosing how you view it, depending on what the problem is and how you attack solving it.

For further reading, people might like to look at the Cox axioms:
https://en.wikipedia.org/wiki/Cox's_theorem

Thanks
Bill

(After looking at the paper by Fienberg https://projecteuclid.org/download/pdf_1/euclid.ba/1340371071) here is a simple way to define the practical difference between Frequentist and Bayesian styles of probability models.

Begin with a concise definition (from https://en.wikipedia.org/wiki/Inverse_probability, which references the Fienberg paper):

In probability theory, inverse probability is an obsolete term for the probability distribution of an unobserved variable.

For example, suppose we model 10 tosses of a possibly unfair coin as a random variable with a binomial distribution with probability ##p## of the coin landing heads. Then the observed data is the 10 results of tossing the coin. The parameter ##p## is not observed. (We can say the effects of ##p## are observed, but the value of ##p## itself is not directly observed.) If we assume a probability model where ##p## has a uniform distribution on the interval [0,1], then we have assigned a probability distribution to an unobserved variable, so we are using inverse probability.
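
Inverse probability in the coin example can be sketched numerically: put the uniform prior on a grid of ##p## values, weight each grid point by the binomial likelihood, and normalize. The grid approach, the function name `grid_posterior`, and the data (7 heads in 10 tosses) are illustrative assumptions, not from the original post. The binomial coefficient is omitted because it cancels when normalizing.

```python
def grid_posterior(heads, tosses, grid_size=10001):
    """Posterior for p under a uniform prior ("inverse probability"), evaluated on a grid over [0, 1]."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    # Uniform prior = equal weight at every grid point, so the posterior is proportional to the likelihood.
    weights = [p**heads * (1 - p)**(tosses - heads) for p in grid]
    total = sum(weights)
    return grid, [w / total for w in weights]


grid, post = grid_posterior(heads=7, tosses=10)
posterior_mean = sum(p * w for p, w in zip(grid, post))
print(round(posterior_mean, 4))  # close to (7 + 1) / (10 + 2)
```

With a uniform prior the exact posterior is Beta(h + 1, t + 1), whose mean (h + 1)/(n + 2) is Laplace's rule of succession; the grid result agrees with it to several decimal places.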

Using "inverse probability" is now what we would call assigning a prior distribution to a parameter. The modern terminology "prior distribution" does not emphasize the fact that it is a distribution for a quantity that is not directly observed in the data.

The practical distinction between Frequentists and Bayesians is: Frequentists reject the use of inverse probability and Bayesians employ it.

The correct description of the history of probability and statistics is not that the earliest methods were Frequentist methods and that Bayesian methods were an innovation that came later. Instead, the earliest methods included using "inverse probability".

Frequentism developed in the 1920s when prominent statisticians rejected the use of "inverse probability". I haven't researched why they rejected using inverse probability: whether their reasons were metaphysical or practical, or unique to each individual Frequentist.

The Frequentist style of statistics became the dominant style for decades. (It's an interesting question why this happened; perhaps because Frequentist probability models have a simpler structure. They minimize the number of probability distributions involved.)

Bayesian methods were recognized as a distinct style of probability modeling when statisticians began to revive the use of "inverse probability".

Describing the practical difference between Bayesian and Frequentist styles in terms of "inverse probability" is a correct explanation, but it does not delve into the consequences of the decision to use or not to use "inverse probability".

The consequence of rejecting "inverse probability" is usually that we get a probability model that can only be used to answer questions of the form "Assuming such-and-such, what is the probability of the data?". Allowing the use of inverse probability can create probability models that answer questions of the form "Given the data, what is the probability of such-and-such?"

Explaining the consequences of using or not using "inverse probability" is a technical matter and requires a technical article. Explaining the practical difference between Bayesian and Frequentist styles in terms of the definition of "inverse probability" can be done without many technical details and starts the reader off on the right foot.

Nice piece. I like the thought that Bayesian statistics is more 'fundamental', relying only on the Kolmogorov axioms, whereas a frequentist view leans too hard on the law of large numbers, which in any case would be irrelevant for distributions without finite moments. How does a frequentist distribution work for, say, a Pareto distribution with α < 1?
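
One way to look at the Pareto question: the law of large numbers for the sample mean needs a finite mean, but a relative frequency is the sample mean of an indicator variable, which always has finite moments. A simulation sketch, assuming Python's `random.paretovariate` (Pareto with scale 1), ##\alpha = 0.5##, and the event ##X > 2## (all illustrative choices; the function name `pareto_stats` is hypothetical):

```python
import random


def pareto_stats(alpha=0.5, n=100_000, threshold=2.0, seed=0):
    """Sample mean and relative frequency of {X > threshold} for Pareto(alpha) draws with scale 1."""
    rng = random.Random(seed)
    xs = [rng.paretovariate(alpha) for _ in range(n)]
    mean = sum(xs) / n                          # no finite expectation exists when alpha < 1
    freq = sum(x > threshold for x in xs) / n   # mean of a bounded indicator variable
    return mean, freq


mean, freq = pareto_stats()
print(freq, 2.0 ** -0.5)  # the frequency settles near Pr(X > 2) = 2**-alpha
print(mean)               # typically huge; it has no finite limit as n grows
```

So the frequentist reading of event probabilities survives for α < 1; what fails is only the convergence of the sample mean.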

The frequentist appeal is as much of an abstraction as the Bayesian one. ISTM Bayes is just more honest about probability being a measure of ignorance. In reality, if you really did study a coin flip, roulette wheel, or any other macroscopic system enough, you could gain knowledge that moves the odds beyond 1/n; after all, 1/n is just a statement of ignorance of the relevant parameters of a deterministic system.

ISTM Bayes is just more honest about probability being a measure of ignorance.
I think for me that was the big “aha” moment: when I realized that probability and randomness were different things. It doesn’t matter what ##P(A)## represents operationally; if it follows the Kolmogorov axioms then it is a probability. It could represent true randomness, it could represent ignorance, it could represent uncertainty, and I am sure that there are other things it could represent.

I tend to like the idea of uncertainty more than randomness, because I find randomness a lot harder to pin down. It seems to get jumbled up with determinism and other things that you don’t have to worry about for uncertainty.

There are many things that satisfy probability axioms and yet seem to have nothing to do with probability. Here is an example: Consider ##N## free classical particles, each with energy ##E_i##, ##i=1,...,N##. Then the quantity
$$p_i=\frac{E_i}{\sum_{j=1}^N E_j}$$
satisfies the probability axioms. @Dale any comments?

There are many things that satisfy probability axioms and yet seem to have nothing to do with probability. Here is an example: Consider ##N## free classical particles, each with energy ##E_i##, ##i=1,...,N##. Then the quantity
$$p_i=\frac{E_i}{\sum_{j=1}^N E_j}$$
satisfies the probability axioms. @Dale any comments?
That one isn’t particularly exotic. It is a simple “balls in an urn” probability but weighted by energy rather than being equally weighted.

However, I am sure that there are other measures that are more surprising or genuinely exotic. The thing is to realize that probability is not about randomness. If something satisfies the axioms then it is a probability even if there is no sense of randomness or uncertainty involved.

Bayes theorem and all of the other theorems of probability would apply. Whether they would be useful is a separate question, but they would surely apply.
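As a check on the energy-weighted example, here is a minimal sketch on a finite sample space of ##N = 4## particles with hypothetical integer energies (integers are assumed only so that `Fraction` gives exact arithmetic). It extends ##p_i = E_i / \sum_j E_j## to events by additivity, verifies non-negativity, normalization, and additivity over disjoint events, and then evaluates Bayes' theorem for two arbitrary events.

```python
from fractions import Fraction
from itertools import chain, combinations

def energy_measure(energies):
    """p_i = E_i / sum_j E_j, extended to events (sets of particle indices) by additivity."""
    total = sum(energies)
    p = [Fraction(e, total) for e in energies]  # energies assumed non-negative integers
    return lambda event: sum((p[i] for i in event), Fraction(0))

def all_events(n):
    """Every subset of {0, ..., n-1}: the full event space of a finite sample space."""
    subsets = chain.from_iterable(combinations(range(n), k) for k in range(n + 1))
    return [frozenset(s) for s in subsets]

def satisfies_kolmogorov(P, n):
    """Check non-negativity, normalization, and additivity for disjoint events."""
    events = all_events(n)
    if any(P(A) < 0 for A in events) or P(frozenset(range(n))) != 1:
        return False
    return all(P(A | B) == P(A) + P(B) for A in events for B in events if not A & B)

energies = [1, 2, 3, 4]  # hypothetical energies, arbitrary units
P = energy_measure(energies)
print(satisfies_kolmogorov(P, len(energies)))  # True

# Bayes' theorem applies to this measure like any other probability measure.
A, B = frozenset({2, 3}), frozenset({1, 3})
P_A_given_B = P(A & B) / P(B)
P_B_given_A = P(A & B) / P(A)
print(P_A_given_B, P_B_given_A * P(A) / P(B))  # 2/3 2/3
```

Nothing in the check refers to randomness or uncertainty; the measure qualifies purely by satisfying the axioms.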

The thing is to realize that probability is not about randomness. If something satisfies the axioms then it is a probability even if there is no sense of randomness or uncertainty involved.
But what is probability then about? About anything that satisfies the axioms of probability? My view is that, if a set of axioms does not really capture the concept that people originally had in mind before proposing the axioms, then it is the axioms, not the concept, that need to be changed.

But what is probability then about? About anything that satisfies the axioms of probability?
Yes. That is what axiomatization does. It abstracts a concept. Then the word “probability” (in that mathematical and axiomatic sense) itself becomes an abstraction representing anything which satisfies the axioms.

My view is that, if a set of axioms does not really capture the concept that people originally had in mind before proposing the axioms, then it is the axioms, not the concept, that need to be changed.
I do sympathize with that view, but realistically it is too late in this case. The Kolmogorov axioms are already useful and well accepted, and using the word “probability” to refer to measures which satisfy those axioms is firmly established in the literature.

The best you can do is to recognize that the word “probability”, like so many other words, has multiple meanings. One is the mathematical meaning of anything which satisfies Kolmogorov’s axioms, and the other is the “concept that people originally had in mind”. Then you merely make sure that it is understood which meaning is being used, as you do with any other multiple-meaning word.

I tend to like the idea of uncertainty more than randomness, because I find randomness a lot harder to pin down. It seems to get jumbled up with determinism and other things that you don’t have to worry about for uncertainty.

But if a Bayesian draws samples from a distribution, then wouldn't the Bayesian be using the idea of randomness?

E.g.:
https://en.wikipedia.org/wiki/Gibbs_sampling
http://www.mit.edu/~ilkery/papers/GibbsSampling.pdf
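For readers unfamiliar with the linked method, here is a minimal Gibbs-sampling sketch. The target, a standard bivariate normal with correlation ##\rho##, is an assumption chosen only because its full conditionals are known in closed form: ##x \mid y \sim N(\rho y, 1-\rho^2)## and ##y \mid x \sim N(\rho x, 1-\rho^2)##. Note that the "random" draws come from Python's seeded pseudorandom generator, so rerunning with the same seed reproduces them exactly.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_samples, seed=0, burn_in=1000):
    """Gibbs sampler: alternately draw each coordinate from its full conditional."""
    rng = random.Random(seed)          # seeded pseudorandom generator
    sd = math.sqrt(1.0 - rho * rho)    # conditional standard deviation
    x = y = 0.0
    samples = []
    for i in range(burn_in + n_samples):
        x = rng.gauss(rho * y, sd)     # x | y
        y = rng.gauss(rho * x, sd)     # y | x
        if i >= burn_in:
            samples.append((x, y))
    return samples

def sample_correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)

draws = gibbs_bivariate_normal(rho=0.8, n_samples=20_000)
print(sample_correlation(draws))  # should be close to 0.8
```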

But if a Bayesian draws samples from a distribution, then wouldn't the Bayesian be using the idea of randomness?
Not necessarily. We are certainly uncertain about random things, but we are also uncertain about some non-random things. Both can be represented as a distribution from which we can draw samples. So the mere act of drawing from a distribution does not imply randomness.

A good example is a pseudorandom number generator. There is nothing actually random about it. But we are uncertain of its next value, so we can describe it using a distribution and draw samples from it.
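A minimal sketch of this point using a linear congruential generator, ##x_{k+1} = (a x_k + c) \bmod m##, with the widely cited Numerical Recipes constants (##a = 1664525##, ##c = 1013904223##, ##m = 2^{32}##). The `LCG` class is a hypothetical illustration, not a recommendation for real work. Every output is fixed by the seed, yet without knowing the seed the outputs are well described by a uniform distribution on ##[0, 1)##.

```python
class LCG:
    """Linear congruential generator: state_{k+1} = (a * state_k + c) mod 2**32.
    Entirely deterministic once the seed is fixed."""

    A, C, M = 1664525, 1013904223, 2**32

    def __init__(self, seed):
        self.state = seed % self.M

    def next_uniform(self):
        """Advance the state and return it scaled to [0, 1)."""
        self.state = (self.A * self.state + self.C) % self.M
        return self.state / self.M

gen1, gen2 = LCG(seed=42), LCG(seed=42)
seq1 = [gen1.next_uniform() for _ in range(5)]
seq2 = [gen2.next_uniform() for _ in range(5)]
print(seq1 == seq2)  # True: the "random" sequence is fully determined by the seed

# Yet to someone who does not know the seed, Uniform(0, 1) describes the outputs well.
gen = LCG(seed=12345)
draws = [gen.next_uniform() for _ in range(100_000)]
print(sum(draws) / len(draws))  # close to 0.5
```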

But what is probability then about? About anything that satisfies the axioms of probability? My view is that, if a set of axioms does not really capture the concept that people originally had in mind before proposing the axioms, then it is the axioms, not the concept, that need to be changed.

It's fair to say that the concept of probability that people originally had in mind involves a situation where there are several "possible" outcomes of some physical phenomenon, but only one of the "possible" outcomes "actually" occurs. The concept of probability associated with such a situation involves a "tendency" for certain outcomes to actually happen that can be measured by a number, but with no absolute guarantee that this number will correspond to the observed frequencies of the outcomes that actually do happen. This is still how many people applying probability theory think of probability.

However, such thoughts involve the complicated metaphysical concepts of "possible" as distinct from "actual". There is not yet any (well-known) system of mathematics that formalizes these metaphysical concepts and also provides anything useful for applications that the Kolmogorov approach doesn't already supply.

The Kolmogorov approach (measure theory) provides a reliable basis for proving theorems about probabilities. The price of this approach is that probability theory is essentially circular. We have theorems that say if certain probabilities are such-and-such then the probabilities of other things are so-and-so. Any interpretation of probability theory as a guarantee of what will actually happen is outside this theory. It falls under whatever field of science deals with the problem to which the theory is applied.

It seems to me that in physics there is a long tradition of attempts to formulate theories of probability on the basis of actual frequencies of outcomes. For example, if we consider tossing a fair coin as a physical event, then such a theory would tell us to consider the "ensemble" of tossed coins. The ensemble must be an actual thing. It may involve all fair coins that have been tossed in the past and all that will be tossed in the future, coins tossed on other planets, etc. In this actual ensemble of fair coins there is an actual fraction of coins that have landed (or will land) heads. So this frequency is a specific number if the ensemble is finite. (If the ensemble isn't finite, we have more conceptual work to do.)

These ensemble theories do not explain taking independent samples from the ensemble unless we add further structure to the theory. (For example, why won't the sub-ensemble corresponding to one experimenter's tosses all come out heads?) So we need the ensemble to be distributed in space and time (e.g. among various labs and among various times of day) in some way that mimics the appearance of independent trials.
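A toy version of the sub-ensemble problem, under hypothetical assumptions: a finite "actual ensemble" of 10,000 tosses with exactly half heads. The ensemble frequency is a fixed number, but it says nothing about how heads are spread out; a seeded shuffle stands in here for the extra structure ("distributed among labs and times") that makes sub-ensembles look like independent trials.

```python
import random

# Hypothetical finite ensemble: every toss that ever happens, exactly half heads.
N = 10_000
ensemble = ["H"] * (N // 2) + ["T"] * (N // 2)

def heads_frequency(tosses):
    """Fraction of tosses in the (sub-)ensemble that landed heads."""
    return sum(t == "H" for t in tosses) / len(tosses)

print(heads_frequency(ensemble))        # 0.5: a fixed fact about the finite ensemble
print(heads_frequency(ensemble[:100]))  # 1.0: one experimenter's contiguous block

# Extra structure, modeled here by a shuffle, spreads heads across sub-ensembles.
shuffled = ensemble[:]
random.Random(0).shuffle(shuffled)
print(heads_frequency(shuffled[:1000]))  # typically near 0.5
```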

If I may offer a suggestion, or maybe you can reply here, on the two different interpretations of probabilistic statements such as: "There is a 60% chance of rain for (e.g.) Thursday." In frequentist perspective, I believe this means that in previous times with a similar combination of conditions as the ones before Thursday, it rained 60% of the time. I have trouble finding a Bayesian interpretation for this claim. You may have a prior, but I can't see what data you would use to update it to a posterior probability.
It means that, based on the known distribution parameters and a model of how those parameters affect the weather, there is a 60% chance of rain on Thursday. Those parameters include all the things a meteorologist might use to predict the weather. How the model is determined, I'm not quite sure. The model may itself be encoded by additional distribution parameters, which are updated according to observations. The Expectation-Maximisation (EM) method is all about estimating unknown distribution parameters.
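To address the question of what data a Bayesian would update with, here is a deliberately simplified sketch, not a weather model. It uses a conjugate Beta-Binomial update (a simpler technique than the EM method mentioned above). The flat ##\text{Beta}(1, 1)## prior and the count of 60 rainy days out of 100 past days with Thursday-like conditions are hypothetical numbers chosen for illustration.

```python
from fractions import Fraction

def beta_binomial_update(alpha, beta, rainy_days, similar_days):
    """Conjugate update: a Beta(alpha, beta) prior plus binomial data gives a Beta posterior."""
    return alpha + rainy_days, beta + (similar_days - rainy_days)

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, as an exact fraction."""
    return Fraction(alpha, alpha + beta)

# Hypothetical: flat prior on the chance of rain under these conditions,
# then 60 rainy days observed among 100 similar past days.
a, b = beta_binomial_update(1, 1, rainy_days=60, similar_days=100)
print(a, b, float(beta_mean(a, b)))  # posterior Beta(61, 41), mean about 0.598
```

In this reading the past days with similar conditions are the data, and the stated "60% chance" corresponds roughly to the posterior mean for the rain probability.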