I Bayesian statistics in science

  • #51
A. Neumaier said:
Descriptive probability is objective whenever the way one arrives at the probabilities given the data is made explicit.
Of course. I'm not interested in mincing words. I think Jaynes would also agree to this. He put a lot of effort into clarifying how we arrive at probabilities. What makes you so fussy about calling his probabilities "subjective"? I find the frequentist view of probabilities far too narrow, but I suspect nothing will convince you.
 
  • #52
WernerQH said:
What makes you so fussy calling his probabilities "subjective"? I find the frequentist view of probabilities far too narrow, but I suspect nothing will convince you.
Indeed, the frequentist view of probabilities risks being too narrow. Part of its rehabilitation is to make it a bit less narrow, but not too much. In fact, this is not too different from what you wrote yourself above:
WernerQH said:
... whether probabilities are objective or subjective ... By necessity they are both, because probabilities are the glue that connects our theories to the real world.

Saying that the velocities of the molecules in a gas are subject to a Maxwellian distribution is surely a probabilistic statement. If physics is not to turn into a meaningless game, the temperature of a gas and probabilities must have objective meaning.
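The objective content of the Maxwellian distribution can be made concrete numerically, as a toy sketch (argon at 300 K; the code and numbers are illustrative, not from the thread): sampling velocities from the distribution and computing the mean kinetic energy recovers the temperature, no matter who runs the computation.

```python
import numpy as np

# Illustrative sketch: each Cartesian velocity component of a molecule in an
# ideal gas is Gaussian with variance kT/m (the Maxwellian distribution).
k_B = 1.380649e-23   # Boltzmann constant, J/K
m = 6.6335209e-26    # mass of an argon atom, kg
T = 300.0            # temperature, K

rng = np.random.default_rng(0)
v = rng.normal(0.0, np.sqrt(k_B * T / m), size=(1_000_000, 3))

# The objective content: mean kinetic energy per molecule is (3/2) k_B T,
# so the sample itself yields back the temperature.
T_est = m * np.mean(np.sum(v**2, axis=1)) / (3 * k_B)
print(T_est)
```

The mean kinetic energy is an objectively measurable quantity, so the probabilistic statement makes an observer-independent, checkable prediction.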

On the other hand, Bayesian views of probability risk being too broad. And Jaynes risks devaluing even his own achievements, because he does not want to face the fact that his objective Bayesian perspective cannot solve all problems and paradoxes that arise in connection with probabilities. His "solution" of denouncing infinity in all its forms, without taking proper care of appropriate limit concepts, simply doesn't work.
 
  • #53
WernerQH said:
What makes you so fussy calling his probabilities "subjective"?
I label it such because it is his terminology:
Jaynes (p.44) said:
In the theory we are developing, any probability assignment is necessarily ‘subjective’ in the sense that it describes only a state of knowledge, and not anything that could be measured in a physical experiment. Inevitably, someone will demand to know: ‘Whose state of knowledge?’ The answer is always: ‘That of the robot – or of anyone else who is given the same information and reasons according to the desiderata used in our derivations in this chapter.’
Whatever is solely in the mind of a robot or a human (i.e., a state of knowledge) is subjective. Whereas if everything follows from the data by precisely spelled out rules it is objective.

Only if you completely specify the prior and how the posterior translates into the numbers to be reported does the Bayesian probability become objective, though the choice of the prior and the recipe for extracting reportable numbers from the posterior are subjective acts. By specifying them, others can check how you arrived at your results and can criticize the whole procedure. In particular, the prior and the output recipe can in principle be falsified by sufficiently extensive subsequent experiments.
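A minimal sketch of this point, with hypothetical numbers: a Beta(1, 1) prior on a coin's head-probability, binomial data, and a reporting recipe of posterior mean plus standard deviation. Once the prior and the recipe are fully spelled out, anyone given the same data reproduces the same reported numbers; the subjectivity sits entirely in those two explicit choices.

```python
# Fully specified procedure: Beta(1, 1) prior on a coin's head-probability,
# binomial data, and a reporting recipe (posterior mean and standard deviation).
# Anyone given the same data and this specification gets the same numbers.
a0, b0 = 1.0, 1.0          # the (subjectively chosen, but explicit) prior
heads, tails = 7, 3        # the data

a, b = a0 + heads, b0 + tails                           # conjugate update
post_mean = a / (a + b)                                 # reporting recipe, step 1
post_sd = (a * b / ((a + b)**2 * (a + b + 1))) ** 0.5   # reporting recipe, step 2
print(post_mean, post_sd)
```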
 
  • #54
WernerQH said:
He put a lot of effort in clarifying how we arrive at probabilities.
Unfortunately, his analysis is essentially never followed in practice, where frequentist methods (and hierarchical methods such as REML, which have both a frequentist and a Bayesian derivation) prevail. General Bayesian techniques are feasible only for low-dimensional data analysis, and hence are far removed from today's needs.

In other words, Jaynes put a lot of effort into clarifying how we should arrive at probabilities, but practitioners rarely heed his moral commands.
 
  • #55
A. Neumaier said:
Whatever is solely in the mind of a robot or a human (i.e., a state of knowledge) is subjective.
As I have already pointed out, Jaynes's usage of "subjective" here is very different from yours. His answer to "whose state of knowledge" is not just "That of the robot": it is "That of the robot – or of anyone else who is given the same information and reasons according to the desiderata used in our derivations in this chapter." That meets your definition of "objective":
A. Neumaier said:
Whereas if everything follows from the data by precisely spelled out rules it is objective.
 
  • #56
A. Neumaier said:
the choice of the prior and the recipe for extracting reportable numbers from the posterior are subjective acts.
As I have said before, Jaynes's aim in his book is to give objective (by your definition) procedures for assigning prior probabilities. (For the "extracting reportable numbers" part, I'm a bit confused, because as far as Jaynes is concerned, the posterior probabilities are the reportable numbers.)

A. Neumaier said:
By specifying them others can check how you arrived at your results, and can criticize the whole procedure. In particular, the prior and the output recipe can in principle be falsified by sufficiently extensive subsequent experiments.
Not only is this true, it is often the whole point of going through the process Jaynes describes. The results of the process Jaynes describes tell you what is implied by your current knowledge. Then you run actual experiments to see what happens. If actual experiments match what you got from Jaynes's process, it means your current knowledge is correct (as far as the experiments can tell), which is good to know but doesn't advance your knowledge any. If you see something different in actual experiments, that means your current knowledge is incomplete and you have an opportunity to come up with a better theoretical model. That's progress.
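The loop described above can be sketched as a posterior predictive check; everything here (the fair-coin model, the true bias of 0.9) is a made-up toy example, not anything from the thread.

```python
import numpy as np

# Sketch: derive what current knowledge implies, then compare with "experiment".
# Current model: the coin is fair. The real coin is heavily biased.
rng = np.random.default_rng(6)

predicted = rng.binomial(100, 0.5, size=10_000)   # implied by the current model
lo, hi = np.percentile(predicted, [0.5, 99.5])    # 99% predictive interval

observed = int((rng.random(100) < 0.9).sum())     # the actual experiment
print(lo, hi, observed)
```

If the observed count lands outside the model's predictive interval, the current knowledge encoded in the model has been falsified, and there is an opportunity to build a better one; if it lands inside, the model survives the test without being advanced.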
 
  • #57
PeterDonis said:
As I have already pointed out, Jaynes's usage of "subjective" here is very different from yours. His answer to "whose state of knowledge" is not just "That of the robot": it is "‘That of the robot – or of anyone else who is given the same information and reasons according to the desiderata used in our derivations in this chapter." Which meets your definition of "objective":
No. The prior and the rules for extracting numbers from the posterior are generally private to the robot; they are objective only if they are shared with others. That's why he uses the term subjective - and in the same sense as I do.
PeterDonis said:
as far as Jaynes is concerned, the posterior probabilities are the reportable numbers.
In publications one always reports a few numbers, not a posterior distribution - which is generally far too complex to report. Thus one needs a step to go from the distribution to these numbers.
 
  • #58
gentzen said:
... he does not want to face the fact that his objective Bayesian perspective cannot solve all problems and paradoxes that arise in connection with probabilities. His "solution" to denounce infinity in all its forms without taking proper care of appropriate limit concepts simply doesn't work.
Doesn't he address exactly this problem in appendix B.2 of his book?
As I understand him, he urges "taking proper care" of the limiting process.
 
  • #59
A. Neumaier said:
No. The prior and the rules for extracting numbers from the posterior are generally private to the robot; they are objective only if they are shared with others. That's why he uses the term subjective - and in the same sense as I do.
Once again you and I are reading Jaynes very differently. I don't see the point of belaboring it any further.

A. Neumaier said:
In publications one always reports a few numbers, not a posterior distribution - which is generally far too complex to report. Thus one needs a step to go from the distribution to these numbers.
The gist of your comment here and in other places is that actual practice in the scientific community does not match Jaynes's prescriptions. I am not disputing that. I am simply trying to correctly describe what Jaynes's prescriptions are, whether or not anyone else follows them.
 
  • #60
WernerQH said:
Doesn't he address exactly this problem in appendix B.2 of his book?
As I understand him, he urges "taking proper care" of the limiting process.
No, those words only expose his thoughts about how those paradoxes could be resolved. They fail to resolve them adequately. And he simply doesn't like the message of those paradoxes:
It remains to discuss the implications of this analysis for objective Bayesianism. There are strong indications that the requirement that an improper inference be a probability limit is very restrictive. In group models, Stone has shown that the formal posterior can only be a probability limit if the prior is right Haar measure and the group satisfies a technical condition, known as amenability [13]. Eaton and Sudderth have shown that many of the formal posteriors of multivariate analysis are “incoherent” or strongly inconsistent, and thus cannot be probability limits [16].
(quoted from the section "Discussion" of the link behind my "simply doesn't work".) The section "Stone's Example" shows what goes wrong for the improper prior ##\exp(a\theta)## as an example. It feels "too massive" as a prior to me, and I guess the Haar measure on a non-amenable group will also be "too massive" in a certain sense.

Jaynes may have enjoyed opposing the whole establishment, but that doesn't resolve the paradoxes:
"On many technical issues we disagree strongly with de Finetti. It appears to us that his way of treating infinite sets has opened up a Pandora’s box of useless and unnecessary paradoxes." E.T. Jaynes, PT, p.xxi
... its extensive and exciting coverage of the marginalisation paradoxes which saw Jaynes opposing Dawid, Stone, and Zidek (and even the whole Establishment, page 470), ...
"There are no really trustworthy standards of rigor in a mathematics that embraced the theory of infinite sets." E.T. Jaynes, PT, p.xxvii
Except for the incomprehensible shots at formalised mathematics (Bourbakism), measure theory (as in Bertrand’s paradox) and Feller (...), I found the book quite pleasant and mostly in tune with my perception of Bayesian statistics (if strong on the militant side!). Jaynes did not think much of Bayes himself (an amateur!, on page 112), considering that Laplace had done much more to establish Bayesianism, and he clearly is a staunch supporter of Jeffreys, if not of de Finetti.
 
  • #61
gentzen said:
the message of those paradoxes
From the paper you reference, it seems to me that the key issue is that the group measure for a non-compact group is not normalizable. A simple example given in the paper is that, if the group in question is the reals--for example, if we think the problem is invariant under translation along one direction, regardless of the size of the translation--then the appropriate measure is Lebesgue measure, which is not normalizable; the total measure over the reals is infinite.

However, I'm not sure any real problem actually requires the full range of a non-compact group. In the simple example just described, any real problem will not be invariant under translation by any distance whatsoever. It will only be invariant under translation over some bounded region. So I ought to be able to find some compact group with a normalizable measure that represents the actual invariance and use that instead.
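A toy sketch of this truncation idea, under the assumption of a Gaussian likelihood for a location parameter (all numbers invented): replace the improper translation-invariant (Lebesgue) prior with a proper uniform prior on [-L, L]. For data well inside the interval, the posterior barely depends on L, which is the benign case; the paradoxes arise precisely when no such stable limit exists.

```python
import numpy as np

# Sketch: a location parameter with a Gaussian likelihood. The translation-
# invariant (Lebesgue) prior on the whole real line is improper, but a uniform
# prior on [-L, L] is proper, and for data well inside the interval the
# posterior mean barely depends on L once L is large.
rng = np.random.default_rng(1)
data = rng.normal(2.0, 1.0, size=50)

def posterior_mean(L, grid_size=20001):
    theta = np.linspace(-L, L, grid_size)          # uniform prior on [-L, L]
    log_lik = -0.5 * np.sum((data[:, None] - theta[None, :])**2, axis=0)
    w = np.exp(log_lik - log_lik.max())            # unnormalised posterior
    return np.sum(theta * w) / np.sum(w)

print(posterior_mean(10.0), posterior_mean(1000.0))  # nearly identical
```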
 
  • #62
PeterDonis said:
As I have said before, Jaynes's aim in his book is to give objective (by your definition) procedures for assigning prior probabilities.
On p.373 Jaynes makes the same claim, with the same definition of objective:
Jaynes said:
In our view, problems of inference are ill-posed until we recognize three essential things.
(A) The prior probabilities represent our prior information, and are to be determined, not by introspection, but by logical analysis of that information.
(B) Since the final conclusions depend necessarily on both the prior information and the data, it follows that, in formulating a problem, one must specify the prior information to be used just as fully as one specifies the data.
(C) Our goal is that inferences are to be completely ‘objective’ in the sense that two persons with the same prior information must assign the same prior probabilities.
But he does not make good on this promise. The point is that if the prior information is objective, it is not given by a prior probability distribution, since the prior information X consists of concepts and numbers, not distributions. Thus (A) is not a fact but wishful thinking. There is a subjective step involved in converting the prior information into a prior distribution, which makes (C) wishful thinking as well.

Once the prior distribution is specified, the posterior is objectively determined by it and the rules. But whereas in the passage I had cited earlier, Jaynes distinguished between the prior information X and the prior distribution P(A|X), he now identifies them, contradicting himself. Indeed, X and P(A|X) are mathematically two very distinct items. To know X says nothing at all about P(A|X).

In our example of quantum tomography, X is 'the Hilbert space of two qubits is ##\mathbb{C}^2\otimes\mathbb{C}^2##', while P(A|X) is a distribution of 4x4 density matrices. Jaynes says nothing at all about how one objectively deduces this probability distribution from X. He only gives plausibility arguments for a few elementary sample cases, primarily group invariance considerations. Invariance suggests a complex Wishart distribution as a sensible prior, but there is a 17-dimensional family of these, and none of them has any claim to being distinguished. Even if one opts for simplicity and sets the scale matrix to the identity (which already adds information not in the prior information), another parameter ##n>3## remains to be chosen that has no natural default value. Thus different subjects would most likely pick different priors to represent the same prior information X. This makes the choice of the prior subjective given only the prior information X.
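The non-uniqueness can be illustrated with a related family of priors on 4x4 density matrices, the so-called induced measures (a sketch, not the Wishart family itself; the parameter n below plays a role analogous to the free parameter mentioned above, and the specific numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def random_density_matrix(d=4, n=4):
    # Induced measure: take a d x n complex Gaussian matrix G and normalise
    # G G^dagger to unit trace. Each ancilla dimension n defines a different
    # prior on the same set of d x d density matrices.
    G = rng.normal(size=(d, n)) + 1j * rng.normal(size=(d, n))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def mean_purity(n, samples=2000):
    total = 0.0
    for _ in range(samples):
        r = random_density_matrix(n=n)
        total += np.trace(r @ r).real
    return total / samples

# The same "prior information" (two qubits, Hilbert space C^2 tensor C^2),
# but different choices of n yield visibly different priors:
print(mean_purity(4), mean_purity(20))
```

The same prior information is compatible with every value of n, yet the resulting priors differ markedly, e.g. in the mean purity of the sampled states (larger n concentrates the prior nearer the maximally mixed state).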
 
  • #63
PeterDonis said:
For any system on which we could do quantum tomography, won't there be one unique finite-dimensional Hilbert space? For example, if I have two qubits, possibly entangled (and I want to use quantum tomography to determine whether they are entangled), isn't the Hilbert space just ##\mathbb{C}^2 \otimes \mathbb{C}^2##?
If you regard your system as two qubits, this determines the Hilbert space, because a qubit is a mathematical abstraction. But real experiments are made with beams of light, and there are choices in how you model the system. Even if you ignore polarization and the fact that a beam is never infinitely thin (which strictly speaking makes a photon state a function of momentum), the photon Hilbert space is still an infinite-dimensional space of harmonic oscillators of different frequencies (so that you can consider squeezed states and parametric down-conversion). This must be truncated by idealization to a finite-dimensional space. In quantum state tomography one would typically assume the frequency to be fixed and the intensity of the beam to be low enough that only a few basis states need to be considered. But if you want to measure the Wigner function, you need many more excited states.
 
  • #64
PeterDonis said:
From the paper you reference, it seems to me that the key issue is that the group measure for a non-compact group is not normalizable.
Well, being non-amenable is worse than just being non-compact. The group measure is more than just not normalizable; it is also not well approximable by normalizable measures in the appropriate sense.

PeterDonis said:
However, I'm not sure any real problem actually requires the full range of a non-compact group. ... So I ought to be able to find some compact group with a normalizable measure that represents the actual invariance and use that instead.
If I interpret your idea here as approximating a non-compact group by a compact group, then being non-amenable will have the effect that you cannot approximate the group by compact groups in the appropriate sense.
 
  • #65
PeterDonis said:
any real problem will not be invariant under translation by any distance whatsoever. It will only be invariant under translation over some bounded region. So I ought to be able to find some compact group with a normalizable measure that represents the actual invariance and use that instead.
This group consists of a single element, the identity. (Translations over a bounded region are only partially defined and do not form a group.) But to lead to a noninformative prior, the group must at least act transitively on the set on which the probability distribution is sought.

Nontrivial group invariance is a rare property in real applications.
 
  • #66
gentzen said:
If I interpret your idea here as approximating a non-compact group by a compact group
Not approximating, no, just replacing one with the other based on a better specification of the actual invariance of the problem. But in view of what @A. Neumaier says in post #65, the resulting structure might not be a group and might not have all of the required properties.
 
  • #67
I enjoy this 4th philosophical viewpoint of objective Bayesianism, as Berger puts it:

"Objective Bayesian analysis is simply a collection of ad hoc but useful methodologies for learning from data"
-- https://www2.stat.duke.edu/~berger/papers/obayes-debate.pdf

This, for me, paints a picture of the level of ambition of explanatory power, and thus of the problem of the objective approach. Those who prefer subjective coherence over objective "ad hoc" methods may prefer the more powerful dark side, even if it is dangerous :nb)

/Fredrik
 
  • #68
gentzen said:
Jaynes may have enjoyed opposing the whole establishment, but that doesn't resolve the paradoxes
Thanks for the references. I can't say I find them more convincing than Jaynes's exposition of the marginalization paradox. In view of several decades of debates, it seems unlikely that I'll be able to understand your reservations about objective Bayesianism. Do you know of a real-world problem where difficulties of this kind have turned up? (Jaynes's discussion of Bertrand's problem satisfied me. But I'm just a physicist. ;-))
 
  • #69
WernerQH said:
Do you know of a real-world problem where difficulties of this kind have turned up?
Most likely, there cannot be any. The reason is that real-world applications usually do not use Bayesian methods, as the latter are restricted to low-dimensional problems.

The only exceptions are models based on exponential families, where conjugate priors can be easily specified and updated, since all estimation boils down to updating the ordinary sample mean of a sufficient statistic. The Bayesian estimate is derived from the latter. This is equivalent to regularized frequentist statistics based on exponential families. Thus nothing is gained through Bayesian methods compared to frequentist ones.
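A sketch of the claimed equivalence for the simplest exponential-family case (Normal data with known variance and a conjugate Normal prior; the numbers are made up): the Bayesian posterior mean coincides with the minimiser of a ridge-penalised least-squares criterion, i.e. a regularised frequentist estimate.

```python
import numpy as np

# Conjugate Bayesian update vs. ridge-penalised least squares for the mean of
# Normal data with known variance 1. The two estimates coincide.
rng = np.random.default_rng(3)
x = rng.normal(1.5, 1.0, size=40)
m0, k0 = 0.0, 2.0                     # prior mean and prior weight (pseudo-count)
n, xbar = len(x), x.mean()

bayes = (k0 * m0 + n * xbar) / (k0 + n)          # conjugate posterior mean

grid = np.linspace(-1.0, 3.0, 40001)             # brute-force the penalised fit
penalised = ((x[:, None] - grid)**2).sum(axis=0) + k0 * (grid - m0)**2
regularised = grid[np.argmin(penalised)]

print(bayes, regularised)                        # agree to grid precision
```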
 
  • #70
WernerQH said:
In view of several decades of debates it seems unlikely that I'll be able to understand your reservations about objective Bayesianism. Do you know of a real-world problem where difficulties of this kind have turned up? (Jaynes's discussion of Bertrand's problem satisfied me. But I'm just a physicist. ;-))
My reservations and the "difficulties of this kind" are two separate topics. My reservations are about Jaynes's book, and about the unrealistic expectations it creates. Just like you, I am not an expert on Bayesianism and the several decades of debates. I gave an explicit example before of how those unrealistic expectations play out in the real world:
gentzen said:
..., you can somehow magically encode your background knowledge into a prior (which is a sort of not necessarily normalisable probability distribution), add some observed facts, and then get the probability for a given proposition (given your prior and your observations) as a result.

Of course, this is a caricature version of the Bayesian interpretation, but people do use it that way. And they use it with the intention to convince other people. So what strikes me as misguided is not when people like Scott Aaronson use Bayesian arguments in addition to more conventional arguments to convince other people, but when they replace perfectly fine arguments by a supposedly superior Bayesian argument and exclaim: "This post supersedes my 2006 post on the same topic, which I hereby retire." For me, this is related to the philosophy of Cox's theorem that a single number is preferable over multiple independent numbers (https://philosophy.stackexchange.co...an-reasoning-related-to-the-scientific-method). ...
Other people seem to share my reservations:
As the saying goes, the problem with Bayes is the Bayesians. It’s the whole religion thing, the people who say that Bayesian reasoning is just rational thinking, or that rational thinking is necessarily Bayesian, the people who refuse to check their models because subjectivity, the people who try to talk you into using a “reference prior” because objectivity. Bayesian inference is a tool. It solves some problems but not all, and I’m exhausted by the ideology of the Bayes-evangelists.

The "difficulties of this kind", on the other hand, are more a gut feeling (or an educated guess) on my part, as opposed to solid knowledge. I also have to "fight" to understand that stuff; you are no exception here. Other people mention "high dimensions" when talking about the Hidden dangers of noninformative priors:
And, when you increase the dimensionality of a problem, both these things happen: data per parameter become more sparse, and priors distribution that are innocuous in low dimensions become strong and highly informative (sometimes in a bad way) in high dimensions.
But my gut feeling expects something worse than just insufficient data per parameter. Something like
Here we show that, in general, the prior remains important even in the limit of an infinite number of measurements. We illustrate this point with several examples where two priors lead to very different conclusions given the same measurement data.
from an abstract of Christopher Fuchs and Ruediger Schack. (I haven't read their paper yet, even though it is short. But I saw their abstract a long time ago, and it did influence my gut feeling.) Basically, I expect a fundamental limitation of achievable accuracy. And I expect that this enables you to include some preferred properties in your model, for example that there exists only a single world.
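A classical analogue of priors mattering even with unlimited data is Cromwell's rule, sketched here with made-up numbers: a prior that puts zero mass on a hypothesis can never be moved off zero, however extensive the data, so two agents with different priors can reach permanently different conclusions from the same measurements.

```python
import numpy as np

p_true, n = 0.9, 10_000
rng = np.random.default_rng(4)
k = int((rng.random(n) < p_true).sum())     # observed number of heads

def posterior_prob_biased(prior_biased, p_alt=0.9):
    # Two-hypothesis model: "fair" (p = 0.5) versus "biased" (p = p_alt).
    if prior_biased == 0.0:
        return 0.0                          # zero prior mass can never grow
    log_bayes_factor = k * np.log(p_alt / 0.5) + (n - k) * np.log((1 - p_alt) / 0.5)
    log_odds = np.log(prior_biased / (1 - prior_biased)) + log_bayes_factor
    return float(1.0 / (1.0 + np.exp(-log_odds)))

print(posterior_prob_biased(0.0))    # agent A: stays at 0 forever
print(posterior_prob_biased(0.01))   # agent B: essentially 1
```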
 
  • #71
gentzen said:
Basically, I expect a fundamental limitation of achievable accuracy.
What does "accuracy" mean here? Do you think there can be some ultimate truth concerning the value of a probability? Even with dice, the value ## \frac 1 6 ## can be "exact" as a prior, but in the real world it can only be "approximate", valid as long as it agrees with observations.
gentzen said:
And I expect that this enables you to include some preferred properties in your model, for example that there exists only a single world.
That there exists only a single world agrees with my gut feeling. :-)
 
  • #72
This whole search for some logical justification of which method of inference is "optimal", as if that were a mathematical or logical problem for a "general inference problem", seems to me like a misguided mess. All the references here remind me of how different this is from how I prefer to think of it, and that there are a couple of parallel and intermixed issues here, which for ME are entangled, but which for some are independent.

(1) The quest for the "optimal" mathematical theory of making quantitative inferences from quantitative evidence; this involves defining measures and update rules. And the quest for a purely logical argument for which one is the best.

(2) The quest to understand nature's physical interactions between its parts; this involves the physics and measures of matter and how it evolves relative to other parts, and the quest for natural explanations of nature's choices.

(2) is my interest, but cast in the form of (1), and the question is which one is best suited for physics? I see that some of the references are mainly about (1), intermixed with philosophy, in a sense that has little to do with the foundations of physics, or with physical constraints on inferences made by physical observers, and how this relates to black-hole information paradoxes, quantum weirdness, etc. If one forgets this perspective and mixes in arguments from the case of pure (1), I think there will be misunderstandings due to the different goals of the discussions. This is why I picture the "agent" and a physical version of the "prior information", and hold that the physical basis of "information" is in the microstructure of matter.

/Fredrik
 
  • #73
AlexCaledin said:
- do you mean, in the quantum state of the universe?
I mentioned it just to illustrate that the discussions here regarding the foundations of science and probability may be conducted in themselves, within the realm of logic or mathematics, or in relation to the foundations of physics (and then I specifically mean the foundations of physical law). The goals and issues are different. Philosophers of mathematics and philosophers of physics are related but may still have different goals.

(The way I happen to think of it, by "information" in this case I mean what the agent knows about the rest of the universe. This information is more than just the "quantum state" or the "prior probability"; it also contains the "prior information" that defines, say, Hilbert spaces or probability spaces, etc. But as the agent is a subsystem, this cannot be compared to the "quantum state of the whole" in the sense of Wheeler-DeWitt, which is more a non-physical fiction. This fiction is thus rejected in my thinking, not on grounds of mathematics but on other, admittedly murky, grounds. But as one ties mathematics to reality, it's bound to get murky somewhere. These arguments are usually beyond what you see in many math/logic discussions, which is my point.)

/Fredrik
 
  • #74
gentzen said:
But my gut feeling expects something worse than just insufficient data per parameter. Something like

from an abstract of Christopher Fuchs and Ruediger Schack. (I haven't read their paper yet, even though it is short. But I saw their abstract a long time ago, and it did influence my gut feeling.) Basically, I expect a fundamental limitation of achievable accuracy. And I expect that this enables you to include some preferred properties in your model, for example that there exists only a single world.
If one has some dream of finding an optimal objective learning algorithm that is guaranteed to find the truth, this may be a "problem", but that seems like a fantasy.

But if one instead (as I was suggesting) is trying just to predict the dynamics of a system of interacting agents, then this is not a problem; it's a trait that can help explain non-trivial self-organisation. If all agents ultimately converged to the same thing, it seems one would only find decay-like phenomena there, similar to entropic flows. With multiple attractors, we get more interesting phenomenology.

/Fredrik
 
  • #75
Related abstractions exist for models of the brain and of social interactions.

For example, the "Bayesian brain hypothesis":

"This is the first crucial point in understanding the Bayesian brain hypothesis. It is a profound point: the internal model of the world within the brain suggests that processes in the brain model processes in the physical world. In order to successfully predict the future, the brain needs to run simulations of the world on its own hardware. These processes need to follow a causality similar to that of the external world, and a world of its own comes alive in the brain observing it."
-- https://towardsdatascience.com/the-bayesian-brain-hypothesis-35b98847d331

Unless one takes it too literally(!) and understands that there are differences, this is similar to the agent view, except with matter in general instead of brains. It is an intuitive way to understand the concepts.

Edit: An arXiv reference would have been better, I suppose. A related one is here... they label it predictive coding, which is conceptually the way prior information is "coded"...

https://arxiv.org/abs/2107.12979
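A minimal sketch of the predictive-coding idea (all names and numbers here are illustrative, not from the cited paper): an internal prediction is driven toward the incoming signal by the prediction error.

```python
import numpy as np

# Minimal predictive-coding loop: an internal model keeps a prediction mu of a
# sensory signal and updates it from the prediction error, weighted by a rate.
rng = np.random.default_rng(5)
signal = 3.0 + 0.1 * rng.standard_normal(500)   # noisy "world" being observed

mu, lr = 0.0, 0.05                              # internal prediction, update rate
for s in signal:
    error = s - mu                              # prediction error
    mu += lr * error                            # error-driven update
print(mu)   # prediction has converged near the true value 3.0
```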

/Fredrik
 
  • #76
Fra said:
as the agent is a subsystem, this cannot be compared to the "quantum state of the whole" in the sense of Wheeler-DeWitt, which is more a non-physical fiction
- very well, suppose you make the whole universe consist of those agent subsystems - and then they request the universe's quantum state to correlate and "objectively" record all their observations, because otherwise it's not fun. ( - Just like two fellows having nothing to do and asking for a chessboard.)

Have you read Enrico Fermi's assertion that the description of reality ought to be dualistic, physical + mental? Sorry, I forgot where to find it.
 
  • #77
AlexCaledin said:
- very well, suppose you make the whole universe consist of those agent subsystems - and then they request the universe's quantum state to correlate and "objectively" record all their observations, because otherwise it's not fun. ( - Just like two fellows having nothing to do and asking for a chessboard.)

Have you read Enrico Fermi's assertion that the description of reality ought to be dualistic, physical + mental? Sorry, I forgot where to find it.
I take it you mean "not fun" = an inconsistency between views?

My counter-questions would then be: recorded where? Also note that observations have to be "communicated" as well; exactly where is this communicated?

The argument is that the inconsistency is not a physical one. It's a logical inconsistency, which just means that it is the existence of a logically motivated objectivity that is inconsistent. The inconsistency arises at a fictional level, and thus is not actually a problem, except for their own way of thinking!

No, I haven't read Fermi's description of that. The word "mental" puts me off, though :H So I guess I didn't miss anything.

The analogy with a learning brain can be a loose source of insight into abstractions only, but that's where it ends. I am not suggesting that matter has "human properties"; I am rather saying the opposite, that even human brains follow the same physics as anything else. What's impressive about the human brain lies not in some divine religious dimension (there are those who may think so, but that is not even close to what I am talking about), but solely in its complexity; and it's exactly how effective laws evolve at different layers in a complex system that we apparently are far from understanding and being able to "capture" in models. This is also why laws at complexity levels orders of magnitude apart appear "unrelated".

/Fredrik
 
