Bayesian statistics in science

  • #61
gentzen said:
the message of those paradoxes
From the paper you reference, it seems to me that the key issue is that the group measure for a non-compact group is not normalizable. A simple example given in the paper is that, if the group in question is the reals--for example, if we think the problem is invariant under translation along one direction, regardless of the size of the translation--then the appropriate measure is Lebesgue measure, which is not normalizable; the total measure over the reals is infinite.

However, I'm not sure any real problem actually requires the full range of a non-compact group. In the simple example just described, any real problem will not be invariant under translation by any distance whatsoever. It will only be invariant under translation over some bounded region. So I ought to be able to find some compact group with a normalizable measure that represents the actual invariance and use that instead.
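
As a minimal sketch of this (my own illustration in Python, using a toy Gaussian location model that is not in the paper): the translation-invariant Lebesgue prior has infinite total mass, while a flat prior truncated to a bounded region is proper, and for a region that comfortably covers the data the truncation barely affects the posterior.

```python
import numpy as np

# One observation y ~ N(theta, 1); the toy problem is (nearly) invariant under
# translations of theta.
y = 1.3
lik = lambda theta: np.exp(-0.5 * (theta - y) ** 2)

# The translation-invariant (Lebesgue) prior is flat on all of R and cannot be
# normalized: its total mass grows without bound as the cutoff increases.
for L in (1e1, 1e3, 1e5):
    print(f"prior mass on [-{L:g}, {L:g}] = {2 * L:g}")

# A flat prior truncated to a bounded region [-a, a] is normalizable, and once
# the region comfortably covers the data the posterior mean barely depends on a.
for a in (5.0, 50.0, 500.0):
    theta = np.linspace(-a, a, 200_001)          # fine grid for a Riemann sum
    w = lik(theta)
    posterior_mean = np.sum(theta * w) / np.sum(w)
    print(f"a = {a:5.0f}: posterior mean = {posterior_mean:.4f}")   # ~ y each time
```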
 
  • #62
PeterDonis said:
As I have said before, Jaynes's aim in his book is to give objective (by your definition) procedures for assigning prior probabilities.
On p.373 Jaynes makes the same claim, with the same definition of objective:
Jaynes said:
In our view, problems of inference are ill-posed until we recognize three essential things.
(A) The prior probabilities represent our prior information, and are to be determined, not by introspection, but by logical analysis of that information.
(B) Since the final conclusions depend necessarily on both the prior information and the data, it follows that, in formulating a problem, one must specify the prior information to be used just as fully as one specifies the data.
(C) Our goal is that inferences are to be completely ‘objective’ in the sense that two persons with the same prior information must assign the same prior probabilities.
But he does not redeem his promise. The point is that if the prior information is objective, it is not given by a prior probability distribution, since the prior information X consists of concepts and numbers, not distributions. Thus (A) is not a fact but wishful thinking. There is a subjective step involved in converting the prior information into a prior distribution, which makes (C) wishful thinking as well.

Once the prior distribution is specified, the posterior is objectively determined by it and the rules. But whereas in the passage I had cited earlier, Jaynes distinguished between the prior information X and the prior distribution P(A|X), he now identifies them, contradicting himself. Indeed, X and P(A|X) are mathematically two very distinct items. To know X says nothing at all about P(A|X).

In our example of quantum tomography, X is 'the Hilbert space of two qubits is ##\mathbb{C}^2\otimes \mathbb{C}^2##', while P(A|X) is a distribution over 4x4 density matrices. Jaynes says nothing at all about how one objectively deduces this probability distribution from X. He only gives plausibility arguments for a few elementary sample cases, primarily group invariance considerations. Invariance suggests a complex Wishart distribution as a sensible prior, but there is a 17-dimensional family of these, and none of them can claim to be the distinguished choice. Even if one opts for simplicity and sets the scale matrix to the identity (which already adds information not contained in the prior information), another parameter ##n>3## remains to be chosen that has no natural default value. Thus different subjects would most likely pick different priors to represent the same prior information X. This makes the choice of the prior subjective given only the prior information X.
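
To make the arbitrariness concrete, here is a small Python sketch (my own, not from Jaynes or the thread) of one standard way to realize such a prior: draw ##W = L G G^\dagger L^\dagger## with ##G## a complex Gaussian ##4\times n## matrix and ##L## the Cholesky factor of the scale matrix, then set ##\rho = W/\operatorname{tr} W##. Both ##n## and the scale matrix are choices that the prior information X does not fix, and different choices give visibly different priors (e.g. different typical purities).

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_density_matrix(n, scale=np.eye(4)):
    """Draw a 4x4 density matrix rho = W / tr(W), where W is complex-Wishart
    distributed with `n` degrees of freedom and the given scale matrix.
    Both `n` and `scale` are free choices not fixed by the prior information X."""
    L = np.linalg.cholesky(scale)
    G = rng.normal(size=(4, n)) + 1j * rng.normal(size=(4, n))
    W = L @ G @ G.conj().T @ L.conj().T
    return W / np.trace(W).real

# Two subjects with the same prior information X but different parameter choices
# obtain different priors: the typical purity tr(rho^2) of sampled states differs.
for n in (4, 40):
    purities = [np.trace(rho @ rho).real
                for rho in (sample_density_matrix(n) for _ in range(2000))]
    print(f"n = {n:3d}: mean purity of sampled states = {np.mean(purities):.3f}")
```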
 
Last edited:
  • #63
PeterDonis said:
For any system on which we could do quantum tomography, won't there be one unique finite-dimensional Hilbert space? For example, if I have two qubits, possibly entangled (and I want to use quantum tomography to determine whether they are entangled), isn't the Hilbert space just ##\mathbb{C}^2 \otimes \mathbb{C}^2##?
If you regard your system as two qubits, this determines the Hilbert space, because a qubit is a mathematical abstraction. But real experiments are made with beams of light, and there are choices in how you model the system. Even if you ignore polarization and the fact that a beam is never infinitely thin (which strictly speaking makes a photon state a function of momentum), the photon Hilbert space is still an infinite-dimensional space of harmonic oscillator modes of different frequencies (so that you can consider squeezed states and parametric down-conversion). This must be truncated by idealization to a finite-dimensional space. In quantum state tomography one would typically assume the frequency to be fixed and the intensity of the beam to be low enough that only a few basis states need to be considered. But if you want to measure the Wigner function, you need many more excited states.
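
As a rough numerical illustration of that last point (my own sketch, not from the post): in a Fock basis truncated to ##N## levels, a weak coherent beam is captured almost completely by a handful of states, while a brighter (or squeezed) beam leaks substantial probability beyond the cutoff, so a faithful Wigner-function reconstruction needs a much larger ##N##. The amplitudes ##\alpha## below are arbitrary choices for illustration.

```python
import numpy as np
from math import factorial

def coherent_amplitudes(alpha, N):
    """Fock-basis amplitudes of a coherent state |alpha>, truncated to N levels."""
    n = np.arange(N)
    return np.exp(-abs(alpha)**2 / 2) * alpha**n / np.sqrt([factorial(int(k)) for k in n])

# Probability retained after truncation: a weak beam (small |alpha|) needs only
# a few Fock levels; a brighter beam needs many more.
for alpha in (0.3, 2.0):
    for N in (3, 10, 30):
        kept = np.sum(np.abs(coherent_amplitudes(alpha, N))**2)
        print(f"|alpha| = {alpha}: probability kept with N = {N:2d} levels: {kept:.4f}")
```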
 
  • #64
PeterDonis said:
From the paper you reference, it seems to me that the key issue is that the group measure for a non-compact group is not normalizable.
Well, being non-amenable is worse than just being non-compact. The group measure is not merely non-normalizable; it also cannot be well approximated by normalizable measures in the appropriate sense.

PeterDonis said:
However, I'm not sure any real problem actually requires the full range of a non-compact group. ... So I ought to be able to find some compact group with a normalizable measure that represents the actual invariance and use that instead.
If I interpret your idea here as approximating a non-compact group by a compact group, then being non-amenable will have the effect that you cannot approximate the group by compact groups in the appropriate sense.
 
  • #65
PeterDonis said:
any real problem will not be invariant under translation by any distance whatsoever. It will only be invariant under translation over some bounded region. So I ought to be able to find some compact group with a normalizable measure that represents the actual invariance and use that instead.
This group consists of a single element, the identity. (Translations over a bounded region are only partially defined and do not form a group.) But to lead to a noninformative prior, the group must at least act transitively on the set on which the probability distribution is sought.

Nontrivial group invariance is a rare property in real applications.
 
  • #66
gentzen said:
If I interpret your idea here as approximating a non-compact group by a compact group
Not approximating, no, just replacing one with the other based on a better specification of the actual invariance of the problem. But in view of what @A. Neumaier says in post #65, the resulting structure might not be a group and might not have all of the required properties.
 
  • #67
I enjoy this fourth philosophical viewpoint on objective Bayesianism, as Berger puts it:

"Objective Bayesian analysis is simply a collection of ad hoc but useful methodologies for learning from data"
-- https://www2.stat.duke.edu/~berger/papers/obayes-debate.pdf

For me this paints a picture of the level of ambition in explanatory power, and thus of the problem with the objective approach. Those who prefer subjective coherence over objective "ad hoc" methods may prefer the more powerful dark side, even if it is dangerous :nb)

/Fredrik
 
  • #68
gentzen said:
Jaynes may have enjoyed opposing the whole establishment, but that doesn't resolve the paradoxes
Thanks for the references. I can't say I find them more convincing than Jaynes's exposition of the marginalization paradox. In view of several decades of debates it seems unlikely that I'll be able to understand your reservations about objective Bayesianism. Do you know of a real-world problem where difficulties of this kind have turned up? (Jaynes's discussion of Bertrand's problem satisfied me. But I'm just a physicist. ;-))
 
  • #69
WernerQH said:
Do you know of a real-world problem where difficulties of this kind have turned up?
Most likely, there cannot be any. The reason is that real-world applications usually do not use Bayesian methods, as the latter are restricted to low-dimensional problems.

The only exceptions are models based on exponential families, where conjugate priors can be easily specified and updated, since all estimation boils down to updating the ordinary sample mean of a sufficient statistic. The Bayesian estimate is derived from the latter. This is equivalent to regularized frequentist statistics based on exponential families, so nothing is gained through Bayesian methods compared to frequentist ones.
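
A minimal concrete instance of that equivalence (my own sketch with a Bernoulli model, not taken from the post): the conjugate Beta posterior mean is literally the sample mean regularized by the prior pseudo-counts.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.binomial(1, 0.7, size=50)        # Bernoulli data: an exponential family

# Conjugate Beta(a0, b0) prior: the update just adds the sufficient statistics
# (number of successes and failures) to the prior pseudo-counts.
a0, b0 = 2.0, 2.0
successes = data.sum()
a_post = a0 + successes
b_post = b0 + (len(data) - successes)
posterior_mean = a_post / (a_post + b_post)

# The same number read as regularized frequentist estimation: the sample mean
# shrunk toward the prior mean by a0 + b0 pseudo-observations.
n = len(data)
regularized_mean = (successes + a0) / (n + a0 + b0)

print(posterior_mean, regularized_mean)     # identical
```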
 
Last edited:
  • #70
WernerQH said:
In view of several decades of debates it seems unlikely that I'll be able to understand your reservations about objective Bayesianism. Do you know of a real-world problem where difficulties of this kind have turned up? (Jaynes's discussion of Bertrand's problem satisfied me. But I'm just a physicist. ;-))
My reservations and the "difficulties of this kind" are two separate topics. My reservations are about Jaynes' book, and about the unrealistic expectations it creates. Just like you, I am not an expert on Bayesianism and the several decades of debates. I gave an explicit example before of how those unrealistic expectations play out in the real world:
gentzen said:
..., you can somehow magically encode your background knowledge into a prior (which is a sort of not necessarily normalisable probability distribution), add some observed facts, and then get the probability for a given proposition (given your prior and your observations) as a result.

Of course, this is a caricature version of the Bayesian interpretation, but people do use it that way. And they use it with the intention to convince other people. So what strikes me as misguided is not when people like Scott Aaronson use Bayesian arguments in addition to more conventional arguments to convince other people, but when they replace perfectly fine arguments by a supposedly superior Bayesian argument and exclaim: "This post supersedes my 2006 post on the same topic, which I hereby retire." For me, this is related to the philosophy of Cox's theorem that a single number is preferable over multiple independent numbers (https://philosophy.stackexchange.co...an-reasoning-related-to-the-scientific-method). ...
Other people seem to share my reservations:
As the saying goes, the problem with Bayes is the Bayesians. It’s the whole religion thing, the people who say that Bayesian reasoning is just rational thinking, or that rational thinking is necessarily Bayesian, the people who refuse to check their models because subjectivity, the people who try to talk you into using a “reference prior” because objectivity. Bayesian inference is a tool. It solves some problems but not all, and I’m exhausted by the ideology of the Bayes-evangelists.

The "difficulties of this kind" on the other hand is more a gut feeling (or an educated guess) on my part, as opposed to solid knowledge. I also have to "fight" to understand that stuff. You are no exception here. Other people mention "high dimensions" when talking about the Hidden dangers of noninformative priors:
And, when you increase the dimensionality of a problem, both these things happen: data per parameter become more sparse, and priors distribution that are innocuous in low dimensions become strong and highly informative (sometimes in a bad way) in high dimensions.
But my gut feeling expects something worse than just insufficient data per parameter. Something like
Here we show that, in general, the prior remains important even in the limit of an infinite number of measurements. We illustrate this point with several examples where two priors lead to very different conclusions given the same measurement data.
from an abstract by Christopher Fuchs and Ruediger Schack. (I didn't read their paper yet, even though it is short. But I saw their abstract a long time ago, and it did influence my gut feeling.) Basically, I expect a fundamental limitation of achievable accuracy. And I expect that this enables you to include some preferred properties in your model, for example that there exists only a single world.
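
For what it's worth, here is a toy analogue of that abstract's claim (my own Python sketch, not the Fuchs and Schack example; the model is invented purely for illustration): if the measurements are informationally incomplete, two priors keep giving different answers no matter how much data you collect.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy non-identified model: we observe Bernoulli draws with success probability
# m = (p + q) / 2, so the data pin down p + q but never the split between p and q.
p_true, q_true = 0.9, 0.3
data = rng.binomial(1, (p_true + q_true) / 2, size=100_000)
k, n = data.sum(), len(data)

grid = np.linspace(0.001, 0.999, 200)
P, Q = np.meshgrid(grid, grid, indexing="ij")
log_lik = k * np.log((P + Q) / 2) + (n - k) * np.log(1 - (P + Q) / 2)

def posterior_mean_of_p(log_prior):
    """Posterior mean of p on the (p, q) grid for the given log-prior."""
    log_post = log_lik + log_prior
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return (post * P).sum()

prior_flat = np.zeros_like(P)                     # flat in (p, q)
prior_small_p = 20 * np.log(1 - P + 1e-12)        # strongly favors small p

print("posterior mean of p, flat prior:  ", posterior_mean_of_p(prior_flat))
print("posterior mean of p, skewed prior:", posterior_mean_of_p(prior_small_p))
# The two answers stay far apart however large the sample, because the data
# never identify p separately from q.
```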
 
  • #71
gentzen said:
Basically, I expect a fundamental limitation of achievable accuracy.
What does "accuracy" mean here? Do you think there can be some ultimate truth concerning the value of a probability? Even with dice, the value ## \frac 1 6 ## can be "exact" as a prior, but in the real world it can only be "approximate", valid as long as it agrees with observations.
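
As a tiny illustration of that provisional status (my own sketch; the bias 0.21 is a made-up number): starting from a Beta prior with mean 1/6, the posterior drifts away from 1/6 as soon as the observed frequencies stop agreeing with it.

```python
import numpy as np

rng = np.random.default_rng(3)

p_true = 0.21          # hypothetical slightly loaded die: "six" comes up too often
a0, b0 = 1.0, 5.0      # Beta prior with mean 1/6, encoding the "fair die" belief

for n in (10, 100, 10_000):
    sixes = int((rng.random(n) < p_true).sum())
    post_mean = (a0 + sixes) / (a0 + b0 + n)
    print(f"n = {n:6d}: observed frequency = {sixes / n:.3f}, "
          f"posterior mean for P(six) = {post_mean:.3f}")
```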
gentzen said:
And I expect that this enables you to include some preferred properties in your model, for example that there exists only a single world.
That there exists only a single world agrees with my gut feeling. :-)
 
  • #72
This whole search for some logical justification of which method of inference is "optimal", as if that were a mathematical or logical problem for a "general inference problem", seems to me like a misguided mess. All the references here remind me of how different this is from how I prefer to think of it, and that there are a couple of parallel and intermixed issues here, which for ME are entangled, but which for some are independent.

(1) The quest for the "optimal" mathematical theory of making quantitative inference from quantitative evidence; this involves defining measures and update rules, and the quest for a purely logical argument for which one is best.

(2) The quest for the physical interactions of nature between its parts; this involves the physics and measures of matter and how it evolves relative to other parts, and the quest for natural explanations of nature's choices.

(2) is my interest, but cast in the form of (1), and the question is which is the best fit for physics. I see that some of the references are mainly about (1) intermixed with philosophy, in a sense that has little to do with the foundations of physics, or with physical constraints on inferences made by physical observers, and how this relates to BH information paradoxes, quantum weirdness, etc. IF one forgets this perspective and mixes in arguments from the case of pure (1), I think there will be misunderstandings due to the different goals of the discussions. This is why I picture the "agent" and a physical version of the "prior information", and hold that the physical basis of "information" lies in the microstructure of matter.

/Fredrik
 
  • #73
AlexCaledin said:
- do you mean, in the quantum state of the universe?
I mentioned it just to illustrate that the foundations of science and probability may be discussed in themselves, within the realm of logic or mathematics, or in relation to the foundations of physics (and then I specifically mean the foundations of physical law). The goals and issues are different. Philosophers of mathematics and philosophers of physics are related but may still have different goals.

(The way I happen to think of it, by "information" in this case I mean what the agent knows about the rest of the universe. This information is more than just the "quantum state" or the "prior probability"; it also contains the "prior information" that defines, say, Hilbert spaces or probability spaces, etc. But as the agent is a subsystem, this cannot be compared to the "quantum state of the whole" in the sense of Wheeler-DeWitt, which is more of a non-physical fiction. This fiction is thus rejected in my thinking, not on grounds of mathematics but on other, admittedly murky grounds. But as one ties mathematics to reality, it's bound to get murky somewhere. These arguments are usually beyond what you see in many math/logic discussions, which is my point.)

/Fredrik
 
  • #74
gentzen said:
But my gut feeling expects something worse than just insufficient data per parameter. Something like

from an abstract by Christopher Fuchs and Ruediger Schack. (I didn't read their paper yet, even though it is short. But I saw their abstract a long time ago, and it did influence my gut feeling.) Basically, I expect a fundamental limitation of achievable accuracy. And I expect that this enables you to include some preferred properties in your model, for example that there exists only a single world.
If one has some dream about finding an optimal objective learning algorithm that is guaranteed to find the truth, this may be a "problem", but that seems like a fantasy.

But if one instead (as I was suggesting) just tries to predict the dynamics of a system of interacting agents, then this is not a problem; it's a trait that can help explain non-trivial self-organisation. If all agents ultimately converged to the same thing, it seems one would only be able to find decay-like phenomena there, similar to entropic flows. With multiple attractors, we get more interesting phenomenology.

/Fredrik
 
  • #75
Related abstractions exist for models of the brain and social interactions.

For example, the "Bayesian brain hypothesis":

"This is the first crucial point in understanding the Bayesian brain hypothesis. It is a profound point: the internal model of the world within the brain suggests that processes in the brain model processes in the physical world. In order to successfully predict the future, the brain needs to run simulations of the world on its own hardware. These processes need to follow a causality similar to that of the external world, and a world of its own comes alive in the brain observing it."
-- https://towardsdatascience.com/the-bayesian-brain-hypothesis-35b98847d331

As long as one doesn't take it too literally(!) and understands that there are differences, this is similar to the agent view, except with matter in general instead of brains. It is an intuitive way to understand the concepts.

Edit: Also, an arXiv reference would have been better, I suppose. A related one is here: they label it predictive coding, which is conceptually the way prior information is "coded"...

https://arxiv.org/abs/2107.12979

/Fredrik
 
Last edited:
  • #76
Fra said:
as the agent is a subsystem, this can not be compared to the "quantum state of the whole" in the sense of Wheeler-deWitt, which is more a non-physical fiction
- very well, suppose you make the whole universe consist of those agent subsystems - and then they request the universe's quantum state to correlate and "objectively" record all their observations, because otherwise it's not fun. ( - Just like two fellows having nothing to do and asking for a chessboard)

Have you read Enrico Fermi's assertion, that the description of reality ought to be dualistic, physical + mental? Sorry, I forgot where to find it.
 
Last edited:
  • #77
AlexCaledin said:
- very well, suppose you make the whole universe consist of those agent subsystems - and then they request the universe's quantum state to correlate and "objectively" record all their observations, because otherwise it's not fun. ( - Just like two fellows having nothing to do and asking for a chessboard)

Have you read Enrico Fermi's assertion, that the description of reality ought to be dualistic, physical + mental? Sorry, I forgot where to find it.
I take it you mean "not fun" = an inconsistency between views?

My counter-questions would then be: recorded where? Also note that observations have to be "communicated" as well; exactly where is this communicated?

The argument is that the inconsistency is not a physical one. It's a logical inconsistency, which just means that it is the existence of a logically motivated objectivity that is inconsistent. The inconsistency arises at a fictional level, and thus is not actually a problem except for their own way of thinking!

No, I haven't read Fermi's description of that. The word "mental" puts me off, though :H So I guess I didn't miss anything.

The analogy with a learning brain can be a loose source of insight into abstractions only, but that's where it ends. I am not suggesting that matter has "human properties"; I am rather saying the opposite, that even human brains follow the same physics as anything else. What's impressive about the human brain lies not in some divine religious dimension (there are those who may think so, but that is not even close to what I am talking about), but solely in its complexity, and it's exactly how effective laws evolve at different layers in a complex system that we apparently are far from understanding and being able to "capture" in models. This is also why laws at complexity levels orders of magnitude apart appear "unrelated".

/Fredrik
 
