Bayesian statistics in science

Sunil · Nov 3, 2021

[Moderator's note: This thread has been split off from a previous thread since its topic is best addressed in a separate discussion. This post has been edited to focus on the topic for separate discussion.]

In deriving the rules of probability as the logic of plausible reasoning in his "Probability Theory: The Logic of Science", Jaynes used the following trick: instead of defining the rules for our own thinking, he introduced a robot, some AI, and gave us the job of defining the rules of its thinking. The trick is that if we think about rules for a robot, we will care much more about the consistency of these rules. And the basic assumptions there are consistency rules: If there are several ways to derive something, the result should be the same. For our own reasoning, consistency is (intuitively) secondary.

The same type of reasoning we should apply here too. What should be the rules of physical reasoning for a robot designed to help physicists?

Fra · Nov 3, 2021

Sunil said:

If there are several ways to derive something, the result should be the same. For our own reasoning, consistency is (intuitively) secondary.

The same type of reasoning we should apply here too. What should be the rules of physical reasoning for a robot designed to help physicists?

Or an agent? Then ask what is the problem and consequence of agents making incompatible inferences? Then we are soon friends

/Fredrik

gentzen · Nov 3, 2021

Sunil said:

Jaynes has used ... the following trick: ... The trick is that if we think about rules for a robot, we will care much more about the consistency of these rules. And the basic assumptions there are consistency rules: If there are several ways to derive something, the result should be the same. For our own reasoning, consistency is (intuitively) secondary.

The same type of reasoning we should apply here too. What should be the rules of physical reasoning for a robot designed to help physicists?

Are you aware of the content of the paper "Quantum mechanics via quantum tomography" that this thread is about? For example, it says in section "5.5 Objectivity":

The assignment of states to stationary sources is as objective as any assignment of properties to macroscopic objects. Thus the knowledge people talk about when referring to the meaning of a quantum state resides in what is encoded in (and hence ”known to”) the model used to describe a quantum system – not to any subjective mind content of a knower!

In particular, as quantum values of members of a quantum measure, all probabilities are objective frequentist probabilities in the sense employed everywhere in experimental physics – classical and quantum. That the probabilities are only approximately given by the relative frequencies simply says that – like all measurements – probability measurements are of limited accuracy only.

No robot, no agent, no "subjective mind content of a knower". The meaning of a quantum state resides in what is encoded in (and hence ”known to”) the model! This is almost exactly the opposite of what Jaynes would tell you.

PeterDonis · Nov 3, 2021

gentzen said:

The meaning of a quantum state resides in what is encoded in (and hence ”known to”) the model!

But isn't the model mind content?

gentzen said:

This is almost exactly the opposite of what Jaynes would tell you.

I think the passage you quote, or at least the interpretation you are giving it, is trading on an ambiguity in the word "subjective".

In fact, what it is describing is the same kind of thing as what Jaynes describes: the "robot" Jaynes describes builds a model of some system, and uses the model to compute probabilities. Those computations are perfectly objective: they are mathematical operations starting from precisely defined initial propositions, and the same operations applied to the same propositions will give the same answers every time.

The only "subjectivity" involved in Jaynes is that different robots in different states of knowledge--meaning, with different sets of data available to them--will have different models, and will therefore make different computations of probabilities because they are starting from different initial propositions. But that is equally true of experimenters doing quantum tomography: their model is built from the information they have obtained from their experiments, and two experimenters who have run different sets of experiments will have different models, and will therefore compute different probabilities. That is every bit as "subjective" as what Jaynes describes. But of course it's not "subjective" at all in the sense of people just arbitrarily choosing probabilities instead of computing them using specified operations from specified initial propositions--and neither is Jaynes.

A. Neumaier · Nov 3, 2021

PeterDonis said:

But isn't the model mind content?

If you call this 'mind content' then all physics and all language is mind content, and the phrase 'mind content' becomes meaningless since it comprises everything.

PeterDonis said:

I think the passage you quote, or at least the interpretation you are giving it, is trading on an ambiguity in the word "subjective".

gentzen's interpretation is exactly what I intended to convey.

PeterDonis said:

In fact, what it is describing is the same kind of thing as what Jaynes describes: the "robot" Jaynes describes builds a model of some system, and uses the model to compute probabilities.

This is not the standard meaning of 'model'. A model is a template in which the parameters are not fixed but to be determined by experiment. In the case of quantum tomography, the model is the Hilbert space chosen to model the quantum system - nothing else. The state is a matrix of parameters that are not determined by the model but by experiments, using the traditional objective, universally agreed statistical techniques. (Unless you think that the publications of the particle data group are not objective but subjective mind content. Then we do not need to discuss further.)

For example, a classical quartic oscillator is a model defined by a Hamiltonian $$H=p^2/2m+kq^2/2+gq^4/4;$$ its coefficients (mass ##m##, spring constant ##k##, and coupling constant ##g##) are the parameters. The claim that a particular oscillator is well described by this model within a given accuracy can be decided objectively by performing experiments on the oscillator and comparing the results with the predictions of the model.

If the model is correct to some accuracy, there will be a parameter setting that matches the prediction, and in the limit of arbitrarily many and arbitrarily accurate measurements, the parameters will be determined uniquely by the experiment. The only subjectivity is the choice of the ansatz for the Hamiltonian. This is the kind of subjectivity you have everywhere in physics. It has nothing to do with probabilities.
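As a rough illustration of this point, here is a minimal Python sketch. It assumes a quartic potential V(q) = k q^2/2 + g q^4/4, and it assumes (purely for illustration) that the experiment records the restoring force at known positions with Gaussian noise. The function names, the noise model, and the numerical values are hypothetical and not taken from the thread.

```python
import random

def restoring_force(q, k, g):
    """Force -dV/dq for the assumed potential V(q) = k q^2/2 + g q^4/4."""
    return -k * q - g * q ** 3

def simulate_measurements(n, k, g, sigma, seed=0):
    """Hypothetical experiment: n noisy force readings at random positions in [-1, 1]."""
    rng = random.Random(seed)
    qs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    forces = [restoring_force(q, k, g) + rng.gauss(0.0, sigma) for q in qs]
    return qs, forces

def fit_k_g(qs, forces):
    """Least-squares fit of F = -k q - g q^3 by solving the 2x2 normal equations."""
    s11 = sum(q ** 2 for q in qs)
    s13 = sum(q ** 4 for q in qs)
    s33 = sum(q ** 6 for q in qs)
    b1 = sum(-f * q for q, f in zip(qs, forces))
    b3 = sum(-f * q ** 3 for q, f in zip(qs, forces))
    det = s11 * s33 - s13 * s13
    k = (b1 * s33 - b3 * s13) / det
    g = (s11 * b3 - s13 * b1) / det
    return k, g

# The estimates should approach the (assumed) true values k = 2.0, g = 0.5
# as the number of measurements grows.
for n in (10, 1000, 10000):
    qs, forces = simulate_measurements(n, k=2.0, g=0.5, sigma=0.1)
    print(n, fit_k_g(qs, forces))
```

Least squares is only one of several estimators one could use here; the point of the sketch is that any reasonable choice is driven toward the same parameter values by enough data.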

Whereas what you declare to be Jaynes' model is the parameters. The correctness of a Jaynes' model can be refuted by experiment unless the model is actually correct within the given accuracy. This can be established with a 5 sigma confidence if enough data are collected. In physics, this counts as objective.

PeterDonis said:

The only "subjectivity" involved in Jaynes is that different robots in different states of knowledge--meaning, with different sets of data available to them--will have different models, and will therefore make different computations of probabilities because they are starting from different initial propositions.

This makes Jaynes' approach subjective in a way quantum tomography is not.

In quantum tomography, the state can be determined objectively, independent of initial assumptions, by measuring long enough. There is no subjectivity in the parameters: if your parameters do not agree with the true parameters, you'll sooner or later get a statistically arbitrarily significant discrepancy with experiments. Again, the only subjectivity is the choice of the model - in this case the Hilbert space representing the quantum system. This is the kind of subjectivity you have everywhere in physics. It has nothing to do with probabilities.

PeterDonis · Nov 3, 2021

A. Neumaier said:

This is not the standard meaning of 'model'.

Sure it is. You're just using a different term for it than Jaynes normally uses. See below.

A. Neumaier said:

This makes Jaynes' approach subjective in a way quantum tomography is not.

No, it means that in your approach, you have fixed the prior (in Bayesian terms):

A. Neumaier said:

In the case of quantum tomography, the model is the Hilbert space chosen to model the quantum system - nothing else.

Jaynes would agree with you that, having fixed this prior, any given set of experimental data will objectively lead to a unique computation of probabilities. The only possible difference between different people in this case is that they have different posterior data, in which case they might compute different posterior probabilities. But that is not a difference in models (for your definition of "model"); it's a difference in data.

In other words, what you mean by "model" is what Jaynes means by "prior".

A. Neumaier said:

Whereas what you declare to be Jaynes' model is the parameters.

I might not have been clear in my previous post because of the difference in your terminology vs. Jaynes'. Hopefully the above helps to clarify. I don't see any fundamental difference in your approach vs. Jaynes' approach, given that you have fixed the prior.

A question Jaynes might ask is why you have chosen that particular prior; the choice of prior is where the subjectivity enters in Jaynes' view, but even on that view, one should still have some reasonable ground for choosing a particular prior. Given the role that Hilbert spaces are already understood to play in QM, that question should not be hard for you to answer. (Although you, as the author of the thermal interpretation, might also want to explain why you chose the Hilbert space instead of the set of expectation values.)

A. Neumaier · Nov 3, 2021

PeterDonis said:

Jaynes would agree with you that, having fixed this prior, any given set of experimental data will objectively lead to a unique computation of probabilities.

No - neither he nor I would claim that. A given set of experimental data will never lead to a unique computation of probabilities. Different statistical techniques will give different results:

A simple frequentist estimator would be the relative frequency, which is not the probability but a deterministic (uniquely determined) estimate for it.
Jaynes would have to assume in addition to the data a prior for the probabilities (for example a Dirichlet prior) and then compute from data and prior combined a unique posterior for the probabilities. Because it depends on the prior the result is subjective.
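The contrast between the two procedures can be sketched in a few lines of Python. The counts and the two Dirichlet priors below are made-up illustrative values, and the posterior mean is used as just one possible summary of the full posterior distribution.

```python
def relative_frequencies(counts):
    """Frequentist point estimate: observed count divided by total count."""
    total = sum(counts)
    return [c / total for c in counts]

def dirichlet_posterior_mean(counts, alpha):
    """Mean of the Dirichlet(alpha + counts) posterior for a multinomial model."""
    total = sum(counts) + sum(alpha)
    return [(c + a) / total for c, a in zip(counts, alpha)]

counts = [7, 2, 1]  # hypothetical outcome counts from 10 trials

freq = relative_frequencies(counts)
flat = dirichlet_posterior_mean(counts, [1.0, 1.0, 1.0])  # flat prior
half = dirichlet_posterior_mean(counts, [0.5, 0.5, 0.5])  # alpha = 1/2 prior

print(freq)  # depends on the data only
print(flat)  # depends on data and prior
print(half)  # same data, different prior, different result

# With 1000 times more data in the same proportions, the prior matters far less.
big = [1000 * c for c in counts]
print(dirichlet_posterior_mean(big, [1.0, 1.0, 1.0]))
```

On the same ten trials the three estimates differ, which is the prior dependence described above; with the larger data set the Bayesian posterior mean moves close to the relative frequency.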

But in quantum tomography the goal is not to obtain probabilities but to obtain the parameters of the model, in this case the density matrix. For this lots of different statistical procedures exist, all variants of the basic technique that I discuss in my paper. They produce different results - some more accurate than others, and increasing accuracy given the same data and a limited computational budget counts as scientific progress. The established scientific practice is that the computational technique used is specified together with the results, so that the procedure is objective, i.e., independent of undisclosed knowledge.

Jaynes would have to assume in addition to the data a prior specifying the probability for obtaining a given density matrix, and then update this prior in the light of the data. This is a very inefficient way to proceed, especially when the Hilbert space is not only of toy dimensions. For a 10 qubit system, the Hilbert space has dimension 1024, the density matrix depends on more than a million variables, and the posterior would be an extremely complicated probability distribution in a space of more than a million dimensions. Huge overkill!
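The parameter count quoted here can be checked with a short Python sketch. It uses the standard fact that a Hermitian d x d matrix has d^2 real parameters and that the unit-trace condition removes one; the function name is a hypothetical helper.

```python
def density_matrix_real_parameters(n_qubits):
    """Return (Hilbert space dimension, number of independent real parameters)."""
    d = 2 ** n_qubits
    # Hermitian d x d matrix: d^2 real parameters; trace = 1 removes one.
    return d, d * d - 1

print(density_matrix_real_parameters(10))
```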

PeterDonis said:

The only possible difference between different people in this case is that they have different posterior data, in which case they might compute different posterior probabilities. But that is not a difference in models (for your definition of "model"); it's a difference in data.

In other words, what you mean by "model" is what Jaynes means by "prior".

?
Jaynes's model is a probability distribution on states, initially the prior.

My model is the Hilbert space. How can it be considered to be a prior??

PeterDonis said:

given that you have fixed the prior.

? I don't have a prior. I have a model (a Hilbert space) and a huge, arbitrarily extensible collection of data. The latter determines (prior-independent) the parameters that are unspecified in the model to an accuracy determined by the data.

PeterDonis said:

A question Jaynes might ask is why you have chosen that particular prior;

Which prior did I choose? When and where?

PeterDonis said:

explain why you chose the Hilbert space instead of the set of expectation values.

The Hilbert space may be, for example, the tensor product of two two-dimensional Hilbert spaces. This Hilbert space enables one to discuss beams of two entangled photons in Bell experiments. This is the model. It models all possible beams of two entangled photons.

To find out which of this continuum of possibilities is actually realized you need to know in addition to the model its state. This knowledge is obtained by quantum tomography. It is objectively determined to some accuracy by sufficiently extensive data. There are many ways to extract this objective knowledge from the data.
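For concreteness, here is a minimal Python sketch of one of the simplest such procedures, linear-inversion tomography. It is applied to a single qubit rather than the two-photon system above, and it assumes ideal Pauli measurements along x, y, and z with the same number of shots each. The "true" Bloch vector, the function names, and the simulated data are all hypothetical.

```python
import random

def simulate_pauli_counts(bloch, shots, seed=0):
    """Simulate `shots` measurements per Pauli axis; return the +1 counts for (x, y, z)."""
    rng = random.Random(seed)
    plus_counts = []
    for r in bloch:
        p_plus = (1.0 + r) / 2.0  # Born probability of outcome +1 along this axis
        plus_counts.append(sum(1 for _ in range(shots) if rng.random() < p_plus))
    return plus_counts

def linear_inversion(plus_counts, shots):
    """Estimate each Bloch component from a relative frequency: r = 2 f - 1."""
    return [2.0 * n / shots - 1.0 for n in plus_counts]

def density_matrix(bloch):
    """rho = (I + x X + y Y + z Z) / 2 as a nested list of complex numbers."""
    x, y, z = bloch
    return [[(1 + z) / 2, (x - 1j * y) / 2],
            [(x + 1j * y) / 2, (1 - z) / 2]]

true_bloch = [0.6, 0.0, 0.8]  # assumed state, used only to generate fake data
for shots in (100, 10000):
    estimate = linear_inversion(simulate_pauli_counts(true_bloch, shots), shots)
    print(shots, estimate)
```

With few shots, linear inversion can return a Bloch vector of length greater than one, which is not a valid state. This is one reason refined estimators exist, and the sketch makes no claim about which procedure the paper itself uses.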

Jaynes' Bayesian methods (which would describe the uncertainty remaining in terms of a probability distribution on density matrices) are not among the most used techniques to do this.

PeterDonis · Nov 3, 2021

A. Neumaier said:

in quantum tomography the goal is not to obtain probabilities but to obtain the parameters of the model, in this case the density matrix

In other words, you're computing a posterior estimate for those. Then just substitute "posterior estimate of model parameters" for "posterior estimate of probabilities" in what I posted before. Jaynes explicitly discusses the case of estimating model parameters.

Where Jaynes might differ from you is that, instead of just computing a point estimate for each model parameter, he would compute a probability distribution.

A. Neumaier said:

For this lots of different statistical procedures exist, all variants of the basic technique that I discuss in my paper.

Then how do you choose which one to use?

A. Neumaier said:

The established scientific practice is that the computational technique used is specified together with the results, so that the procedure is objective, i.e., independent of undisclosed knowledge.

Exactly. And Jaynes would agree. But in choosing which computational technique to use, either you've made a subjective choice, or you've made use of some other objective process for making the choice--in which case Jaynes would just include that objective process in his overall analysis. Jaynes would not introduce any additional subjectivity that's not already there in what you're doing.

A. Neumaier said:

Jaynes would have to assume in addition to the data a prior specifying the probability for obtaining a given density matrix

Why? Why would Jaynes have to assume anything that you're not? Either your assumptions plus the data you obtain are sufficient to compute a posterior estimate for what you want (the model parameters), or they're not. If they are, Jaynes would just use them; Jaynes never claims you should make some kind of additional assumption that's not required to compute what you want, just in order to satisfy some preconceived notion of what your process should be. If they're not, then you've left something out.

A. Neumaier said:

Jaynes's model is a probability distribution on states, initially the prior.

It doesn't have to be. It can just as easily be a probability distribution on model parameters. See above.

A. Neumaier said:

My model is the Hilbert space. How can it be considered to be a prior??

Because you've just assumed that Hilbert space is the right model. That's a subjective assumption on your part. Unless you want to justify it based on some kind of argument, in which case the initial assumptions of that argument will be your prior. Sooner or later what you're doing has to bottom out in some subjective choice of initial assumptions.

dextercioby · Nov 3, 2021

What if this subjectivity you (Peter D) invoke is nothing but a logical consequence of trying 100 theoretical models (e.g. Hamilton functions for quartic oscillator) until you find the one which matches experiment? Then you would probably transfer this subjectivity to the human mind who devised the rules of mathematical logic. There is no science done without the human mind, there is no science if you do not attempt to validate a theoretical model, but it should be the goal of science to devise models which can be entrusted, even if there will never be humans or aliens able to test it. Is black hole evaporation by quantum effects science? Will there ever be a human indisputably probing in a man-made and man-financed laboratory the mathematical/physical theory of black-hole evaporation?

A. Neumaier · Nov 3, 2021

PeterDonis said:

It doesn't have to be. It can just as easily be a probability distribution on model parameters.

This is identical. The state (density matrix) is the collection of model parameters.

I never consider probability distributions over states or model parameters. They are overkill. Point estimations (or more complex deterministic estimation procedures) are simpler and generally used.

PeterDonis said:

Because you've just assumed that Hilbert space is the right model. That's a subjective assumption on your part.

In quantum mechanics, assuming a Hilbert space is a must. Otherwise one cannot even begin making predictions. This has nothing to do with Jaynes' priors.

PeterDonis said:

Sooner or later what you're doing has to bottom out in some subjective choice of initial assumptions.

But this is not what Jaynes' theory is about. It is about how to update subjective probability distributions for model parameters when new information arrives.

In contrast, deterministic statistical estimation is concerned with parameter (state) estimation given a fixed collection of data.

PeterDonis said:

Then how do you choose which one to use?

I discuss the limit of arbitrarily much data, in which case the choice does not matter; all asymptotically consistent methods produce the true value. This is the reason one can speak of objectivity. It is the same criterion that is applied in classical physics.

The point of my paper is to show that the amount of objectivity in quantum physics is no less than that in classical physics.

Your arguments just imply that classical physics is subjective, according to your standards, since any analysis must make assumptions. But this kind of subjectivity cannot be removed from science. It has nothing to do with the subjectiveness in Jaynes' theory, and is not what scientists mean when they talk of subjectivity of knowledge.

PeterDonis · Nov 3, 2021

A. Neumaier said:

this is not what Jaynes' theory is about.

I think you are mistaken. When I read your description of what you are doing, it looks the same to me as Jaynes's description of what to do in a similar situation. You are just using different terminology, and perhaps making different judgments about what amount of work is necessary (for example, your statement that point estimates of density matrix parameters are sufficient and probability distributions are overkill--though it's quite possible Jaynes would make the same judgment in a similar situation).

A. Neumaier said:

It is about how to update subjective probability distributions for model parameters when new information arrives.

I think your use of "subjective" here is gratuitous and misleading. Probability distributions are not subjective. The only subjectivity is in the initial choice of assumptions, and you state later on in your post (and I agree with you) that assumptions are unavoidable in any area of science. I don't see, as I have already said, that Jaynes would make any assumptions beyond those that you make, in the particular case you discuss. He would just describe the assumptions using different terms.

A. Neumaier said:

In contrast, deterministic statistical estimation is concerned with parameter (state) estimation given a fixed collection of data.

If this statement about "deterministic statistical estimation" is really true, it seems useless to me. What good is a model that can only explain a fixed collection of data and can't be updated when new data comes in?

A. Neumaier said:

Your arguments just imply that classical physics is subjective, according to your standards, since any analysis must make assumptions. But this kind of subjectivity cannot be removed from science and is not what scientists mean when they talk of subjectivity of knowledge.

Then what do scientists mean when they talk about subjectivity of knowledge, and why do you think Jaynes is guilty of it while you are not?

PeterDonis · Nov 3, 2021

A. Neumaier said:

In quantum mechanics, assuming a Hilbert space is a must. Otherwise one cannot even begin making predictions.

Why not?

PeterDonis · Nov 3, 2021

A. Neumaier said:

Jaynes' Bayesian methods (which would describe the uncertainty remaining in terms of a probability distribution on density matrices)

I think you are misunderstanding Jaynes's general method. His general method is not specifically Bayesian; Bayesian inference is a special case of his method (and frequentist inference is in turn, on his view, a special case of Bayesian inference when certain conditions are met). His general method is simply to discover what rules must be followed when making inferences in science if one wants to satisfy certain basic requirements that seem like they would make sense for any scientific inference.

What you are describing is simply what you have found to be the best method for making scientific inference in the special case you describe (inferring a specific density matrix given a Hilbert space and a set of experimental data).

dextercioby · Nov 3, 2021

PeterDonis said:

Why not?

You need a scalar product space (orthogonality of vectors) to account for the probabilistic interpretation and completion to ensure desirable properties for observables (such as convergence of sequences of experimental values).

A. Neumaier · Nov 3, 2021

PeterDonis said:

When I read your description of what you are doing, it looks the same to me as Jaynes's description of what to do in a similar situation.

PeterDonis said:

I think you are misunderstanding Jaynes's general method. His general method is not specifically Bayesian

Please quote from Jaynes and from my paper, so that we have a common ground for comparison. This is better than making fuzzy statements about equivalence of what you think Jaynes is saying.

PeterDonis said:

If your statement about "deterministic statistical estimation" is really true, it seems useless to me. What good is a model that can only explain a fixed collection of data and can't be updated when new data comes in?

The model can discriminate between data that match the model (in which case you get a sensible estimate with which you can make predictions) and data that don't match it (in which case the model assumption is falsified).

When new data comes in one may pool it with the old data to get a bigger set with which to repeat the analysis. No Bayesian (or Jaynesian) machinery is needed for doing this. However, one can use Bayesian thinking to aggregate the old data into a Bayesian prior and then use the new data to calculate a new estimate from this prior and the new data. In important cases (for conjugate priors) this is mathematically equivalent to what frequentist statisticians do under the label of regularization. See, e.g., my paper

A. Neumaier, Solving ill-conditioned and singular linear systems: A tutorial on regularization, SIAM Review 40 (1998), 636-666.
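A toy illustration of the pooling argument, as a minimal Python sketch: it assumes a Beta prior for a binomial success probability, which is a conjugate pair, and uses made-up counts. It is not the construction from the cited tutorial.

```python
def beta_update(a, b, successes, failures):
    """Conjugate update: Beta(a, b) prior plus binomial data gives Beta(a + s, b + f)."""
    return a + successes, b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

prior = (1.0, 1.0)   # hypothetical flat prior
old_data = (30, 10)  # (successes, failures) in the old batch
new_data = (45, 15)  # (successes, failures) in the new batch

# Route 1: fold the old data into a prior, then update with the new data.
sequential = beta_update(*beta_update(*prior, *old_data), *new_data)

# Route 2: pool old and new data and analyse everything at once.
pooled = beta_update(prior[0], prior[1],
                     old_data[0] + new_data[0],
                     old_data[1] + new_data[1])

print(sequential, pooled)  # identical Beta parameters
print(beta_mean(*pooled))  # (75 + 1) / (100 + 2)
print(75 / 100)            # pooled relative frequency, for comparison
```

The posterior mean (k + a) / (n + a + b) has the form of a relative frequency shrunk toward the prior mean, which gives a feel for why conjugate-prior updates can coincide with regularized frequentist estimates.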

PeterDonis said:

Then what do scientists mean when they talk about subjectivity of knowledge, and why do you think Jaynes is guilty of it while you are not?

Usually they regard engineering practice (i.e., classical mechanics) and engineering level statistics as objective.

The difference is that between objective (frequentist) and subjective (Bayesian) probability. https://en.wikipedia.org/wiki/Bayesian_probability

A. Neumaier said:

In quantum mechanics, assuming a Hilbert space is a must. Otherwise one cannot even begin making predictions.

PeterDonis said:

Why not?

Well, I know how to do predictions with quantum mechanics in a Hilbert space. If you know how to do it without one, please cite a respectable source from which I can learn it.

PeterDonis · Nov 3, 2021

A. Neumaier said:

The difference is that between objective (frequentist) and subjective (Bayesian) probability.

I think Jaynes would have objected to the description of frequentist probability as "objective" and Bayesian as "subjective", since, as I have noted, he considered the former to be a special case of the latter. But that is probably getting too far off topic for this thread. I agree that the process of estimating density matrix parameters from data that you have described is objective (and I think Jaynes would as well).

A. Neumaier said:

Please quote from Jaynes

What I described as Jaynes's general method in post #44 is taken from his Probability Theory: The Logic Of Science, mainly Chapters 1 (towards the end of which he explains the "desiderata" he thinks any rules of reasoning should satisfy) and 2 (where he gives the quantitative rules that those desiderata imply).

The best brief expression of the generality Jaynes claims for the methods in that book is from the Preface (p. xxii, at the bottom):

Jaynes said:

However, neither the Bayesian nor the frequentist approach is universally applicable, so in the present, more general, work we take a broader view of things. Our theme is simply: probability theory as extended logic. The ‘new’ perception amounts to the recognition that the mathematical rules of probability theory are not merely rules for calculating frequencies of ‘random variables’; they are also the unique consistent rules for conducting inference (i.e. plausible reasoning) of any kind, and we shall apply them in full generality to that end.

PeterDonis · Nov 3, 2021

dextercioby said:

You need a scalar product space (orthogonality of vectors) to account for the probabilistic interpretation and completion to ensure desirable properties for observables (such as convergence of sequences of experimental values).

Yes, this is the sort of argument I was talking about. And in Jaynes's terminology, this means you are using Hilbert space as a prior because you have prior information about the kind of phenomena you are modeling, that tells you that you need to use Hilbert space.

This illustrates, btw, that the term "subjective" can be misleading even when referring to the choice of prior (although that term is often used, and I have used it myself), since the considerations that lead to a particular choice of prior can be perfectly objective.

A. Neumaier · Nov 4, 2021

PeterDonis said:

What I described as Jaynes's general method in post #44 is taken from his Probability Theory: The Logic Of Science, mainly Chapters 1 (towards the end of which he explains the "desiderata" he thinks any rules of reasoning should satisfy) and 2 (where he gives the quantitative rules that those desiderata imply).

OK; this explains our misunderstandings. When I referred to Jaynes I meant his paper

Jaynes, E. T. (1957). Information theory and statistical mechanics. Physical review, 106(4), 620.

where he introduced the notions of knowledge and subjective probability to physics. From his abstract:

Edwin Jaynes said:

Information theory provides a constructive criterion for setting up probability distributions on the basis of partial knowledge. [...] In the resulting "subjective statistical mechanics," the usual rules [...] represent the best estimates that could have been made on the basis of the information available.

Thus the assignment subjective to the Bayesian view is Jaynes', not mine!

PeterDonis said:

And in Jaynes's terminology, this means you are using Hilbert space as a prior

This is not how the word prior was used in Jaynes' paper just mentioned, where the usage agrees with the standard usage today in a probabilistic context. Today's usage is given by https://en.wikipedia.org/wiki/Prior_probability
Thus I don't care about the terminology in Jaynes' book. The point of my paper is not a general philosophy of reasoning as in Jaynes' general considerations.
The point of my paper is a proper conceptual foundation of quantum physics with the same characteristic features as classical physics - except that the density operator takes the place of the phase space coordinates.

In my paper I said:

When a source is stationary, response rates and probabilities can be measured in principle with arbitrary accuracy, in a reproducible way. Thus they are operationally quantifiable, independent of an observer. This makes them objective properties, in the same sense as in classical mechanics, positions and momenta are objective properties. [...]
Everything can be determined and checked completely independent of any subjective knowledge. Nothing subjective remains: Assuming that a quantum system is in a state different from the true state simply leads to wrong predictions that can be falsified by sufficiently long sequences of measurements. Nothing depends on the knowledge of an observer. The latter can be close to the objective truth or far away – depending on how well informed the observer is.
The assignment of states to stationary sources is as objective as any assignment of properties to macroscopic objects. Thus the knowledge people talk about when referring to the meaning of a quantum state resides in what is encoded in (and hence ”known to”) the model used to describe a quantum system – not to any subjective mind content of a knower!
In particular, as quantum values of members of a quantum measure, all probabilities are objective frequentist probabilities in the sense employed everywhere in experimental physics – classical and quantum. That the probabilities are only approximately given by the relative frequencies simply says that – like all measurements – probability measurements are of limited accuracy only.

PeterDonis said:

This illustrates, btw, that the term "subjective" can be misleading even when referring to the choice of prior (although that term is often used, and I have used it myself), since the considerations that lead to a particular choice of prior can be perfectly objective.

As you used it (and the term 'prior'), it is very misleading!

In mainstream physics, one considers the theoretical framework as given, irrespective of what, in his book, Jaynes calls a prior. This includes the model assumptions - typically the phase space in classical physics, the Hilbert space in quantum physics, the causal rules (Galilean in nonrelativistic physics, Minkowski in special relativity, local Minkowski in general relativity), and the parameterized Hamiltonian in conservative mechanics, the equation of motion in dissipative mechanics.

This plays the same role as axioms in mathematics in theorems - it is just a choice of subject matter. There is nothing subjective about this since all choices are made explicit.

A. Neumaier · Nov 4, 2021

PeterDonis said:

What I described as Jaynes's general method in post #44 is taken from his Probability Theory: The Logic Of Science, mainly Chapters 1 (towards the end of which he explains the "desiderata" he thinks any rules of reasoning should satisfy)

My rules of reasoning are those of classical logic, universally applied in mathematics and physics, including probability theory and quantum physics.

Where does Jaynes define the prior in the general sense you claimed? Please give page numbers. (I have the 2003 edition.)

How does my assumption that the model is given by a Hilbert space and the parameters by a density matrix (which you call a prior) fit Jaynes' desiderata on p.17?

He assumes degrees of plausibility, but these do not occur in my model assumptions, unless you take the degree to be 100%.

In the main text, the term 'prior information' appears informally on p.6, and semiformally on p.26, where he discusses change of prior information. But my model assumptions never change, hence these rules do not apply. The formal introduction of priors comes only in Chapter 4 (p.119), and then means prior probability distribution in the subjective Bayesian sense as a state of mind of the robot, not in the objective sense of a property of Nature.

PeterDonis · Nov 4, 2021

A. Neumaier said:

When I referred to Jaynes I meant his paper

Ah, ok. This paper is much earlier than the book I referred to, so it's quite possible that Jaynes's own views changed in between.

In general, as I've said, I agree that the process you're describing is objective, so I don't think there is an issue there for this discussion.

A. Neumaier said:

Where does Jaynes define the prior in the general sense you claimed? Please give page numbers.

From pp. 87-88 in my edition:

Jaynes said:

##X## denotes simply whatever additional information the robot has beyond what we have chosen to call ‘the data’

In other words, Jaynes is using "prior" to denote all relevant information other than the "data", which in your example is the data collected by tomography. So Jaynes would include things like the background physical theory you are using in the prior. It is certainly nothing so limited as just an assumed initial probability distribution over model parameters; it also includes all the reasons why you are using a Hilbert space/density matrix model in the first place. The latter information still plays a role in the calculation since it determines the general formulas that are used.

A. Neumaier said:

How does my assumption that the model is given by a Hilbert space and the parameters by a density matrix (which you call a prior) fit Jaynes' desiderata on p.17?

(IIIb) on p. 19: "The robot always takes into account all of the evidence it has relevant to a question." The fact that the model is given by a Hilbert space and the parameters by a density matrix is a consequence of evidence--all the evidence that establishes that those things are the best way to model quantum systems. So using a Hilbert space model with density matrix parameters is necessary in order to take into account all that evidence.

PeterDonis · Nov 4, 2021

A. Neumaier said:

The formal introduction of priors comes only in Chapter 4 (p.119), and then means prior probability distribution in the subjective Bayesian sense as a state of mind of the robot, not in the objective sense of a property of Nature.

In the example you have been describing, you are the robot. The Hilbert space and density matrix parameters are not "properties of Nature". They are states of your mind, and of the minds of all the other scientists that are using your model. Your estimates of the density matrix parameters are the robot's posteriors. If you are thinking of them as "properties of Nature", Jaynes would say you are committing the mind projection fallacy. Your model is not the same as the thing being modeled.

Fra · Nov 4, 2021

PeterDonis said:

Ah, ok. This paper is much earlier than the book I referred to, so it's quite possible that Jaynes's own views changed in between.

PeterDonis said:

So Jaynes would include things like the background physical theory you are using in the prior. It is certainly nothing so limited as just an assumed initial probability distribution over model parameters; it also includes all the reasons why you are using a Hilbert space/density matrix model in the first place. The latter information still plays a role in the calculation since it determines the general formulas that are used.

Jaynes also writes on those same pages (p. 87) of his book:

"But we caution that the term prior is another of those terms from the distant past that can be inappropriate and misleading today"

If we replace the word robot by agent, Jaynes' distinction makes good sense, and I use a similar distinction in thinking about "agents". The distinction is what I think of as the difference between the agent's microstate and its microstructure. The state is defined RELATIVE to the structure, i.e. state vs. state space. In the big inference picture BOTH the state and the SPACE of states are bound to be updated, but at different time scales. One can also consider, in the context of general inference and learning, that the STRUCTURE is itself merely a "state" in some bigger space. Except that it does not work to parameterize the infinity of future possibilities; it leads immediately to fine-tuning problems. This argument is also made by Lee Smolin in his talks and writings on the evolution of law. IMO, the same argument is relevant in general learning. This is what distinguishes "optimal data fitting" from more intelligent learning. From the perspective of the agent, the evolution of the structure has similarities to various dualities where one can transform the dependent variables and get different dynamics. In such a picture it seems reasonable to expect the Hilbert structure to be explained as well, just like the superficial Bayesian update of probability given a FIXED probability space.

I agree it's clear that Jaynes includes this general background structure as well in the generalized notion of prior information. One could perhaps discuss here whether that is "information" vs. knowledge, or how one should label it, but in the big learning perspective above the difference should be clear, no matter how we label it.

/Fredrik

A. Neumaier · Nov 4, 2021

PeterDonis said:

From pp. 87-88 in my edition:

Jaynes said:

X denotes simply whatever additional information the robot has beyond what we have chosen to call ‘the data’

In other words, Jaynes is using "prior" to denote all relevant information other than the "data", which in your example is the data collected by tomography.

No. You are conflating the notions 'prior information' and 'prior' that Jaynes keeps carefully separate:
On p.88, Jaynes distinguishes several distinct items:

Jaynes said:

Those who are actively familiar with the use of prior probabilities in current real problems usually abbreviate further, and instead of saying ‘the prior probability’ or ‘the prior probability distribution’, they say simply, ‘the prior’. [...] Let us now use the notation
X = prior information,
H = some hypothesis to be tested,
D = the data,

On p.89, he writes:

Jaynes said:

we need not only the sampling probability P(D|HX) but also the prior probabilities for D and H:
$$P(H|DX) = P(H|X)\,\frac{P(D|HX)}{P(D|X)}. \tag{4.3}$$
[...] The left-hand side of (4.3), P(H|DX), is generally called a ‘posterior probability’

On pp.108-109, he discusses the dependence on parameters:

Jaynes said:

In the problem we are discussing, f is simply an unknown constant parameter. [...] There is a prior pdf [...] Then the posterior pdf for f is given by [...]

Thus:

X, called the prior information, is assumed to be fixed, and contains the model assumptions which specify the model and how the parameters enter the model.
H, called the hypothesis, is a question (Boolean function H(f) of the parameters f) to be answered by the analysis.
D, called the data, is experimental information.
P(H|X), called the prior, is the prior probability of H relative to X. Its dependence on the parameters f (discussed later on p.108) is the prior probability distribution for f.
P(H|DX) is the posterior probability of H relative to X, assuming the data D. Its dependence on the parameters f (discussed later on p.108) is the posterior probability distribution for f.

Thus the model assumptions constitute the prior information, and are quite distinct from both the prior (for a parameter-independent hypothesis) and the prior probability distribution, which encodes a subjective assessment of the likelihood of particular values of the parameters. The prior information never figures in the Bayesian probability calculus since it never changes; it only figures in the notation. In practice it is suppressed, simplifying the typography of the formulas. This is already how Jaynes treated the matter in his famous paper.
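
To make the roles of X, H, D in (4.3) concrete, here is a minimal sketch for a finite set of mutually exclusive hypotheses about the parameter f. The three candidate values of f, the uniform prior, and the data (2 "successes" in 3 trials) are hypothetical choices made only for illustration; they are not taken from Jaynes. The prior information X does not appear as a variable at all: it is the fixed binomial model hard-coded in `sampling_probability`.

```python
from fractions import Fraction
from math import comb


def sampling_probability(f, successes, trials):
    """P(D|H X): binomial model (this fixed model is the 'prior information' X)."""
    return comb(trials, successes) * f**successes * (1 - f) ** (trials - successes)


def posterior(prior, likelihood):
    """Apply (4.3): P(H|DX) = P(H|X) * P(D|HX) / P(D|X) for each hypothesis H."""
    # P(D|X) is obtained by summing over the mutually exclusive hypotheses.
    evidence = sum(prior[h] * likelihood[h] for h in prior)
    return {h: prior[h] * likelihood[h] / evidence for h in prior}


# Hypothetical hypotheses H: "f equals this value", with a uniform prior P(H|X).
candidates = [Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)]
prior = {f: Fraction(1, 3) for f in candidates}

# Hypothetical data D: 2 successes in 3 trials.
likelihood = {f: sampling_probability(f, 2, 3) for f in candidates}

post = posterior(prior, likelihood)
for f in candidates:
    print(f"f = {f}: prior {prior[f]}, posterior {post[f]}")
```

With these numbers the posterior for f = 1/2 comes out as 25/43; the model (X) is used in every line but never updated.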

PeterDonis said:

(IIIb) on p. 19: "The robot always takes into account all of the evidence it has relevant to a question." The fact that the model is given by a Hilbert space and the parameters by a density matrix is a consequence of evidence--all the evidence that establishes that those things are the best way to model quantum systems. So using a Hilbert space model with density matrix parameters is necessary in order to take into account all that evidence.

The robot takes account of the Hilbert space as part of its unchangeable prior information X, not as part of its subjective prior probabilities. The unchangeable part is objective if specified explicitly, since everyone competent will arrive from such a specified X at the same results (in a deterministic calculation from the data) while the Bayesian probabilistic assessment is subjective and remains subjective during all computations. (Apart from being overkill in most applications.)

PeterDonis · Nov 4, 2021

A. Neumaier said:

You are conflating the notions 'prior information' and 'prior' that Jaynes keeps carefully separate

Whenever I have used the term "prior" in this discussion, I have meant "prior information". I apologize for the imprecise use of terminology.

A. Neumaier said:

the Bayesian probabilistic assessment is subjective

Perhaps we are having trouble because of an ambiguity in the word "subjective". If we are going to describe Bayesian probabilities as "subjective", the term can only mean "dependent on the specific information that the robot has". Different robots with different information can compute different Bayesian probabilities.

However, the word "subjective" in common usage has an additional connotation of arbitrariness which is not at all implied or intended in Jaynes's usage. As Jaynes describes it, the process of computing probabilities from a given set of data is perfectly objective; there is no arbitrariness about it at all. There is only one right way to do it. So there is no subjectivity in the sense of arbitrariness in such computations.

The only difference I can see in your own treatment vs. that of Jaynes is that you have said that computing probability distributions is "overkill" and you only need point estimates. And I have already commented that, in a particular case, Jaynes might well agree with such a judgment, since it is a judgment about the benefits vs. the costs of doing additional computations. Ironically, such judgments are the only things we have discussed in this entire thread that are "subjective" in the sense of common usage--that they are personal choices that have an element of arbitrariness to them.

PeterDonis · Nov 4, 2021

A. Neumaier said:

The unchangeable part is objective if specified explicitly, since everyone competent will arrive from such a specified X at the same results (in a deterministic calculation from the data) while the Bayesian probabilistic assessment is subjective and remains subjective during all computations.

In the particular case you describe, since you have declared by fiat that all "robots" involved (all of the scientists assessing some particular instance of quantum tomography) have all of the same prior information and all of the same data, their Bayesian probabilities will obviously all be the same, since you have removed all possible reasons for them to vary.

Remember that my original post in this subthread, post #33, was to object to a claim (made by @gentzen, not you) that your prescription is "opposite" to what Jaynes would say. My point was simply that, in this particular case, Jaynes would say exactly what you are saying. Even the "subjective" element in probabilities--that different "robots" might have different information--is removed in your example. So what you are describing is in fact perfectly consistent with the general method Jaynes describes. It's just a sort of degenerate case of it, since all of the uncertainty involved has been removed--you know exactly what the right model is and exactly what the data is. So everything relevant is exactly known, and it should be no surprise that everyone agrees on it.

A. Neumaier · Nov 4, 2021

PeterDonis said:

If we are going to describe Bayesian probabilities as "subjective", the term can only mean "dependent on the specific information that the robot has".

No. It means that the robot assesses the same data in a robot-specific way, not deducible from objective rules. Whether this way is due to information or to the prior distribution or to goals or to hopes or fears or to whims is secondary.

PeterDonis said:

As Jaynes describes it, the process of computing probabilities from a given set of data is perfectly objective; there is no arbitrariness about it at all. There is only one right way to do it. So there is no subjectivity in the sense of arbitrariness in such computations.

No. The arbitrariness is in the prior, not in the subsequent computations. Moreover, he assumes an ideal robot that functions on the basis of his rational rules; but a real robot cannot do this since the computations would be far too complex.

PeterDonis said:

In the particular case you describe, since you have declared by fiat that all "robots" involved (all of the scientists assessing some particular instance of quantum tomography) have all of the same prior information and all of the same data, their Bayesian probabilities will obviously all be the same, since you have removed all possible reasons for them to vary.

No. They have the same prior information about the physics, but differ in the prior probability assessment (which is the subjective part) and in the degree to which they are faithful to Jaynes' rational rules for manipulating the prior to get the posterior. Indeed scientists are not robots in Jaynes' sense but have goals and preferences that do not depend on the data but affect the way they draw conclusions.
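
The point at issue can be illustrated with a small sketch: the update rule is deterministic, yet the same model and the same data give different posteriors when the prior distributions differ. The Beta priors and the counts below are hypothetical and chosen only for illustration; nothing here is specific to quantum tomography.

```python
from fractions import Fraction


def beta_update(a, b, successes, failures):
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta posterior."""
    return a + successes, b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return Fraction(a, a + b)


def posterior_mean_gap(successes, failures):
    """Difference in posterior means between two hypothetical analysts."""
    uniform = beta_update(1, 1, successes, failures)   # analyst 1: flat prior
    skeptic = beta_update(1, 9, successes, failures)   # analyst 2: expects few successes
    return beta_mean(*uniform) - beta_mean(*skeptic)


# Same model, same data, different priors.
small_gap = posterior_mean_gap(7, 3)
large_gap = posterior_mean_gap(700, 300)
print(float(small_gap), float(large_gap))
```

In this toy setting the two posterior means differ (2/3 versus 2/5 after 10 trials), and the difference shrinks as the data grow; whether the choice of prior should be called "subjective" is exactly what is being argued about here.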

PeterDonis · Nov 4, 2021

A. Neumaier said:

No. It means that the robot assesses the same data in a robot-specific way, not deducible from objective rules

I'm sorry, but I simply don't see Jaynes saying this anywhere. His whole book is about figuring out objective rules for the robot to follow for a given problem. He never talks about different robots using different rules for the same problem; he clearly believes that for any given problem, there is one correct set of rules, and that's the set he's looking for.

A. Neumaier said:

The arbitrariness is in the prior

Jaynes spends considerable time discussing the correct ways to assign priors in various situations, so I'm not sure I agree that it is arbitrary. Of course in many real situations the information is far less amenable to being captured in a precise mathematical formulation than it is in the carefully circumscribed physics problem you describe.

A. Neumaier said:

They have the same prior information about the physics, but differ in the prior probability assessment (which is the subjective part)

I don't see how two scientists that are both using the exact same Hilbert space for a given quantum tomography experiment could differ in their computation of ##P(H|X)## for any ##H##.

A. Neumaier said:

in the degree to which they are faithful to Jaynes' rational rules

Of course no real human agent is ever exactly faithful to any set of rules. But you appear to be ruling that out when you talk about the estimates of density matrix parameters from the data being objective in the sense of all scientists involved agreeing on them. That agreement will only happen if they all follow the same rules in doing their computations.

A. Neumaier said:

Indeed scientists are not robots in Jaynes' sense but have goals and preferences that depend not on the data but affect the way they draw conclusions.

If such goals and preferences really do affect the way conclusions are drawn, Jaynes would say (and I would agree) that they should be captured somewhere in the process of doing the computations. If that cannot be done, I would say that the domain under discussion is not (or not yet) a science, because it is not well understood enough. If a physicist were to tell you he doesn't agree with your density matrix parameter estimates from quantum tomography data, you would expect him to give some cogent physics reason like he thinks you're using the wrong Hilbert space for the system. You wouldn't expect him to say it's because he's of a different political party than you, or some other irrelevant factor. But in many domains, things like political beliefs and ideologies certainly do affect the conclusions people come to from a given set of data. We recognize that by not calling those domains sciences.

gentzen · Nov 4, 2021

PeterDonis said:

In fact, what it is describing is the same kind of thing as what Jaynes describes: the "robot" Jaynes describes builds a model of some system, and uses the model to compute probabilities. Those computations are perfectly objective: they are mathematical operations starting from precisely defined initial propositions, and the same operations applied to the same propositions will give the same answers every time.

The only "subjectivity" involved in Jaynes is that different robots in different states of knowledge--meaning, with different sets of data available to them--will have different models, and will therefore make different computations of probabilities because they are starting from different initial propositions. But that is equally true of experimenters doing quantum tomography: their model is built from the information they have obtained from their experiments, and two experimenters who have run different sets of experiments will have different models, and will therefore compute different probabilities. That is every bit as "subjective" as what Jaynes describes. But of course it's not "subjective" at all in the sense of people just arbitrarily choosing probabilities instead of computing them using specified operations from specified initial propositions--and neither is Jaynes.

Sorry for not answering earlier. Writing about Jaynes is tricky for me, because it triggers so many different thoughts. I remembered that I had an email conversation with Kevin van Horn about him, after I commented on https://bayesium.com/probability-theory-does-not-extend-logic/. Here is an extract of the relevant parts:

Sorry for the extremely long delay before answering. ... Jaynes' book definitely had some influence on me, even though I mostly disagreed with what he wrote. I am neither Bayesian nor frequentist; instead of an interpretation, I do believe that game theory and probability theory are closely related (https://blog.computationalcomplexit...showComment=1505472807405#c870512924971687938). ...

You ask why I conclude from the interpretation of classical logic as the logic of subsets of a given set that the restriction to a *single* number is basically a bad idea.

My reasoning is simply that even classical logic is not exclusively concerned with a *single* number from {0,1}, but includes the case where we have multiple such numbers. For example, I sometimes use 4 numbers for a proposition: ("actual fact", "judge/state/government version of fact", "opinion of people around me on fact", "my own opinion on fact"). The number for "actual fact" is not always the most relevant, even though it might be the only one of those 4 numbers some people would consider relevant for a logic of plausible reasoning. If "my own opinion on fact" were the average of the other three numbers, then it would not obey the product rule of probability theory, even if the other three numbers individually obeyed the rules of probability theory. (I might try to fix this by using different weights for different contexts. Those weights would then be the relevance of the different versions of "fact" for my own opinion.)
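
A small numerical sketch of the averaging claim, under one explicit assumption: the "own opinion" is formed by averaging each number separately, including the conditional plausibility of B given A. The three sources and their values are hypothetical. (If the conditional were instead defined as the ratio of the averaged joint and marginal values, the average would be a mixture of probability measures and the product rule would hold by construction.)

```python
from fractions import Fraction

# Hypothetical sources, each internally consistent: P(AB) = P(A) * P(B|A).
sources = [
    {"A": Fraction(1, 2), "B_given_A": Fraction(1, 2)},
    {"A": Fraction(1, 1), "B_given_A": Fraction(1, 1)},
    {"A": Fraction(1, 4), "B_given_A": Fraction(1, 4)},
]
for s in sources:
    s["AB"] = s["A"] * s["B_given_A"]


def average(key):
    """Unweighted average of one number across the three sources."""
    return sum(s[key] for s in sources) / len(sources)


# "Own opinion": every number averaged separately (the assumption stated above).
own_A = average("A")
own_B_given_A = average("B_given_A")
own_AB = average("AB")

product_rule_holds = own_AB == own_A * own_B_given_A
print(own_AB, own_A * own_B_given_A, product_rule_holds)
```

Here the averaged joint value is 7/16 while the product of the averaged factors is 49/144, so the averaged opinion violates the product rule even though each source satisfies it.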

Your recent paper avoids this issue, because it does not assign probabilities to individual propositions, but focuses on the derivability relation X |= A instead. This is good, because that one is really just satisfied or not satisfied, even in predicate logic and non-classical logic. Some non-classical logic might work with sequents (X, Y, ... |= A, B, ...) instead, but even such a sequent is just satisfied or not satisfied.

... For your theorem, you have to explicitly write down all your background knowledge as a propositional formula, and then get the probability for a given proposition (given your background knowledge) as a result. But for the way Cox's theorem is typically used, you can somehow magically encode your background knowledge into a prior (which is a sort of not necessarily normalisable probability distribution), add some observed facts, and then get the probability for a given proposition (given your prior and your observations) as a result.

Of course, this is a caricature version of the Bayesian interpretation, but people do use it that way. And they use it with the intention to convince other people. So what strikes me as misguided is not when people like Scott Aaronson use Bayesian arguments in addition to more conventional arguments to convince other people, but when they replace perfectly fine arguments by a supposedly superior Bayesian argument and exclaim: "This post supersedes my 2006 post on the same topic, which I hereby retire." For me, this is related to the philosophy of Cox's theorem that a single number is preferable over multiple independent numbers (https://philosophy.stackexchange.co...an-reasoning-related-to-the-scientific-method). On the other hand, when Jaynes explains how to obtain (improper) priors for certain situations (https://bayes.wustl.edu/etj/articles/prior.pdf), I do get deeply impressed and include it in my "day to day" reasoning strategies.

The passage from that paper that most influenced my "day to day" reasoning was:

For example, in a chemical laboratory we find a jar containing an unknown and unlabeled compound. We are at first completely ignorant as to whether a small sample of this compound will dissolve in water or not. But having observed that one small sample does dissolve, we infer immediately that all samples of this compound are water soluble, and although this conclusion does not carry quite the force of deductive proof, we feel strongly that the inference was justified. Yet the Bayes-Laplace rule leads to a negligibly small probability of this being true, and yields only a probability of 2/3 that the next sample tested will dissolve.
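
The two numbers in this passage can be checked with a short sketch of the Bayes-Laplace rule, i.e. a uniform prior on the unknown fraction of soluble samples. The function names are illustrative; the formulas are the standard rule of succession, (s + 1)/(n + 2), applied sequentially.

```python
from fractions import Fraction


def rule_of_succession(successes, trials):
    """P(next trial succeeds) under a uniform prior: (s + 1) / (n + 2)."""
    return Fraction(successes + 1, trials + 2)


def prob_next_m_all_succeed(successes, trials, m):
    """P(the next m trials all succeed), by chaining the rule of succession."""
    p = Fraction(1)
    for i in range(m):
        # After i further successes the counts are (successes + i, trials + i).
        p *= rule_of_succession(successes + i, trials + i)
    return p


# One sample tested, and it dissolved.
print(rule_of_succession(1, 1))               # next sample dissolves
print(prob_next_m_all_succeed(1, 1, 1000))    # next 1000 samples all dissolve
```

After one observed success this gives 2/3 for the next sample, and 2/(m + 2) for the next m samples all dissolving, which tends to zero as m grows: the "negligibly small probability" that all samples are soluble.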

This theme that there can be situations where a single measurement is already very convincing also reappeared in A. Neumaier's thermal interpretation.

I read Jaynes' book back in 2000, but didn't come very far. I guess I stopped in the 3rd chapter. Somehow I got the impression that I wouldn't get those Bayesian insights from it that I had hoped for. The best place to get those insights in a compressed form I have found so far was: https://windowsontheory.org/2021/04/02/inference-and-statistical-physics/. I did read some of Jaynes' papers, and those were a totally different experience for me: always very succinct and rewarding.

PeterDonis · Nov 4, 2021

gentzen said:

I read Jaynes book back in 2000, but didn't come very far.

An unfortunate thing about the book is that it was not finished when Jaynes died. I suspect that if he had lived long enough to finish it, it would be tighter and more like his papers than it is.

PeterDonis · Nov 4, 2021

gentzen said:

that paper

I note, btw, that the paper you reference (the one titled "Prior Probabilities") has as its explicit purpose to remove "arbitrariness" in assigning prior probabilities.
