# Bayesian statistics in science

Sunil
[Moderator's note: This thread has been split off from a previous thread since its topic is best addressed in a separate discussion. This post has been edited to focus on the topic for separate discussion.]

In deriving the rules of probability as the logic of plausible reasoning in his "Probability Theory: The Logic of Science", Jaynes used the following trick: instead of defining the rules for our own thinking, he introduced a robot, some AI, and gave us the job of defining the rules of its thinking. The trick is that if we think about rules for a robot, we will care much more about the consistency of these rules. And the basic assumptions there are consistency rules: If there are several ways to derive something, the result should be the same. For our own reasoning, consistency is (intuitively) secondary.

The same type of reasoning we should apply here too. What should be the rules of physical reasoning for a robot designed to help physicists?


Fra
If there are several ways to derive something, the result should be the same. For our own reasoning, consistency is (intuitively) secondary.

The same type of reasoning we should apply here too. What should be the rules of physical reasoning for a robot designed to help physicists?
Or an agent? Then ask what is the problem and consequence of agents making incompatible inferences? Then we are soon friends 😉

/Fredrik

Gold Member
Jaynes has used ... the following trick: ... The trick is that if we think about rules for a robot, we will care much more about the consistency of these rules. And the basic assumptions there are consistency rules: If there are several ways to derive something, the result should be the same. For our own reasoning, consistency is (intuitively) secondary.

The same type of reasoning we should apply here too. What should be the rules of physical reasoning for a robot designed to help physicists?
Are you aware of the content of the paper "Quantum mechanics via quantum tomography" that this thread is about? For example, it says in section "5.5 Objectivity":
The assignment of states to stationary sources is as objective as any assignment of properties to macroscopic objects. Thus the knowledge people talk about when referring to the meaning of a quantum state resides in what is encoded in (and hence ”known to”) the model used to describe a quantum system – not to any subjective mind content of a knower!

In particular, as quantum values of members of a quantum measure, all probabilities are objective frequentist probabilities in the sense employed everywhere in experimental physics – classical and quantum. That the probabilities are only approximately given by the relative frequencies simply says that – like all measurements – probability measurements are of limited accuracy only.
No robot, no agent, no "subjective mind content of a knower". The meaning of a quantum state resides in what is encoded in (and hence ”known to”) the model! This is almost exactly the opposite of what Jaynes would tell you.

Mentor
The meaning of a quantum state resides in what is encoded in (and hence ”known to”) the model!
But isn't the model mind content?

This is almost exactly the opposite of what Jaynes would tell you.
I think the passage you quote, or at least the interpretation you are giving it, is trading on an ambiguity in the word "subjective".

In fact, what it is describing is the same kind of thing as what Jaynes describes: the "robot" Jaynes describes builds a model of some system, and uses the model to compute probabilities. Those computations are perfectly objective: they are mathematical operations starting from precisely defined initial propositions, and the same operations applied to the same propositions will give the same answers every time.

The only "subjectivity" involved in Jaynes is that different robots in different states of knowledge--meaning, with different sets of data available to them--will have different models, and will therefore make different computations of probabilities because they are starting from different initial propositions. But that is equally true of experimenters doing quantum tomography: their model is built from the information they have obtained from their experiments, and two experimenters who have run different sets of experiments will have different models, and will therefore compute different probabilities. That is every bit as "subjective" as what Jaynes describes. But of course it's not "subjective" at all in the sense of people just arbitrarily choosing probabilities instead of computing them using specified operations from specified initial propositions--and neither is Jaynes.

But isn't the model mind content?
If you call this 'mind content' then all physics and all language is mind content, and the phrase 'mind content' becomes meaningless since it comprises everything.
I think the passage you quote, or at least the interpretation you are giving it, is trading on an ambiguity in the word "subjective".
gentzen's interpretation is exactly what I intended to convey.
In fact, what it is describing is the same kind of thing as what Jaynes describes: the "robot" Jaynes describes builds a model of some system, and uses the model to compute probabilities.
This is not the standard meaning of 'model'. A model is a template in which the parameters are not fixed but to be determined by experiment. In the case of quantum tomography, the model is the Hilbert space chosen to model the quantum system - nothing else. The state is a matrix of parameters that are not determined by the model but by experiments, using the traditional objective, universally agreed statistical techniques. (Unless you think that the publications of the particle data group are not objective but subjective mind content. Then we do not need to discuss further.)

For example, a classical quartic oscillator is a model defined by a Hamiltonian $$H=p^2/2m+kq^2/2+gq^4/4;$$ its coefficients (mass ##m##, stiffness ##k##, and coupling constant ##g##) are the parameters. The claim that a particular oscillator is well described by this within a given accuracy can be decided objectively by making experiments on the oscillator and comparing the results with the predictions of the model.

If the model is correct to some accuracy, there will be a parameter setting that matches the prediction, and in the limit of arbitrarily many and arbitrarily accurate measurements, the parameters will be determined uniquely by the experiment. The only subjectivity is the choice of the ansatz for the Hamiltonian. This is the kind of subjectivity you have everywhere in physics. It has nothing to do with probabilities.
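To make the "parameters are determined by experiment" step concrete, here is a minimal sketch of my own, not anything taken from the paper. It assumes the quartic potential ##V(q)=kq^2/2+gq^4/4##, so that the restoring force is ##F(q)=-kq-gq^3##, and it assumes we can measure that force at known positions with Gaussian noise. The function names, the noise level, and the "true" values are all hypothetical.

```python
import random

def simulate_forces(k, g, n, noise, rng):
    """Noisy samples of the restoring force F(q) = -k*q - g*q**3 (assumed setup)."""
    qs = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    fs = [-k * q - g * q**3 + rng.gauss(0.0, noise) for q in qs]
    return qs, fs

def fit_quartic_oscillator(qs, fs):
    """Least-squares estimate of (k, g) by solving the 2x2 normal equations."""
    # The model -F = k*q + g*q**3 is linear in the unknowns k and g.
    s2 = sum(q**2 for q in qs)
    s4 = sum(q**4 for q in qs)
    s6 = sum(q**6 for q in qs)
    b1 = sum(-f * q for q, f in zip(qs, fs))
    b3 = sum(-f * q**3 for q, f in zip(qs, fs))
    det = s2 * s6 - s4 * s4
    k_hat = (b1 * s6 - b3 * s4) / det
    g_hat = (s2 * b3 - s4 * b1) / det
    return k_hat, g_hat

if __name__ == "__main__":
    rng = random.Random(0)
    K_TRUE, G_TRUE = 1.5, 0.3  # hypothetical "true" parameters
    for n in (10, 1000, 100000):
        qs, fs = simulate_forces(K_TRUE, G_TRUE, n, noise=0.1, rng=rng)
        print(n, fit_quartic_oscillator(qs, fs))
```

With more samples the estimates settle near the values used in the simulation, whatever starting guess one might have had. No prior over ##(k, g)## enters anywhere, only the ansatz for the force law.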

Whereas what you declare to be Jaynes' model is the parameters. The correctness of such a model can be refuted by experiment unless the model is actually correct within the given accuracy. This can be established with 5 sigma confidence if enough data are collected. In physics, this counts as objective.
The only "subjectivity" involved in Jaynes is that different robots in different states of knowledge--meaning, with different sets of data available to them--will have different models, and will therefore make different computations of probabilities because they are starting from different initial propositions.
This makes Jaynes' approach subjective in a way quantum tomography is not.

In quantum tomography, the state can be determined objectively, independent of initial assumptions, by measuring long enough. There is no subjectivity in the parameters: if your parameters do not agree with the true parameters, you'll sooner or later get a statistically arbitrarily significant discrepancy with experiments. Again, the only subjectivity is the choice of the model - in this case the Hilbert space representing the quantum system. This is the kind of subjectivity you have everywhere in physics. It has nothing to do with probabilities.

Mentor
This is not the standard meaning of 'model'.
Sure it is. You're just using a different term for it than Jaynes normally uses. See below.

This makes Jaynes' approach subjective in a way quantum tomography is not.
No, it means that in your approach, you have fixed the prior (in Bayesian terms):

In the case of quantum tomography, the model is the Hilbert space chosen to model the quantum system - nothing else.
Jaynes would agree with you that, having fixed this prior, any given set of experimental data will objectively lead to a unique computation of probabilities. The only possible difference between different people in this case is that they have different posterior data, in which case they might compute different posterior probabilities. But that is not a difference in models (for your definition of "model"); it's a difference in data.

In other words, what you mean by "model" is what Jaynes means by "prior".

Whereas what you declare to be Jaynes' model is the parameters.
I might not have been clear in my previous post because of the difference in your terminology vs. Jaynes'. Hopefully the above helps to clarify. I don't see any fundamental difference in your approach vs. Jaynes' approach, given that you have fixed the prior.

A question Jaynes might ask is why you have chosen that particular prior; the choice of prior is where the subjectivity enters in Jaynes' view, but even on that view, one should still have some reasonable ground for choosing a particular prior. Given the role that Hilbert spaces are already understood to play in QM, that question should not be hard for you to answer. (Although you, as the author of the thermal interpretation, might also want to explain why you chose the Hilbert space instead of the set of expectation values.)

Jaynes would agree with you that, having fixed this prior, any given set of experimental data will objectively lead to a unique computation of probabilities.
No - neither he nor I would claim that. A given set of experimental data will never lead to a unique computation of probabilities. Different statistical techniques will give different results:
• A simple frequentist estimator would be the relative frequency, which is not the probability but a deterministic (uniquely determined) estimate for it.
• Jaynes would have to assume in addition to the data a prior for the probabilities (for example a Dirichlet prior) and then compute from data and prior combined a unique posterior for the probabilities. Because it depends on the prior the result is subjective.
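A small sketch of the two bullets above, under assumptions of my own: a two-outcome experiment with made-up counts, and two common Dirichlet priors (uniform and Jeffreys) picked purely for illustration. The function names are hypothetical.

```python
from fractions import Fraction

def relative_frequencies(counts):
    """Frequentist point estimate: observed counts divided by the total."""
    total = sum(counts)
    return [Fraction(c, total) for c in counts]

def dirichlet_posterior_mean(counts, alpha):
    """Posterior mean of the probabilities under a Dirichlet(alpha) prior."""
    total = sum(counts) + sum(alpha)
    return [(c + a) / Fraction(total) for c, a in zip(counts, alpha)]

counts = [7, 3]                       # made-up data: 7 "up", 3 "down"
uniform = [1, 1]                      # Dirichlet(1, 1) prior
jeffreys = [Fraction(1, 2)] * 2       # Dirichlet(1/2, 1/2) prior

print(relative_frequencies(counts))               # 7/10, 3/10
print(dirichlet_posterior_mean(counts, uniform))  # 2/3, 1/3
print(dirichlet_posterior_mean(counts, jeffreys)) # 15/22, 7/22
```

Same data, three different numbers: the relative frequency is fixed by the data alone, while the posterior mean moves with the prior that was assumed.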
But in quantum tomography the goal is not to obtain probabilities but to obtain the parameters of the model, in this case the density matrix. For this lots of different statistical procedures exist, all variants of the basic technique that I discuss in my paper. They produce different results - some more accurate than others, and increasing accuracy given the same data and a limited computational budget counts as scientific progress. The established scientific practice is that the computational technique used is specified together with the results, so that the procedure is objective, i.e., independent of undisclosed knowledge.

Jaynes would have to assume in addition to the data a prior specifying the probability for obtaining a given density matrix, and then update this prior in the light of the data. This is a very inefficient way to proceed, especially when the Hilbert space is not only of toy dimensions. For a 10 qubit system, the Hilbert space has dimension 1024, the density matrix depends on more than a million variables, and the posterior would be an extremely complicated probability distribution in more than a million dimensions. Huge overkill!
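The "more than a million" figure can be checked with a few lines. This uses the standard count for a general mixed state: a Hermitian ##d\times d## matrix has ##d^2## real parameters, and the trace-one condition removes one. The function name is just for illustration.

```python
def density_matrix_real_parameters(n_qubits):
    """Return (Hilbert space dimension, number of real parameters of a mixed state)."""
    dim = 2 ** n_qubits
    # Hermitian dim x dim matrix: dim**2 real parameters; unit trace removes one.
    return dim, dim * dim - 1

for n in (1, 2, 10):
    print(n, density_matrix_real_parameters(n))
# 1 (2, 3)
# 2 (4, 15)
# 10 (1024, 1048575)
```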
The only possible difference between different people in this case is that they have different posterior data, in which case they might compute different posterior probabilities. But that is not a difference in models (for your definition of "model"); it's a difference in data.

In other words, what you mean by "model" is what Jaynes means by "prior".
?
Jaynes's model is a probability distribution on states, initially the prior.

My model is the Hilbert space. How can it be considered to be a prior??
given that you have fixed the prior.
? I don't have a prior. I have a model (a Hilbert space) and a huge, arbitrarily extensible collection of data. The latter determines (prior-independent) the parameters that are unspecified in the model to an accuracy determined by the data.

A question Jaynes might ask is why you have chosen that particular prior;
Which prior did I choose? When and where?
explain why you chose the Hilbert space instead of the set of expectation values.
The Hilbert space may be, for example, the tensor product of two two-dimensional Hilbert spaces. This Hilbert space enables one to discuss beams of two entangled photons in Bell experiments. This is the model. It models all possible beams of two entangled photons.

To find out which of this continuum of possibilities is actually realized you need to know in addition to the model its state. This knowledge is obtained by quantum tomography. It is objectively determined to some accuracy by sufficiently extensive data. There are many ways to extract this objective knowledge from the data.
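For readers who have not seen tomography in action, here is a toy sketch. It is deliberately simpler than the two-photon case above: a single qubit, measured along the three Pauli axes, with the state reconstructed by plain linear inversion of ##\rho=(1+x\sigma_x+y\sigma_y+z\sigma_z)/2##. This is one textbook procedure that I am assuming for illustration, not the specific technique of the paper, and all names and numbers are hypothetical.

```python
import random

def simulate_pauli_counts(bloch, shots, rng):
    """For each Pauli axis, count +1 outcomes; P(+1) = (1 + r_i) / 2."""
    return [sum(rng.random() < (1 + r) / 2 for _ in range(shots)) for r in bloch]

def estimate_bloch_vector(plus_counts, shots):
    """Invert P(+1) = (1 + r_i) / 2 using the observed relative frequencies."""
    return [2 * c / shots - 1 for c in plus_counts]

def density_matrix(bloch):
    """rho = (I + x*sigma_x + y*sigma_y + z*sigma_z) / 2 as a nested list."""
    x, y, z = bloch
    return [[(1 + z) / 2, (x - 1j * y) / 2],
            [(x + 1j * y) / 2, (1 - z) / 2]]

if __name__ == "__main__":
    rng = random.Random(0)
    true_bloch = [0.3, -0.4, 0.5]  # hypothetical "true" state of the source
    for shots in (100, 10000):
        counts = simulate_pauli_counts(true_bloch, shots, rng)
        print(shots, estimate_bloch_vector(counts, shots))
```

The estimate depends only on the recorded counts and on the stated procedure. Note that linear inversion can return a slightly unphysical matrix (Bloch vector longer than 1) at low statistics, which is one reason more refined estimators exist.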

Jaynes' Bayesian methods (which would describe the uncertainty remaining in terms of a probability distribution on density matrices) are not among the most used techniques to do this.

Mentor
in quantum tomography the goal is not to obtain probabilities but to obtain the parameters of the model, in this case the density matrix
In other words, you're computing a posterior estimate for those. Then just substitute "posterior estimate of model parameters" for "posterior estimate of probabilities" in what I posted before. Jaynes explicitly discusses the case of estimating model parameters.

Where Jaynes might differ from you is that, instead of just computing a point estimate for each model parameter, he would compute a probability distribution.

For this lots of different statistical procedures exist, all variants of the basic technique that I discuss in my paper.
Then how do you choose which one to use?

The established scientific practice is that the computational technique used is specified together with the results, so that the procedure is objective, i.e., independent of undisclosed knowledge.
Exactly. And Jaynes would agree. But in choosing which computational technique to use, either you've made a subjective choice, or you've made use of some other objective process for making the choice--in which case Jaynes would just include that objective process in his overall analysis. Jaynes would not introduce any additional subjectivity that's not already there in what you're doing.

Jaynes would have to assume in addition to the data a prior specifying the probability for obtaining a given density matrix
Why? Why would Jaynes have to assume anything that you're not? Either your assumptions plus the data you obtain are sufficient to compute a posterior estimate for what you want (the model parameters), or they're not. If they are, Jaynes would just use them; Jaynes never claims you should make some kind of additional assumption that's not required to compute what you want, just in order to satisfy some preconceived notion of what your process should be. If they're not, then you've left something out.

Jaynes's model is a probability distribution on states, initially the prior.
It doesn't have to be. It can just as easily be a probability distribution on model parameters. See above.

My model is the Hilbert space. How can it be considered to be a prior??
Because you've just assumed that Hilbert space is the right model. That's a subjective assumption on your part. Unless you want to justify it based on some kind of argument, in which case the initial assumptions of that argument will be your prior. Sooner or later what you're doing has to bottom out in some subjective choice of initial assumptions.

Homework Helper
What if this subjectivity you (Peter D) invoke is nothing but a logical consequence of trying 100 theoretical models (e.g. Hamilton functions for quartic oscillator) until you find the one which matches experiment? Then you would probably transfer this subjectivity to the human mind who devised the rules of mathematical logic. There is no science done without the human mind, there is no science if you do not attempt to validate a theoretical model, but it should be the goal of science to devise models which can be entrusted, even if there will never be humans or aliens able to test it. Is black hole evaporation by quantum effects science? Will there ever be a human indisputably probing in a man-made and man-financed laboratory the mathematical/physical theory of black-hole evaporation?

It doesn't have to be. It can just as easily be a probability distribution on model parameters.
This is identical. The state (density matrix) is the collection of model parameters.

I never consider probability distributions over states or model parameters. They are overkill. Point estimates (or more complex deterministic estimation procedures) are simpler and generally used.
Because you've just assumed that Hilbert space is the right model. That's a subjective assumption on your part.
In quantum mechanics, assuming a Hilbert space is a must. Otherwise one cannot even begin making predictions. This has nothing to do with Jaynes' priors.
Sooner or later what you're doing has to bottom out in some subjective choice of initial assumptions.
But this is not what Jaynes' theory is about. It is about how to update subjective probability distributions for model parameters when new information arrives.

In contrast, deterministic statistical estimation is concerned with parameter (state) estimation given a fixed collection of data.

Then how do you choose which one to use?
I discuss the limit of arbitrarily much data, in which case the choice does not matter; all asymptotically consistent methods produce the true value. This is the reason one can speak of objectivity. It is the same criterion that is applied in classical physics.
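A standard textbook case shows how the choice of method washes out in this limit. This is an illustration I am adding, with a Dirichlet prior assumed purely as an example. For counts ##n_k## out of ##N## trials and prior parameters ##\alpha_k## with ##A=\sum_j\alpha_j##, the Bayesian posterior mean and the relative frequency are

$$\hat p_k^{\rm Bayes}=\frac{n_k+\alpha_k}{N+A},\qquad \hat p_k^{\rm freq}=\frac{n_k}{N},$$

and their difference is

$$\hat p_k^{\rm Bayes}-\hat p_k^{\rm freq}=\frac{\alpha_k-\hat p_k^{\rm freq}A}{N+A}=O(1/N).$$

So for fixed ##\alpha## the two estimates agree as ##N\to\infty##, and both converge to the same true value whenever the relative frequency does.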

The point of my paper is to show that the amount of objectivity in quantum physics is no less than that in classical physics.

Your arguments just imply that classical physics is subjective, according to your standards, since any analysis must make assumptions. But this kind of subjectivity cannot be removed from science. It has nothing to do with the subjectiveness in Jaynes' theory, and is not what scientists mean when they talk of subjectivity of knowledge.

Mentor
this is not what Jaynes' theory is about.
I think you are mistaken. When I read your description of what you are doing, it looks the same to me as Jaynes's description of what to do in a similar situation. You are just using different terminology, and perhaps making different judgments about what amount of work is necessary (for example, your statement that point estimates of density matrix parameters are sufficient and probability distributions are overkill--though it's quite possible Jaynes would make the same judgment in a similar situation).

It is about how to update subjective probability distributions for model parameters when new information arrives.
I think your use of "subjective" here is gratuitous and misleading. Probability distributions are not subjective. The only subjectivity is in the initial choice of assumptions, and you state later on in your post (and I agree with you) that assumptions are unavoidable in any area of science. I don't see, as I have already said, that Jaynes would make any assumptions beyond those that you make, in the particular case you discuss. He would just describe the assumptions using different terms.

In contrast, deterministic statistical estimation is concerned with parameter (state) estimation given a fixed collection of data.
If this statement about "deterministic statistical estimation" is really true, it seems useless to me. What good is a model that can only explain a fixed collection of data and can't be updated when new data comes in?

Your arguments just imply that classical physics is subjective, according to your standards, since any analysis must make assumptions. But this kind of subjectivity cannot be removed from science and is not what scientists mean when they talk of subjectivity of knowledge.
Then what do scientists mean when they talk about subjectivity of knowledge, and why do you think Jaynes is guilty of it while you are not?

Mentor
In quantum mechanics, assuming a Hilbert space is a must. Otherwise one cannot even begin making predictions.
Why not?

Mentor
Jaynes' Bayesian methods (which would describe the uncertainty remaining in terms of a probability distribution on density matrices)
I think you are misunderstanding Jaynes's general method. His general method is not specifically Bayesian; Bayesian inference is a special case of his method (and frequentist inference is in turn, on his view, a special case of Bayesian inference when certain conditions are met). His general method is simply to discover what rules must be followed when making inferences in science if one wants to satisfy certain basic requirements that seem like they would make sense for any scientific inference.

What you are describing is simply what you have found to be the best method for making scientific inference in the special case you describe (inferring a specific density matrix given a Hilbert space and a set of experimental data).

Homework Helper
Why not?

You need a scalar product space (orthogonality of vectors) to account for the probabilistic interpretation and completion to ensure desirable properties for observables (such as convergence of sequences of experimental values).

When I read your description of what you are doing, it looks the same to me as Jaynes's description of what to do in a similar situation.
I think you are misunderstanding Jaynes's general method. His general method is not specifically Bayesian
Please quote from Jaynes and from my paper, so that we have a common ground for comparison. This is better than making fuzzy statements about equivalence of what you think Jaynes is saying.
If your statement about "deterministic statistical estimation" is really true, it seems useless to me. What good is a model that can only explain a fixed collection of data and can't be updated when new data comes in?
The model can discriminate between data that match the model (in which case you get a sensible estimate with which you can make predictions) and data that don't match it (in which case the model assumption is falsified).

When new data comes in one may pool it with the old data to get a bigger set with which to repeat the analysis. No Bayesian (or Jaynesian) machinery is needed for doing this. However, one can use Bayesian thinking to aggregate the old data into a Bayesian prior and then use the new data to calculate a new estimate from this prior and the new data. In important cases (for conjugate priors) this is mathematically equivalent to what frequentist statisticians do under the label of regularization. See, e.g., my paper
• A. Neumaier, Solving ill-conditioned and singular linear systems: A tutorial on regularization, SIAM Review 40 (1998), 636-666.
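The pooling remark can be checked in the simplest conjugate setting. The sketch below assumes a Beta prior with Bernoulli data, a choice of mine for illustration and not an example from the cited paper. Folding the old data into the prior and then updating with the new data gives exactly the same result as analysing the pooled data in one go. Function names and counts are made up.

```python
from fractions import Fraction

def beta_update(prior, successes, failures):
    """Conjugate update of a Beta(a, b) prior with Bernoulli counts."""
    a, b = prior
    return (a + successes, b + failures)

def beta_mean(params):
    """Mean of a Beta(a, b) distribution."""
    a, b = params
    return Fraction(a, a + b)

prior = (1, 1)        # flat Beta(1, 1) starting point (assumed)
old_data = (12, 8)    # made-up counts: (successes, failures)
new_data = (30, 10)

# Route 1: aggregate the old data into a prior, then add the new data.
sequential = beta_update(beta_update(prior, *old_data), *new_data)

# Route 2: pool old and new data and analyse once.
pooled = beta_update(prior, old_data[0] + new_data[0], old_data[1] + new_data[1])

print(sequential, pooled, beta_mean(pooled))  # (43, 19) (43, 19) 43/62
```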
Then what do scientists mean when they talk about subjectivity of knowledge, and why do you think Jaynes is guilty of it while you are not?
Usually they regard engineering practice (i.e., classical mechanics) and engineering level statistics as objective.

The difference is that between objective (frequentist) and subjective (Bayesian) probability. https://en.wikipedia.org/wiki/Bayesian_probability
In quantum mechanics, assuming a Hilbert space is a must. Otherwise one cannot even begin making predictions.
Why not?
Well, I know how to do predictions with quantum mechanics in a Hilbert space. If you know how to do it without one, please cite a respectable source from which I can learn it.

Mentor
The difference is that between objective (frequentist) and subjective (Bayesian) probability.
I think Jaynes would have objected to the description of frequentist probability as "objective" and Bayesian as "subjective", since, as I have noted, he considered the former to be a special case of the latter. But that is probably getting too far off topic for this thread. I agree that the process of estimating density matrix parameters from data that you have described is objective (and I think Jaynes would as well).

What I described as Jaynes's general method in post #44 is taken from his Probability Theory: The Logic Of Science, mainly Chapters 1 (towards the end of which he explains the "desiderata" he thinks any rules of reasoning should satisfy) and 2 (where he gives the quantitative rules that those desiderata imply).

The best brief expression of the generality Jaynes claims for the methods in that book is from the Preface (p. xxii, at the bottom):

Jaynes said:
However, neither the Bayesian nor the frequentist approach is universally applicable, so in the present, more general, work we take a broader view of things. Our theme is simply: probability theory as extended logic. The ‘new’ perception amounts to the recognition that the mathematical rules of probability theory are not merely rules for calculating frequencies of ‘random variables’; they are also the unique consistent rules for conducting inference (i.e. plausible reasoning) of any kind, and we shall apply them in full generality to that end.

Mentor
You need a scalar product space (orthogonality of vectors) to account for the probabilistic interpretation and completion to ensure desirable properties for observables (such as convergence of sequences of experimental values).
Yes, this is the sort of argument I was talking about. And in Jaynes's terminology, this means you are using Hilbert space as a prior because you have prior information about the kind of phenomena you are modeling, that tells you that you need to use Hilbert space.

This illustrates, btw, that the term "subjective" can be misleading even when referring to the choice of prior (although that term is often used, and I have used it myself), since the considerations that lead to a particular choice of prior can be perfectly objective.

What I described as Jaynes's general method in post #44 is taken from his Probability Theory: The Logic Of Science, mainly Chapters 1 (towards the end of which he explains the "desiderata" he thinks any rules of reasoning should satisfy) and 2 (where he gives the quantitative rules that those desiderata imply).
OK; this explains our misunderstandings. When I referred to Jaynes I meant his paper
• Jaynes, E. T. (1957). Information theory and statistical mechanics. Physical review, 106(4), 620.
where he introduced the notions of knowledge and subjective probability to physics. From his abstract:
Edwin Jaynes said:
Information theory provides a constructive criterion for setting up probability distributions on the basis of partial knowledge. [...] In the resulting "subjective statistical mechanics," the usual rules [...] represent the best estimates that could have been made on the basis of the information available.
Thus the assignment subjective to the Bayesian view is Jaynes', not mine!
And in Jaynes's terminology, this means you are using Hilbert space as a prior
This is not how the word prior was used in Jaynes' paper just mentioned, where the usage agrees with the standard usage today in a probabilistic context. Today's usage is given by https://en.wikipedia.org/wiki/Prior_probability
Thus I don't care about the terminology in Jaynes' book. The point of my paper is not a general philosophy of reasoning as in Jaynes' general considerations.
The point of my paper is a proper conceptual foundation of quantum physics with the same characteristic features as classical physics - except that the density operator takes the place of the phase space coordinates.
In my paper I said:
When a source is stationary, response rates and probabilities can be measured in principle with arbitrary accuracy, in a reproducible way. Thus they are operationally quantifiable, independent of an observer. This makes them objective properties, in the same sense as in classical mechanics, positions and momenta are objective properties. [...]
Everything can be determined and checked completely independent of any subjective knowledge. Nothing subjective remains: Assuming that a quantum system is in a state different from the true state simply leads to wrong predictions that can be falsified by sufficiently long sequences of measurements. Nothing depends on the knowledge of an observer. The latter can be close to the objective truth or far away – depending on how well informed the observer is.
The assignment of states to stationary sources is as objective as any assignment of properties to macroscopic objects. Thus the knowledge people talk about when referring to the meaning of a quantum state resides in what is encoded in (and hence ”known to”) the model used to describe a quantum system – not to any subjective mind content of a knower!
In particular, as quantum values of members of a quantum measure, all probabilities are objective frequentist probabilities in the sense employed everywhere in experimental physics – classical and quantum. That the probabilities are only approximately given by the relative frequencies simply says that – like all measurements – probability measurements are of limited accuracy only.
This illustrates, btw, that the term "subjective" can be misleading even when referring to the choice of prior (although that term is often used, and I have used it myself), since the considerations that lead to a particular choice of prior can be perfectly objective.
As you used it (and the term 'prior'), it is very misleading!

In mainstream physics, one considers the theoretical framework as given, irrespective of what, in his book, Jaynes calls a prior. This includes the model assumptions - typically the phase space in classical physics, the Hilbert space in quantum physics, the causal rules (Galilean in nonrelativistic physics, Minkowski in special relativity, local Minkowski in general relativity), and the parameterized Hamiltonian in conservative mechanics, the equation of motion in dissipative mechanics.

This plays the same role as axioms in mathematics in theorems - it is just a choice of subject matter. There is nothing subjective about this since all choices are made explicit.

What I described as Jaynes's general method in post #44 is taken from his Probability Theory: The Logic Of Science, mainly Chapters 1 (towards the end of which he explains the "desiderata" he thinks any rules of reasoning should satisfy)
My rules of reasoning are those of classical logic, universally applied in mathematics and physics, including probability theory and quantum physics.

Where does Jaynes define the prior in the general sense you claimed? Please give page numbers. ( I have the 2003 edition.)

How does my assumption that the model is given by a Hilbert space and the parameters by a density matrix (which you call a prior) fit Jaynes' desiderata on p.17?

He assumes degrees of plausibility, but these do not occur in my model assumptions, unless you take the degree to be 100%.

In the main text, the term 'prior information' appears informally on p.6, and semiformally on p.26, where he discusses change of prior information. But my model assumptions never change, hence these rules do not apply. The formal introduction of priors comes only in Chapter 4 (p.119), and then means prior probability distribution in the subjective Bayesian sense as a state of mind of the robot, not in the objective sense of a property of Nature.

Mentor
When I referred to Jaynes I meant his paper
Ah, ok. This paper is much earlier than the book I referred to, so it's quite possible that Jaynes's own views changed in between.

In general, as I've said, I agree that the process you're describing is objective, so I don't think there is an issue there for this discussion.

Where does Jaynes define the prior in the general sense you claimed? Please give page numbers.
From pp. 87-88 in my edition:

Jaynes said:
##X## denotes simply whatever additional information the robot has beyond what we have chosen to call ‘the data’
In other words, Jaynes is using "prior" to denote all relevant information other than the "data", which in your example is the data collected by tomography. So Jaynes would include things like the background physical theory you are using in the prior. It is certainly nothing so limited as just an assumed initial probability distribution over model parameters; it also includes all the reasons why you are using a Hilbert space/density matrix model in the first place. The latter information still plays a role in the calculation since it determines the general formulas that are used.

How does my assumption that the model is given by a Hilbert space and the parameters by a density matrix (which you call a prior) fit Jaynes' desiderata on p.17?
(IIIb) on p. 19: "The robot always takes into account all of the evidence it has relevant to a question." The fact that the model is given by a Hilbert space and the parameters by a density matrix is a consequence of evidence--all the evidence that establishes that those things are the best way to model quantum systems. So using a Hilbert space model with density matrix parameters is necessary in order to take into account all that evidence.

Mentor
The formal introduction of priors comes only in Chapter 4 (p.119), and then means prior probability distribution in the subjective Bayesian sense as a state of mind of the robot, not in the objective sense of a property of Nature.
In the example you have been describing, you are the robot. The Hilbert space and density matrix parameters are not "properties of Nature". They are states of your mind, and of the minds of all the other scientists that are using your model. Your estimates of the density matrix parameters are the robot's posteriors. If you are thinking of them as "properties of Nature", Jaynes would say you are committing the mind projection fallacy. Your model is not the same as the thing being modeled.

Fra
Ah, ok. This paper is much earlier than the book I referred to, so it's quite possible that Jaynes's own views changed in between.

So Jaynes would include things like the background physical theory you are using in the prior. It is certainly nothing so limited as just an assumed initial probability distribution over model parameters; it also includes all the reasons why you are using a Hilbert space/density matrix model in the first place. The latter information still plays a role in the calculation since it determines the general formulas that are used.

Jaynes writes on those same pages (p87) in his book also

"But we caution that the term prior is another of those terms from the distant past that can be inappropriate and misleading today"

If we replace the word robot by agent, Jaynes' distinction makes good sense, and I use a similar distinction in thinking about "agents". The distinction is what I think of as the difference between the agent's microstate and its microstructure. The state is defined RELATIVE to the structure, i.e. state vs. state space. In the big inference picture BOTH the state and the SPACE of states are bound to be updated, but at different time scales. One can also consider, in the context of general inference and learning, that the STRUCTURE is itself merely a "state" in some bigger space. Except that it does not work to parameterize the infinity of future possibilities; it leads immediately to fine-tuning problems. This argument is also made by Lee Smolin in his talks and writings on the evolution of law. IMO, the same argument is relevant in general learning. This is what distinguishes "optimal data fitting" from more intelligent learning. From the perspective of the agent, the evolution of the structure has similarities to various dualities where one can transform the dependent variables and get different dynamics. In such a picture it seems reasonable to expect the Hilbert structure as well to be explained, just like the superficial Bayesian update of probability, given a FIXED probability space.

I agree it's clear that Jaynes includes this general background structure in the generalized notion of prior information as well. One could perhaps discuss here whether that is "information" vs. knowledge, or how one should label it, but in the big learning perspective above the difference should be clear, no matter how we label it.

/Fredrik

From pp. 87-88 in my edition:
Jaynes said:
X denotes simply whatever additional information the robot has beyond what we have chosen to call ‘the data’
In other words, Jaynes is using "prior" to denote all relevant information other than the "data", which in your example is the data collected by tomography.
No. You are conflating the notions 'prior information' and 'prior' that Jaynes keeps carefully separate:
On p.88, Jaynes distinguishes several distinct items:
Jaynes said:
Those who are actively familiar with the use of prior probabilities in current real problems usually abbreviate further, and instead of saying ‘the prior probability’ or ‘the prior probability distribution’, they say simply, ‘the prior’. [...] Let us now use the notation
X = prior information,
H = some hypothesis to be tested,
D = the data,
On p.89, he writes:
Jaynes said:
we need not only the sampling probability P(D|H X) but also the prior probabilities for D and H:
$$P(H|DX) = P(H|X)\frac{P(D|HX)}{P(D|X)}. \qquad (4.3)$$
[...] The left-hand side of (4.3), P(H|DX), is generally called a ‘posterior probability’
On pp.108-109, he discusses the dependence on parameters:
Jaynes said:
In the problem we are discussing, f is simply an unknown constant parameter. [...] There is a prior pdf [...] Then the posterior pdf for f is given by [...]
Thus:
• X, called the prior information, is assumed to be fixed, and contains the model assumptions which specify the model and how the parameters enter the model.
• H, called the hypothesis, is a question (Boolean function H(f) of the parameters f) to be answered by the analysis.
• D, called the data, is experimental information.
• P(H|X), called the prior, is the prior probability of H relative to X. Its dependence on the parameters f (discussed later on p.108) is the prior probability distribution for f.
• P(H|DX) is the posterior probability of H relative to X, assuming the data D. Its dependence on the parameters f (discussed later on p.108) is the posterior probability distribution for f.
Thus the model assumptions constitute the prior information, and are quite distinct from both the prior (for a parameter-independent hypothesis) and the prior probability distribution, which encodes a subjective assessment of the likelihood of particular values of the parameters. The prior information never figures in the Bayesian probability calculus since it never changes; it only figures in the notation. In practice it is even suppressed, simplifying the typography of the formulas. This is already how Jaynes treated the matter in his famous paper.
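To make the bookkeeping in (4.3) concrete, here is a minimal sketch under stated assumptions: the fixed prior information X is taken to be a simple Bernoulli model with one unknown parameter f, discretized on a grid. The function names, the two example priors and the data are hypothetical and not taken from Jaynes. The sketch only illustrates that X and D can be held fixed while the prior distribution P(f|X) is varied, and that the posterior then varies with it.

```python
# Grid-based Bayesian update for a Bernoulli parameter f.
# Assumed fixed prior information X: independent trials, each succeeding with probability f.

def posterior(prior, grid, successes, failures):
    """Return the normalized posterior over the grid, following the form of (4.3)."""
    # Sampling probability P(D | f X), up to a constant binomial factor
    likelihood = [f**successes * (1 - f)**failures for f in grid]
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(unnormalized)  # plays the role of P(D | X)
    return [u / evidence for u in unnormalized]

def mean(dist, grid):
    """Posterior (or prior) mean of f on the grid."""
    return sum(p * f for p, f in zip(dist, grid))

grid = [i / 100 for i in range(101)]

# Two hypothetical priors P(f | X); X and D are the same for both.
flat = [1 / len(grid)] * len(grid)
raw = [f**4 * (1 - f)**4 for f in grid]      # peaked near f = 0.5
peaked = [r / sum(raw) for r in raw]

D = (7, 3)  # hypothetical data: 7 successes, 3 failures
post_flat = posterior(flat, grid, *D)
post_peaked = posterior(peaked, grid, *D)

print(round(mean(post_flat, grid), 3), round(mean(post_peaked, grid), 3))
```

Whether the choice between `flat` and `peaked` counts as "subjective" or as part of the robot's information is exactly the point under dispute in this thread; the code takes no side on that.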
(IIIb) on p. 19: "The robot always takes into account all of the evidence it has relevant to a question." The fact that the model is given by a Hilbert space and the parameters by a density matrix is a consequence of evidence--all the evidence that establishes that those things are the best way to model quantum systems. So using a Hilbert space model with density matrix parameters is necessary in order to take into account all that evidence.
The robot takes account of the Hilbert space as part of its unchangeable prior information X, not as part of its subjective prior probabilities. The unchangeable part is objective if specified explicitly, since everyone competent will arrive from such a specified X at the same results (in a deterministic calculation from the data) while the Bayesian probabilistic assessment is subjective and remains subjective during all computations. (Apart from being overkill in most applications.)

Mentor
You are conflating the notions 'prior information' and 'prior' that Jaynes keeps carefully separate
Whenever I have used the term "prior" in this discussion, I have meant "prior information". I apologize for the imprecise use of terminology.

the Bayesian probabilistic assessment is subjective
Perhaps we are having trouble because of an ambiguity in the word "subjective". If we are going to describe Bayesian probabilities as "subjective", the term can only mean "dependent on the specific information that the robot has". Different robots with different information can compute different Bayesian probabilities.

However, the word "subjective" in common usage has an additional connotation of arbitrariness which is not at all implied or intended in Jaynes's usage. As Jaynes describes it, the process of computing probabilities from a given set of data is perfectly objective; there is no arbitrariness about it at all. There is only one right way to do it. So there is no subjectivity in the sense of arbitrariness in such computations.

The only difference I can see in your own treatment vs. that of Jaynes is that you have said that computing probability distributions is "overkill" and you only need point estimates. And I have already commented that, in a particular case, Jaynes might well agree with such a judgment, since it is a judgment about the benefits vs. the costs of doing additional computations. Ironically, such judgments are the only things we have discussed in this entire thread that are "subjective" in the sense of common usage--that they are personal choices that have an element of arbitrariness to them.

Mentor
The unchangeable part is objective if specified explicitly, since everyone competent will arrive from such a specified X at the same results (in a deterministic calculation from the data) while the Bayesian probabilistic assessment is subjective and remains subjective during all computations.
In the particular case you describe, since you have declared by fiat that all "robots" involved (all of the scientists assessing some particular instance of quantum tomography) have all of the same prior information and all of the same data, their Bayesian probabilities will obviously all be the same, since you have removed all possible reasons for them to vary.

Remember that my original post in this subthread, post #33, was to object to a claim (made by @gentzen, not you) that your prescription is "opposite" to what Jaynes would say. My point was simply that, in this particular case, Jaynes would say exactly what you are saying. Even the "subjective" element in probabilities--that different "robots" might have different information--is removed in your example. So what you are describing is in fact perfectly consistent with the general method Jaynes describes. It's just a sort of degenerate case of it, since all of the uncertainty involved has been removed--you know exactly what the right model is and exactly what the data is. So everything relevant is exactly known, and it should be no surprise that everyone agrees on it.

If we are going to describe Bayesian probabilities as "subjective", the term can only mean "dependent on the specific information that the robot has".
No. It means that the robot assesses the same data in a robot-specific way, not deducible from objective rules. Whether this way is due to information or to the prior distribution or to goals or to hopes or fears or to whims is secondary.
As Jaynes describes it, the process of computing probabilities from a given set of data is perfectly objective; there is no arbitrariness about it at all. There is only one right way to do it. So there is no subjectivity in the sense of arbitrariness in such computations.
No. The arbitrariness is in the prior, not in the subsequent computations. Moreover, he assumes an ideal robot that functions on the basis of his rational rules; but a real robot cannot do this since the computations would be far too complex.
In the particular case you describe, since you have declared by fiat that all "robots" involved (all of the scientists assessing some particular instance of quantum tomography) have all of the same prior information and all of the same data, their Bayesian probabilities will obviously all be the same, since you have removed all possible reasons for them to vary.
No. They have the same prior information about the physics, but differ in the prior probability assessment (which is the subjective part) and in the degree to which they are faithful to Jaynes' rational rules for manipulating the prior to get the posterior. Indeed scientists are not robots in Jaynes' sense but have goals and preferences that depend not on the data but affect the way they draw conclusions.

Mentor
No. It means that the robot assesses the same data in a robot-specific way, not deducible from objective rules
I'm sorry, but I simply don't see Jaynes saying this anywhere. His whole book is about figuring out objective rules for the robot to follow for a given problem. He never talks about different robots using different rules for the same problem; he clearly believes that for any given problem, there is one correct set of rules, and that's the set he's looking for.

The arbitrariness is in the prior
Jaynes spends considerable time discussing the correct ways to assign priors in various situations, so I'm not sure I agree that it is arbitrary. Of course in many real situations the information is far less amenable to being captured in a precise mathematical formulation than it is in the carefully circumscribed physics problem you describe.

They have the same prior information about the physics, but differ in the prior probability assessment (which is the subjective part)
I don't see how two scientists that are both using the exact same Hilbert space for a given quantum tomography experiment could differ in their computation of ##P(H|X)## for any ##H##.

in the degree to which they are faithful to Jaynes' rational rules
Of course no real human agent is ever exactly faithful to any set of rules. But you appear to be ruling that out when you talk about the estimates of density matrix parameters from the data being objective in the sense of all scientists involved agreeing on them. That agreement will only happen if they all follow the same rules in doing their computations.

Indeed scientists are not robots in Jaynes' sense but have goals and preferences that depend not on the data but affect the way they draw conclusions.
If such goals and preferences really do affect the way conclusions are drawn, Jaynes would say (and I would agree) that they should be captured somewhere in the process of doing the computations. If that cannot be done, I would say that the domain under discussion is not (or not yet) a science, because it is not well understood enough. If a physicist were to tell you he doesn't agree with your density matrix parameter estimates from quantum tomography data, you would expect him to give some cogent physics reason like he thinks you're using the wrong Hilbert space for the system. You wouldn't expect him to say it's because he's of a different political party than you, or some other irrelevant factor. But in many domains, things like political beliefs and ideologies certainly do affect the conclusions people come to from a given set of data. We recognize that by not calling those domains sciences.

Gold Member
In fact, what it is describing is the same kind of thing as what Jaynes describes: the "robot" Jaynes describes builds a model of some system, and uses the model to compute probabilities. Those computations are perfectly objective: they are mathematical operations starting from precisely defined initial propositions, and the same operations applied to the same propositions will give the same answers every time.

The only "subjectivity" involved in Jaynes is that different robots in different states of knowledge--meaning, with different sets of data available to them--will have different models, and will therefore make different computations of probabilities because they are starting from different initial propositions. But that is equally true of experimenters doing quantum tomography: their model is built from the information they have obtained from their experiments, and two experimenters who have run different sets of experiments will have different models, and will therefore compute different probabilities. That is every bit as "subjective" as what Jaynes describes. But of course it's not "subjective" at all in the sense of people just arbitrarily choosing probabilities instead of computing them using specified operations from specified initial propositions--and neither is Jaynes.
Sorry for not answering earlier. Writing about Jaynes is tricky for me, because it triggers so many different thoughts. I remembered that I had an email conversation with Kevin van Horn about him, after I commented on https://bayesium.com/probability-theory-does-not-extend-logic/. Here is an extract of the relevant parts:
Sorry for the extremely long delay before answering. ... Jaynes book definitely had some influence on me, even so I mostly disagreed with what he wrote. I am neither Bayesian nor frequentist, instead of an interpretation, I do believe that game theory and probability theory are closely related (https://blog.computationalcomplexit...showComment=1505472807405#c870512924971687938). ...

You ask why I conclude from the interpretation of classical logic as the logic of subsets of a given set that the restriction to a *single* number is basically a bad idea.

My reasoning is simply that even classical logic is not exclusively concerned with a *single* number from {0,1}, but includes the case where we have multiple such numbers. For example, I sometimes use 4 numbers for a proposition: ("actual fact", "judge/state/government version of fact", "opinion of people around me on fact", "my own opinion on fact"). The number for "actual fact" is not always the most relevant, even so it might be the only one of those 4 numbers some people would consider relevant for a logic of plausible reasoning. If "my own opinion on fact" would be the average of the other three numbers, then it would not obey the product rule of probability theory, even if the other three numbers would individually obey the rules of probability theory. (I might try to fix this by using different weights for different contexts. Those weights would then be the relevance of the different versions of "fact" for my own opinion.)

Your recent paper avoids this issue, because it does not assign probabilities to individual propositions, but focuses on the derivability relation X |= A instead. This is good, because that one is really just satisfied or not satisfied, even in predicate logic and non-classical logic. Some non-classical logic might work with sequents (X, Y, ... |= A, B, ...) instead, but even such a sequent is just satisfied or not satisfied.

... For your theorem, you have to explicitly write down all your background knowledge as a propositional formula, and then get the probability for a given proposition (given your background knowledge) as a result. But for the way Cox's theorem is typically used, you can somehow magically encode your background knowledge into a prior (which is a sort of not necessarily normalisable probability distribution), add some observed facts, and then get the probability for a given proposition (given your prior and your observations) as a result.

Of course, this is a caricature version of the Bayesian interpretation, but people do use it that way. And they use it with the intention to convince other people. So what strikes me as misguided is not when people like Scott Aaronson use Bayesian arguments in addition to more conventional arguments to convince other people, but when they replace perfectly fine arguments by a supposedly superior Bayesian argument and exclaim: "This post supersedes my 2006 post on the same topic, which I hereby retire." For me, this is related to the philosophy of Cox's theorem that a single number is preferable over multiple independent numbers (https://philosophy.stackexchange.co...an-reasoning-related-to-the-scientific-method). On the other hand, when Jaynes explains how to obtain (improper) priors for certain situations (https://bayes.wustl.edu/etj/articles/prior.pdf), I do get deeply impressed and include it in my "day to day" reasoning strategies.
The passage from that paper that most influenced my "day to day" reasoning was:
For example, in a chemical laboratory we find a jar containing an unknown and unlabeled compound. We are at first completely ignorant as to whether a small sample of this compound will dissolve in water or not. But having observed that one small sample does dissolve, we infer immediately that all samples of this compound are water soluble, and although this conclusion does not carry quite the force of deductive proof, we feel strongly that the inference was justified. Yet the Bayes-Laplace rule leads to a negligible small probability of this being true, and yields only a probability of 2/3 that the next sample tested will dissolve.
This theme that there can be situations where a single measurement is already very convincing also reappeared in A. Neumaier's thermal interpretation.
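The 2/3 in the quoted passage is the Bayes-Laplace rule of succession, (s+1)/(n+2), evaluated for one success in one trial. A minimal sketch, assuming a uniform prior on the unknown success probability (function names are hypothetical):

```python
from fractions import Fraction

def rule_of_succession(successes, trials):
    """Bayes-Laplace probability that the next trial succeeds, assuming a uniform prior."""
    return Fraction(successes + 1, trials + 2)

def prob_all_of_next(m, successes):
    """Probability that the next m trials all succeed, after `successes` successes
    in as many trials (no failures), again assuming a uniform prior.
    The posterior is Beta(successes + 1, 1), whose m-th moment is (s + 1) / (s + m + 1)."""
    return Fraction(successes + 1, successes + m + 1)

print(rule_of_succession(1, 1))   # 2/3
print(prob_all_of_next(1000, 1))  # 1/501
```

The second function shows why the rule assigns only a small probability to "all samples dissolve" after a single observation: under the uniform prior this probability shrinks toward zero as the number of future samples grows.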

I read Jaynes' book back in 2000, but didn't come very far. I guess I stopped in the 3rd chapter. Somehow I got the impression that I wouldn't get those Bayesian insights from it that I had hoped for. The best place to get those insights in a compressed form I have found so far was: https://windowsontheory.org/2021/04/02/inference-and-statistical-physics/. I did read some of Jaynes' papers, and those were a totally different experience for me: always very succinct and rewarding.

Mentor
I read Jaynes book back in 2000, but didn't come very far.
An unfortunate thing about the book is that it was not finished when Jaynes died. I suspect that if he had lived long enough to finish it, it would be tighter and more like his papers than it is.

Mentor
that paper
I note, btw, that the paper you reference (the one titled "Prior Probabilities") has as its explicit purpose to remove "arbitrariness" in assigning prior probabilities.

His whole book is about figuring out objective rules for the robot to follow for a given problem. He never talks about different robots using different rules for the same problem; he clearly believes that for any given problem, there is one correct set of rules, and that's the set he's looking for.
Not 'there is' but 'there should be'! Jaynes argues about rules robots should follow, rather than the rules they actually follow. Jaynes' rules are normative (desiderata), not descriptive (facts). Moreover, even the rules he gives all depend on the prior probability assignment, which is subjective, according to Jaynes' own testimony on p.44:
Jaynes (my italics) said:
In the theory we are developing, any probability assignment is necessarily ‘subjective’ in the sense that it describes only a state of knowledge, and not anything that could be measured in a physical experiment. Inevitably, someone will demand to know: ‘Whose state of knowledge?’ The answer is always: ‘That of the robot – or of anyone else who is given the same information and reasons according to the desiderata used in our derivations in this chapter.’
Anyone who has the same information, but comes to a different conclusion than our robot, is necessarily violating one of those desiderata.
But reality is so complex that many things require qualitative judgment - something that cannot be formalized since (unlike probability, where there is a fair consensus about the basic rules to apply) there is no agreement among humans about how to judge. This is why different scientists confronted with the same data can come to quite different conclusions. Violating these desiderata is a necessity. I want to have a philosophy of probability (and of quantum physics) that reflects actual practice, not a wish list.
I don't see how two scientists that are both using the exact same Hilbert space for a given quantum tomography experiment could differ in their computation of P(H|X) for any H.
They differ in the results whenever they differ in the prior. The prior is by definition a probability distribution hence subjective = robot-specific (in Jaynes' scenarios), a state of the mind of the robot. No two robots will have the same state of the mind unless they are clones of each other, in every detail that might affect the course of their computations.

It is a truism that the states of mind of two scientists are far from being the same. Scientists are individuals, not clones.
when you talk about the estimates of density matrix parameters from the data being objective in the sense of all scientists involved agreeing on them. That agreement will only happen if they all follow the same rules in doing their computations.
Agreement only means agreement to some statistical accuracy appropriate for the experiments analyzed. I didn't claim perfect agreement.

In my paper I talk about the standard statistical procedures (non-Bayesian, hence violating the desiderata of Jaynes): using simple relative frequencies to approximate the probabilities that enter the quantum tomography process, and then solving the resulting set of linear equations. In the case of N independent measurements, the tomography results will have (by the law of large numbers) an accuracy of ##O(N^{-1/2})##, with a factor in the Landau symbol depending on the details of the statistical estimation procedure and the rounding errors made. A lot of ingenuity goes into making the factor small enough so that reasonably accurate results are possible for more than the tiniest system, which explains why scientists using the same data but different software will get slightly different results. But the details do not matter to conclude that in principle, i.e., allowing for arbitrarily many experiments and exact computation, the true state (density operator) can be found with as close to certainty as one likes. This makes the density operator an objective property of the stationary quantum beam studied, in spite of the different results that one gets in actual computations. The differences are comparable in nature to the differences one gets when different scientists repeat a precisely defined experiment - measurement results are well-known to be not exact, but what is measured is nevertheless thought of (in the model) as something objective.
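As a toy illustration of the frequency-based procedure and its ##O(N^{-1/2})## accuracy, here is a minimal sketch. It assumes a single qubit with state ##\rho = (1 + r\cdot\sigma)/2## and simulated measurements of the three Pauli observables, so that the "linear equations" reduce to ##p_k = (1 + r_k)/2## per axis. This setup, the chosen Bloch vector and the function names are assumptions for illustration, not the procedure of the paper.

```python
import math
import random

def estimate_bloch_vector(true_r, n_per_axis, rng):
    """Estimate the Bloch vector from relative frequencies of +1 outcomes."""
    estimate = []
    for r_k in true_r:
        p_plus = (1 + r_k) / 2  # Born probability of outcome +1 along this axis
        hits = sum(rng.random() < p_plus for _ in range(n_per_axis))
        freq = hits / n_per_axis          # relative frequency approximates p_plus
        estimate.append(2 * freq - 1)     # invert the linear relation p = (1 + r) / 2
    return estimate

rng = random.Random(0)        # fixed seed so the run is repeatable
true_r = (0.3, -0.4, 0.5)     # hypothetical "true" state, |r| < 1
for n in (100, 10_000, 100_000):
    est = estimate_bloch_vector(true_r, n, rng)
    max_err = max(abs(e - t) for e, t in zip(est, true_r))
    # The error should shrink roughly like 1 / sqrt(n)
    print(n, round(max_err, 4), round(1 / math.sqrt(n), 4))
```

Changing the seed changes the individual estimates but not the scale of the error, which is the sense in which different analyses of comparable data agree only to statistical accuracy.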
If a physicist were to tell you he doesn't agree with your density matrix parameter estimates from quantum tomography data, you would expect him to give some cogent physics reason like he thinks you're using the wrong Hilbert space for the system.
Yes, and the cogent reason is that he uses different software and/or weighted the data differently because of this or that judgment, but gets a result consistent with the accuracies to be expected. There are many examples of scientists measuring tabulated physical constants or properties, and the rule is that different studies arrive at different conclusions. Even when analyzing the same data.

No competent physicist would use a wrong Hilbert space, but there are reasons why someone may choose a different Hilbert space than I did in your hypothetical setting: For efficient quantum tomography you need to truncate an infinite-dimensional Hilbert space to a subspace of very low dimension, and picking this subspace is a matter of judgment and can be done in multiple defensible ways. Results differ. With time, some methods (and details inside the methods) prove to give more accurate or more robust results, and these become standard until superseded by even better methods.

Quantum chemical calculations of ground state energies of molecules are a well-known example where depending on the accuracy wanted you need to choose different schemes, and results are never exactly reproducible unless you use the same software with the same parameters, and in case of quantum Monte Carlo calculations also the same random number generator and the same seed.
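The point about random number generators and seeds can be seen in any Monte Carlo computation. The following minimal sketch uses a plain Monte Carlo integral as a stand-in (it is not a quantum Monte Carlo code, and the function name is hypothetical): the same seed reproduces the result exactly, while a different seed gives a result that agrees only within the statistical error.

```python
import random

def mc_estimate(n_samples, seed):
    """Plain Monte Carlo estimate of the integral of x**2 over [0, 1] (exact value 1/3)."""
    rng = random.Random(seed)
    return sum(rng.random() ** 2 for _ in range(n_samples)) / n_samples

a = mc_estimate(10_000, seed=1)
b = mc_estimate(10_000, seed=1)  # same generator, same seed
c = mc_estimate(10_000, seed=2)  # same generator, different seed

print(a == b)        # exact reproduction
print(abs(a - c))    # small, but generally nonzero
```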

explicit purpose to remove "arbitrariness" in assigning prior probabilities.
Jaynes does not succeed in this. There is a notion of noninformative prior for certain classes of estimation problems, but this gives a good prior only if (from the point of view of a frequentist) it resembles the true probability distribution. (Just as in quantum tomography, 'true' makes sense in cases where one can in principle draw arbitrarily large samples of independent realizations.) The reason is that no probability distribution is truly noninformative, so if you don't have past data (or extrapolate from similar experiences) whatever prior you pick is pure prejudice or hope.

Mentor
@A. Neumaier, we clearly have very, very different readings of Jaynes, and this subthread is going well off topic. As far as the specific scenario and methods discussed in your paper, and which you have explained in some detail in your post #68, I don't have anything to add to what I have already said. I certainly am not questioning the overall method of determining quantum density matrix parameters by quantum tomography that you describe.

Sunil
Are you aware of the content of the paper "Quantum mechanics via quantum tomography" that this thread is about?
Yes, I'm aware that this subthread about the consistency of the thinking of those who reject realism and causality in Bell discussions but use it in everyday life and in other scientific questions is already off-topic. I have referred to Jaynes because his use of the "robot" solves a similar problem with this inconsistency of human thinking in another domain - plausible reasoning. Everyday plausible reasoning is vague, and also often inconsistent, and nobody cares about that inconsistency because it is anyway vague.
No robot, no agent, no "subjective mind content of a knower". The meaning of a quantum state resides in what is encoded in (and hence ”known to”) the model! This is almost exactly the opposite of what Jaynes would tell you.
Jaynes is not about the "subjective mind of a knower". This sounds like you don't understand the difference between de Finetti's subjective Bayesian interpretation and Jaynes' objective Bayesian interpretation. It is an essential one. Jaynes is about what is rational to conclude given some incomplete information. So, if you have no information about a die, that is, no information that distinguishes between the numbers, you have to assign the probability ##1/6## to each number. In subjective probability you are free to start with whatever you think is fine. Maybe the probability of 3 is higher because it is a sort of Holy Number? Fine, assign it a higher probability. What is fixed is only the updating. So, computing priors - the probabilities when there is no information - is meaningless in subjective probability, but is key in objective probability.
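One standard way to make the ##1/6## assignment concrete is the maximum entropy criterion associated with Jaynes. The following minimal sketch assumes Shannon entropy as the measure of how little a distribution commits to; the "Holy Number" distribution is a hypothetical example. With no constraints beyond normalization, the uniform assignment has the largest entropy, ##\log 6##.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)

uniform = [1 / 6] * 6
favoring_three = [0.1, 0.1, 0.5, 0.1, 0.1, 0.1]  # hypothetical prior favoring the number 3

# The uniform assignment attains the maximum log(6); any other assignment has lower entropy.
print(entropy(uniform), math.log(6))
print(entropy(favoring_three))
```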

Fra
You declared already that you do not consider inside agents/robots in this way, the only "agent/robot" you consider is the one defined by the "scientific community". This is fine with me.
But just to reflect a bit more on the connection in the light of the following posts...

Jaynes argues about rules robots should follow, rather than the rules they actually follow. Jaynes' rules are normative (desiderata), not descriptive (facts).
...
This is why different scientists confronted with the same data can come to quite different conclusions. Violating these desiderata is a necessity. I want to have a philosophy of probability (and of quantum physics) that reflects actual practice, not a wish list.

Results differ. With time, some methods (and details inside the methods) prove to give more accurate or more robust results, and these become standard until superseded by even better methods.
You describe the evolution of the robots (or scientific models, if you wish), i.e. modifying X?
This would also suggest a population of robots exhibiting a small variation in X, but those that get in conflict with the "bulk" will likely get destabilised.

In this perspective, in line with your second paragraph above, violating the objective rules is a necessity. And if you think about it, the "objectivity" at the level of your agent (scientific community) is a kind of democracy within the community. I.e. it is not sufficient that a random researcher makes a discovery unless it can be reproduced by others etc. This is an "agent democracy" condition, it's not a constraint. I don't think anyone would think of consistency among researchers as a constraint, because the progression of science requires variation, and thus disagreement.

In return, if we get an asymptotically stable population, one would expect Jaynes' objectivity to apply to the subset of all indistinguishable robots. (Like we expect all electrons to behave alike in similar experiments; we expect all trained physicists to make the same calculations, etc.) This is when we also see stable rules and laws for each robot as per the classification.

This, to me, is the actual practice in science, so I would (for the reasons you also mention) prefer to include the variation of X also in the philosophy of inference and physics? This is where my motivation ends.
I have failed to see a simpler way forward.

/Fredrik