Do you understand how he can do that?
I understand the formulas for retrodictive probabilities, but what about his task? What does he do? Does he send the results to Alice?

Alice sends a mixture of state preparations Si with probabilities Pi (a density matrix ##\rho##).
As a density matrix is something which can be measured, Bob has a procedure to compute the density matrix (and the Pi's).
Now he has a setup which enables him to perform POVMs. If Mk is one of the possible outcomes, he keeps only the particles that come out of the Mk output and does not record the others. If he then measures the density matrix of this post-selected ensemble with the initial procedure, he will get another density matrix ##\rho'## with other probabilities P'i.
The P'i are the retrodicted probabilities of the preparations Si given Mk.
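As a minimal numerical sketch of this retrodiction (the states, priors, and POVM element below are hypothetical examples, not from the paper), the updated probabilities follow the quantum Bayes rule ##P'_i = P_i \,\text{Tr}(M_k \rho_i) / \sum_j P_j \,\text{Tr}(M_k \rho_j)##:

```python
import numpy as np

# Hypothetical example: Alice prepares one of two qubit states S_i with
# prior probabilities P_i; Bob post-selects on POVM outcome M_k and
# retrodicts P'_i = P_i * Tr(M_k rho_i) / sum_j P_j * Tr(M_k rho_j).
ket0 = np.array([1.0, 0.0])
ket_plus = np.array([1.0, 1.0]) / np.sqrt(2)

rho = [np.outer(ket0, ket0), np.outer(ket_plus, ket_plus)]  # preparations S_i
P = [0.5, 0.5]                                              # priors P_i

M_k = np.outer(ket_plus, ket_plus)  # one POVM element (here a projector)

likelihood = [p * np.trace(m @ M_k).real for p, m in zip(P, rho)]
P_retro = [l / sum(likelihood) for l in likelihood]
print(P_retro)  # retrodicted probabilities P'_i given outcome M_k
```

With these numbers the outcome M_k is more likely for the second preparation, so its retrodicted probability rises above its prior.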

The following is the only axiom required:
An observation/measurement with possible outcomes i = 1, 2, 3, ... is described by a POVM {Ei} such that the probability of outcome i is determined by Ei, and only by Ei; in particular, it does not depend on which POVM Ei is part of.
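A quick numerical illustration of what the axiom asserts (the state and POVM element here are made-up examples): the probability ##\text{Tr}(\rho E)## of an outcome E comes out the same no matter which POVM E is embedded in.

```python
import numpy as np

# Non-contextuality sketch: the probability of outcome E is Tr(rho E)
# regardless of which POVM E belongs to. Example state and element:
rho = np.array([[0.7, 0.2], [0.2, 0.3]])
E = np.array([[0.5, 0.0], [0.0, 0.25]])
I = np.eye(2)

povm_a = [E, I - E]                      # one POVM containing E
povm_b = [E, (I - E) / 2, (I - E) / 2]   # a different POVM containing the same E

p_a = np.trace(rho @ povm_a[0]).real
p_b = np.trace(rho @ povm_b[0]).real
print(p_a, p_b)  # identical: the probability depends on E alone
```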

Can't see how you could start from anything more basic - but I may be wrong.

You can't measure a state - either a mixture (ie a density matrix) or a pure state.

Again this follows from the Born rule.

For simplicity start with a pure state. We have some observation and it has a particular outcome. You can't infer anything from that outcome other than that the state was in a superposition containing that outcome. A mixed state doesn't change that, except you don't know what pure state was presented for measurement.

Added Later
Had a quick squiz. He is not measuring the state. What he is doing is carrying out an observation and using that to update knowledge of the state that was sent. You can do that - eg you then know for sure the state was in a superposition containing that outcome.

What the paper seems to be doing is giving a Bayesian take on quantum measurement. Whether such is worthwhile - blowed if I know.

I find the single axiom used in Busch's proof quite elegant. The three axioms he mentions are all simple consequences of the single axiom, eg f(I) = 1 follows from the fact that the POVM {I} contains just one element, so the probability of that outcome is 1 - that's the law of total probability (it's actually a bit more general than that, concerning conditional probabilities), or simply an easy consequence of the Kolmogorov axioms.

I disagree with the statement that a density matrix cannot be measured.
It can be measured like the probability distribution of a biased die.
Read Oregon
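A sketch of how that statistical "biased die" measurement works for a qubit (state tomography; the specific state and shot count below are assumptions for illustration): estimate the Pauli expectation values from many identically prepared copies, then reconstruct ##\rho = (I + \langle X\rangle X + \langle Y\rangle Y + \langle Z\rangle Z)/2##.

```python
import numpy as np

# Qubit state tomography sketch: "measuring" a density matrix means
# estimating expectation values over many identically prepared copies.
rng = np.random.default_rng(0)

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# True state (unknown to the experimenter in practice)
rho_true = np.array([[0.8, 0.3], [0.3, 0.2]], dtype=complex)

def estimate(pauli, shots=200_000):
    """Simulate measuring one Pauli on `shots` fresh copies; return the mean outcome."""
    vals, vecs = np.linalg.eigh(pauli)
    probs = np.array([(vecs[:, k].conj() @ rho_true @ vecs[:, k]).real
                      for k in range(2)])
    outcomes = rng.choice(vals, size=shots, p=probs / probs.sum())
    return outcomes.mean()

# Reconstruct rho from the three estimated Pauli expectations
rho_est = (I + estimate(X) * X + estimate(Y) * Y + estimate(Z) * Z) / 2
print(np.round(rho_est, 2))  # approaches rho_true only in the many-copy limit
```

The point of the die analogy: one roll tells you almost nothing about the die's bias, but the statistics of many rolls pin it down - and likewise many copies pin down ##\rho##.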

I think that this procedure may be used with the post-selected particles from one of the outputs.
Doesn't it give the density matrix conditioned on this outcome? (the retrodictive point of view)

With any theorem it's always important to muck around with the proof and come up with your own. It's the only way to really understand it. You find that theorems devised by those knowing a lot of background math can use high-powered machinery unnecessarily when a simple argument suffices. That's all that was going on here. It's a well-known consequence of Von Neumann's proof against hidden variables, so it's simply a matter of knowing its essence, which I recalled from looking into that ages ago.

The key to any derivation of Born's rule in a way similar to Gleason (ie assuming non-contextuality) is to show linearity - which is what Gleason's proof and Busch's variant do. In fact it's so easy with that assumption that I often scratch my head over why people say it's a rule by itself, independent of the observable axiom. That's true of course, but linearity seems less pulled out of a hat than the rule it determines. And it provides the opportunity to explain where Von Neumann went wrong: he gave an operational definition of expectations that makes linearity seem natural - but it fails for hidden variables.

Normally, measuring a property of something means you are given that something and measure it in some way. It doesn't normally mean you are given a whole heap of similarly prepared 'somethings' and measure statistical properties. The state is a statistical property, and that's the only way to do it.

When dealing with this sort of stuff precision is important - and very hard to achieve, BTW - I am far from perfect in that regard myself.

My take was that the paper is simply espousing the Bayesian view, where you take a whole heap of similarly prepared systems and update the state with each 'measurement' until you reach an acceptable confidence level.

I do not believe that, given a great number of particles with equal wave functions, I will be able to measure their phase.
Things are very different with their density matrix.
A density matrix is no less measurable than a fringe spacing.
I have long wondered whether Von Neumann entropy was just a mathematical tool or whether it could be measured.
When I read the previous link I was happy to learn it is also measurable.
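Since the density matrix is measurable (via tomography), the Von Neumann entropy ##S(\rho) = -\text{Tr}(\rho \log \rho)## is too: it is just a function of the reconstructed ##\rho##'s eigenvalues. A small numerical sketch (the example states are arbitrary):

```python
import numpy as np

# Von Neumann entropy S(rho) = -Tr(rho log rho), computed from the
# eigenvalues of the density matrix.
def von_neumann_entropy(rho):
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]      # 0 * log 0 = 0 by convention
    return float(-np.sum(evals * np.log2(evals)))  # in bits

rho_mixed = np.array([[0.5, 0.0], [0.0, 0.5]])  # maximally mixed qubit
ket = np.array([1.0, 1.0]) / np.sqrt(2)
rho_pure = np.outer(ket, ket)                   # a pure state

s_mixed = von_neumann_entropy(rho_mixed)  # maximal: 1 bit for a qubit
s_pure = von_neumann_entropy(rho_pure)    # zero: pure states have no mixing
print(s_mixed, s_pure)
```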

Multiplying a pure state by a phase factor c with |c| = 1 makes no difference: |cu><cu| = |u><u|. This is the well-known gauge freedom of pure states.

It is only a phase relative to another phase that is detectable.

##E(O) = \text{Tr}(P_1 O) = \text{Tr}(P_2 O)##, ie ##\text{Tr}((P_1 - P_2) O) = 0##, ie ##P_1 = P_2## if O is chosen reasonably, eg ##O = \sum_i i\,|b_i\rangle\langle b_i|##. Observe a large number of similarly prepared systems with this observable; the expected values then uniquely determine the system's state.
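The converse direction can be illustrated numerically with an informationally complete set of qubit observables (the two example states below are made up): if ##\text{Tr}((P_1 - P_2)O)## vanished for all of {I, X, Y, Z}, the states would be equal; here one expectation differs, which detects ##P_1 \ne P_2##.

```python
import numpy as np

# Two qubit density matrices that agree on I, X, Z expectations
# but differ on Y: the expectation values over an informationally
# complete set distinguish them.
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

P1 = np.array([[0.6, 0.2], [0.2, 0.4]], dtype=complex)
P2 = np.array([[0.6, 0.2 - 0.1j], [0.2 + 0.1j, 0.4]])

diffs = [np.trace((P1 - P2) @ O) for O in (I, X, Y, Z)]
print(np.round(diffs, 3))  # only the Y expectation differs, so P1 != P2
```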