# On the myth that probability depends on knowledge

2019 Award
http://en.wikipedia.org/wiki/Bayesian_probability "Bayesian probability interprets the concept of probability as 'a measure of a state of knowledge', in contrast to interpreting it as a frequency or a 'propensity' of some phenomenon."

As I said before, a Bayesian definition of probability does depend on knowledge. I don't know why you bother asserting the contrary when it is such a widely-known definition of probability.
As Wikipedia says, the above is a particular _interpretation_, not a _definition_ of probability. If you took it as a definition, you would not be able to derive the slightest thing from it.

The subjective interpretation may be legitimate to guide actions, but it is not science.

I have been successfully using Bayesian methods in an objective context, without this concept of Bayesian probability.

Fra
So you'd say that a program that receives a continuous stream of data, uses it to make and store some statistics of it (not the data themselves, which are never looked at by anyone/anything except this program), and then spits out a prediction of a probability for the Dow Jones index to be above some threshold at a fixed date knows about the stock market?
In the obviously restricted sense yes.
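The kind of program described in the question can be sketched in a few lines. This is only a toy illustration: the class name, the made-up daily changes, and the model of independent, normally distributed daily changes are all assumptions introduced here, not anything proposed in the thread.

```python
import math

class StreamingPredictor:
    """Keeps only running statistics of daily index changes (Welford's
    algorithm); the raw data are discarded after each update."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, change):
        self.n += 1
        delta = change - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (change - self.mean)

    def prob_above(self, current, threshold, days_ahead):
        """P(index > threshold after days_ahead) under the toy model of
        independent, normally distributed daily changes (an assumption)."""
        if self.n < 2 or self.m2 == 0.0:
            raise ValueError("need at least two distinct observations")
        var = self.m2 / (self.n - 1)          # sample variance of daily changes
        mu = current + days_ahead * self.mean  # expected index level
        sigma = math.sqrt(days_ahead * var)    # spread grows with sqrt(time)
        z = (threshold - mu) / sigma
        return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper normal tail

predictor = StreamingPredictor()
for change in [12.0, -30.5, 8.25, 40.0, -5.0]:  # made-up daily changes
    predictor.update(change)
print(predictor.prob_above(current=10000.0, threshold=10100.0, days_ahead=20))
```

The only "knowledge" the program holds is three numbers (`n`, `mean`, `m2`), which is the restricted sense referred to above.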

The big difference is that the action space of a computer is largely constrained. A computer cannot ACT upon its information in the same way a human can. At best it can print buy or sell recommendations on the screen. The feedback to programs and computers is also different: a computer program that makes good predictions gets to live, while bad programs are deleted. In theory, however, one can imagine an AI system that uses the feedback from stock market business to secure its own existence. Then systems that fail to learn will die out, and good learners are preferred.

So the analogy is different just because the state and action space of a "classical normal computer" IS fixed, at least in the context we refer to it here, as an abstraction. A general system in nature does not have a fixed state or action space. This is exactly how learning works. "Artificial" intelligence with preprogrammed strategies and selections fails to be real intelligence just because there is no feedback to revise and evolve the action space. Some self-modifying algorithms can partly do this, but they still live in a given computer.

This is in principle no different from how the cell-based complex biological system we call the human brain can ENCODE and know about the stock market. The biggest difference is that of complexity, and the flexibility of the state and action spaces.

The actions possible for a computer are VERY constrained, because of how it's built.

/Fredrik

2019 Award
I am still stuck on the concept that you can't make meaningful statements about the probabilities of single events. What about the following scenario:

1) you have a group of 2 atoms of isotope A, with 5 second half-life
2) you have a group of 2 atoms of isotope B, with 5 year half-life

What is the probability that one of the A atoms will decay before one of the B atoms?

From posts Arnold Neumaier has made on this thread, it seems he will say that the question as I have phrased it above is not scientifically meaningful. If this is true (i.e. Arnold does think that it is meaningless, and I have not misunderstood something), then please answer the following question:

How big do I have to make the pools (5 atoms, 5000 atoms, 5x10^23 atoms) before the question DOES become scientifically meaningful? Because if I have not misunderstood, other statements Prof. Neumaier has made on this thread indicate that he *does* think scientifically meaningful statements can be made about probabilities of events from "large ensembles", so it seems that at some point, the pools must reach a critical size where "statistical significance" (or whatever the proper term is) is achieved.
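For reference, the number that the standard decay model assigns to this question can be written down directly. The sketch below assumes independent atoms with exponentially distributed lifetimes, reads the question as "the first decay overall is an A atom", and uses a 365.25-day year; the function name is illustrative. Whether this model value may be attached to these four particular atoms is exactly the point under dispute.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed Julian year

def prob_first_decay_is_A(n_a, half_life_a, n_b, half_life_b):
    """Model probability that the first decay occurs in pool A.

    Assumes independent atoms with exponential lifetimes, so the first
    decay in a pool of n atoms occurs at rate n * ln(2) / half_life.
    Both half-lives must use the same time unit.
    """
    rate_a = n_a * math.log(2) / half_life_a
    rate_b = n_b * math.log(2) / half_life_b
    # For two competing exponential clocks, P(A fires first) = rate_a / (rate_a + rate_b).
    return rate_a / (rate_a + rate_b)

p = prob_first_decay_is_A(2, 5.0, 2, 5.0 * SECONDS_PER_YEAR)
print(f"P(first decay is an A atom) = {p:.10f}")
```

For equal pool sizes the expression reduces to T_B / (T_A + T_B), so within the model the answer depends only on the ratio of the half-lives and not on how large the pools are.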
In general, if you have a complete specification of an ensemble, you can derive scientific statements about anonymous members of the ensemble.

This is the case e.g., when analysing past data. You can say p% of the population of the US in the census of year X earned above Y Dollars.

It is also the case when you have a theoretical model defining the ensemble. You can say the probability to cast an even number with a perfect die is 50%, since the die is an anonymous member of the theoretical ensemble. But you cannot say anything about the probability of casting an even number in the next throw at a particular location in space and time, since this is an ensemble of size 1 - so the associated probabilities are provably 0 or 1.

In practice, interest is mainly in the prediction of incompletely specified ensembles.
In this case, the scientific practice is to replace the intended ensemble by a theoretical model of the ensemble, which is precisely known once one estimates its parameters from the available part of the ensemble, using a procedure that may also depend on other assumptions such as a prior (or a class of priors whose parameters are estimated as well).

In this case, all computed/estimated probabilities refer to this theoretical (often infinitely large) ensemble, not to a particular instance. (From a mathematical point of view, ensemble = probability space, the sample space being the set of all realizations of the ensemble.)

Now there is a standard way to infer from the model statements about the intended ensemble: One specifies one's assumptions going into the model (such as independence assumptions, Gaussian measure assumptions, etc.), the method of estimating the parameters from the data, a confidence level deemed adequate, and the statistical tests used to check the confidence level for a particular prediction in a particular situation. Then one makes a definite statement about the prediction (such as ''this bridge is safe for crossing by trucks up to 10 tons''), accompanied perhaps by mentioning the confidence level. The definite statement satisfies the scientific standards of derivation and is checkable. It may still be right or wrong - this is in the nature of scientific statements.

If a method of prediction and assessment of confidence leads to wrong predictions at a rate significantly higher than the assigned confidence level allows, the method will be branded as unreliable and phased out from scientific practice. Note that this again requires an ensemble, i.e., many predictions, to be implementable. Again, a confidence level for a single prediction may serve only as a subjective guide.
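The reliability check described here can be sketched as a binomial tail computation. This is a minimal illustration, assuming independent predictions that each fail with probability 1 - confidence when the method is sound; the function name and the track-record numbers are hypothetical.

```python
from math import comb

def miss_tail_probability(n_predictions, n_misses, confidence):
    """P(at least n_misses failures among n_predictions) if each prediction
    independently fails with probability 1 - confidence."""
    q = 1.0 - confidence
    return sum(
        comb(n_predictions, k) * q**k * (1.0 - q) ** (n_predictions - k)
        for k in range(n_misses, n_predictions + 1)
    )

# Hypothetical track record: 200 predictions issued at 95% confidence, 22 wrong.
tail = miss_tail_probability(200, 22, 0.95)
print(f"chance of >= 22 misses at a nominal 95% level: {tail:.2e}")

# With a single prediction, one miss has probability 1 - confidence under the
# nominal level, so a lone failure says little about the method.
print(miss_tail_probability(1, 1, 0.95))
```

A small tail value for a long track record is evidence against the method; with one prediction no such test is available, which is why many predictions are needed.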

The statement ''Isotope X has a half life of Y years'' is a statement about the ensemble of all atoms representing isotope X. A huge subensemble of the still far huger full ensemble has been observed, so that we know the objective value of Y quite well, with a very small uncertainty, and we also know the underlying model of a Poisson process.

If we now have a group of N atoms of isotope X, we can calculate from this information a confidence interval for any statement of the form ''In a time interval T, between M-K and M+K of the N atoms will decay''. If the confidence is large enough we can state it as a prediction that in the next experiment checking this, this statement will be found correct. And we would be entitled to publish it if X was a new or interesting isotope whose decay was measured by a new method, say.
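The calculation referred to here can be sketched with the binomial distribution. The sketch assumes independent atoms that each decay within time T with probability 1 - 2^(-T/Y); the function name and the numbers N = 1000, M = 500, K = 40 are made up for illustration.

```python
from math import comb

def prob_decays_in_range(n_atoms, half_life, interval, lo, hi):
    """Model probability that between lo and hi (inclusive) of n_atoms decay
    within `interval`, assuming independent exponential decay.
    half_life and interval must share the same time unit."""
    p = 1.0 - 2.0 ** (-interval / half_life)  # single-atom decay probability
    lo = max(lo, 0)
    hi = min(hi, n_atoms)
    return sum(
        comb(n_atoms, k) * p**k * (1.0 - p) ** (n_atoms - k)
        for k in range(lo, hi + 1)
    )

# Hypothetical case: N = 1000 atoms watched for one half-life, M = 500, K = 40.
print(prob_decays_in_range(1000, 5.0, 5.0, 500 - 40, 500 + 40))
```

The direct sum is adequate for moderate N; for macroscopic N one would switch to a normal approximation or work in log space, since the individual terms leave floating-point range.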

Nowhere in all I said was any reference made to "a measure of a state of knowledge", so that the ''Bayesian probability interpretation'' as defined in http://en.wikipedia.org/wiki/Bayesian_probability is clearly inapplicable.

if i created a device to drop a coin the same exact way each time, and i put the coin in heads up each time, the first drop would presumably be the only drop with a probability of 50-50. it seems the knowledge of that outcome would affect the probability of every other drop. please help me out if my thinking is flawed.

Dale
Mentor
As Wikipedia says, the above is a particular _interpretation_, not a _definition_ of probability.
Now you want to take a semantic debate about the word "probability" and add a semantic debate about the word "definition".

The point is that it is perfectly well-accepted to consider probability to depend on knowledge. It is not a myth. Your continued refusal to recognize this obvious fact makes you seem irrational and biased. How can anyone reason or debate with someone who won't even acknowledge commonly accepted meanings of terms?

2019 Award
if i created a device to drop a coin the same exact way each time, and i put the coin in heads up each time, the first drop would presumably be the only drop with a probability of 50-50. it seems the knowledge of that outcome would affect the probability of every other drop. please help me out if my thinking is flawed.
If your device were deterministic, and you were able to replicate things with infinite precision, the later outcomes would be the same as the first one. But neither of these assumptions can be realized.

2019 Award
Now you want to take a semantic debate about the word "probability" and add a semantic debate about the word "definition".

The point is that it is perfectly well-accepted to consider probability to depend on knowledge. It is not a myth. Your continued refusal to recognize this obvious fact makes you seem irrational and biased. How can anyone reason or debate with someone who won't even acknowledge commonly accepted meanings of terms?
You seem to imply that semantics is irrelevant for meaning.

I never saw anyone before equating interpretation with definition. They are worlds apart.

And about the semantics of myth:

from http://en.wikipedia.org/wiki/Myth :
Many scholars in other fields use the term "myth" in somewhat different ways. In a very broad sense, the word can refer to any traditional story.
from http://en.wikipedia.org/wiki/National_myth :
A national myth is an inspiring narrative or anecdote about a nation's past. Such myths often serve as an important national symbol and affirm a set of national values.
Thus something may be well accepted and still be a myth.

If your device were deterministic, and you were able to replicate things with infinite precision, the later outcomes would be the same as the first one. But neither of these assumptions can be realized.
i'm just using a cheap chute and a pencil. 9 out of ten times it's heads, so far. that one tails, does that set the odds back to 50-50? even though the results say 90% heads. would an observer with no knowledge have a 50-50 chance?

2019 Award
i'm just using a cheap chute and a pencil. 9 out of ten times it's heads, so far. that one tails, does that set the odds back to 50-50? even though the results say 90% heads. would an observer with no knowledge have a 50-50 chance?
It depends on whether you think in terms of subjective or objective probability.

The objective probability is independent of how much an observer knows, and can be determined approximately from sufficiently many experiments. To someone who knows no or only a few experimental outcomes, the objective probability will be unknown rather than 50-50.

The subjective probability depends on the prejudice an observer has (encoded in the prior) and the amount of data (which modify the prior), so it may well be 50-50 for an observer with no knowledge.
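How a subjective probability depends on both the prior and the data can be illustrated with the conjugate Beta-binomial update. This is only a sketch: the Beta prior family and the function name are assumptions made here, and the 9 heads and 1 tail are the counts reported for the chute experiment.

```python
def posterior_mean_heads(heads, tails, prior_a=1.0, prior_b=1.0):
    """Posterior mean of P(heads) for a Beta(prior_a, prior_b) prior
    updated with independent coin drops (conjugate Beta-binomial update)."""
    return (prior_a + heads) / (prior_a + prior_b + heads + tails)

# No data: any symmetric prior gives 50-50.
print(posterior_mean_heads(0, 0))
# Nine heads and one tail, uniform prior:
print(posterior_mean_heads(9, 1))
# Same data, strong prior belief in a fair coin:
print(posterior_mean_heads(9, 1, 50.0, 50.0))
```

Two observers with the same data but different priors report different numbers, while the observed frequency 9/10 is the same for both.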

Dale
Mentor
You seem to imply that semantics is irrelevant for meaning.

I never saw anyone before equating interpretation with definition. They are worlds apart.

And about the semantics of myth:

from http://en.wikipedia.org/wiki/Myth :

from http://en.wikipedia.org/wiki/National_myth :

Thus something may be well accepted and still be a myth.
You are clearly not a reasonable person to discuss with. No progress can be made in such a conversation.

It depends on whether you think in terms of subjective or objective probability.

The objective probability is independent of how much an observer knows, and can be determined approximately from sufficiently many experiments. To someone who knows no or only a few experimental outcomes, the objective probability will be unknown rather than 50-50.

The subjective probability depends on the prejudice an observer has (encoded in the prior) and the amount of data (which modify the prior), so it may well be 50-50 for an observer with no knowledge.
you're saying there was only one outcome objectively, even though i couldn't be certain. so subjectively i had 2 choices, and then one choice for each successive drop?

2019 Award
you're saying there was only one outcome objectively, even though i couldn't be certain. so subjectively i had 2 choices, and then one choice for each successive drop?
Objectively, the odds seem to be close to 90-10, according to your description, though I don't know whether your sample was large enough to draw this conclusion with some confidence.
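The caveat about sample size can be made concrete with a confidence interval for the heads frequency. The sketch below uses the Wilson score interval, which is one common choice and not something prescribed in the thread; it assumes independent drops with a fixed heads probability, and the 1000-drop case is hypothetical.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

low, high = wilson_interval(9, 10)
print(f"10 drops:   {low:.2f} to {high:.2f}")
low, high = wilson_interval(900, 1000)
print(f"1000 drops: {low:.3f} to {high:.3f}")
```

With only ten drops the interval is wide, roughly 0.60 to 0.98, so 90-10 is a rough estimate at best; the same observed frequency over a thousand drops pins the value down far more tightly.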

Subjectively, it depends on what you are willing to substitute for your ignorance.

If _I_ were the subject and had no knowledge, I'd defer judgment rather than assert an arbitrary probability. This is the scientifically sound way to proceed.

the knowledge has no effect on the probability of the outcome, just probable correct answers. i think i got it. i guess i agree with you then.

Quantum mechanics has demonstrated that what we do not know can arise from what we cannot know. Information that parts of a system can have about other parts of a system is not really separate from the systems themselves. We have to stop pretending to be omniscient.

I have just now been introduced to probability theory by Jaynes, and the way he described probability (as a tool for prediction), it definitely depends on information. I suppose that what you call "probability", is what he might have called statistical "frequency".

Thus it is "just" a matter of words and definition, but, as I just discovered, it's an important one and you are right to bring it up!

Jaynes argues, or in fact he shows, that quite a few paradoxes (including ones in QM such as Bell's) result from confusion between, on the one hand:
- our probabilistic inferences and predictions based on the information that we have,
and on the other hand:
- the effects and statistics of physical measurements that allow one to verify those predictions.

Harald

2019 Award
I have just now been introduced to probability theory by Jaynes, and the way he described probability (as a tool for prediction), it definitely depends on information. I suppose that what you call "probability", is what he might have called statistical "frequency".

Thus it is "just" a matter of words and definition, but, as I just discovered, it's an important one and you are right to bring it up!
Jaynes' probabilities are subjective, so the dependence on knowledge is appropriate.
When he applies it to statistical mechanics, though, he gets the right results only if he assumes the right sort of knowledge, namely that of the additive conserved quantities. If someone applied his max entropy principle using only knowledge about the expectation of the square of H, say, he would get very wrong formulas.

Thus one needs to know the correct formulas to know which sort of information one may use as input to his subjective approach....
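The dependence on the chosen constraint can be made explicit with the standard Lagrange-multiplier calculation. The following is a sketch for a discrete system with energy levels $E_i$; the symbols $p_i$, $E_i$, $\beta$, $\lambda$, $Z$ and $\tilde Z$ are notation introduced here for illustration.

```latex
% Maximum entropy with knowledge of the mean energy <H>:
\max_{p}\; S = -\sum_i p_i \ln p_i
\quad\text{subject to}\quad
\sum_i p_i = 1, \qquad \sum_i p_i E_i = \langle H \rangle
\;\;\Longrightarrow\;\;
p_i = \frac{e^{-\beta E_i}}{Z(\beta)}, \qquad
Z(\beta) = \sum_i e^{-\beta E_i}.

% Same principle, but with knowledge of <H^2> only:
\sum_i p_i = 1, \qquad \sum_i p_i E_i^2 = \langle H^2 \rangle
\;\;\Longrightarrow\;\;
p_i = \frac{e^{-\lambda E_i^2}}{\tilde Z(\lambda)}, \qquad
\tilde Z(\lambda) = \sum_i e^{-\lambda E_i^2}.
```

The first result is the canonical (Boltzmann) distribution. The second is an equally valid maximum-entropy distribution for the information supplied, yet it is not the canonical ensemble, so the principle alone does not single out the physically correct input.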

For a detailed discussion, see Sections 10.6 and 10.7 of my book

Arnold Neumaier and Dennis Westra,
Classical and Quantum Mechanics via Lie algebras,
2008, 2011. http://lanl.arxiv.org/abs/0810.1019

Jaynes' probabilities are subjective, so the dependence on knowledge is appropriate.
When he applies it to statistical mechanics, though, he gets the right results only if he assumes the right sort of knowledge, namely that of the additive conserved quantities. If someone applied his max entropy principle using only knowledge about the expectation of the square of H, say, he would get very wrong formulas.

Thus one needs to know the correct formulas to know which sort of information one may use as input to his subjective approach....

For a detailed discussion, see Sections 10.6 and 10.7 of my book

Arnold Neumaier and Dennis Westra,
Classical and Quantum Mechanics via Lie algebras,
2008, 2011. http://lanl.arxiv.org/abs/0810.1019
It appears to me that what you call "subjective" is what he called "objective"; and of course any prediction is based on certain assumptions (theories that are based on human knowledge). Anyway, thanks for the link - and if you want to call a prediction based on QM "subjective", then that's fine with me.