JesseM said:
It's still not clear what you mean by "the marginal probability of successful treatment".
billschnieder said:
A = Treatment A results in recovery from the disease
P(A) = marginal probability of recovery after administration of treatment A.
If it is the meaning of marginal probability you are unsure of, this will help (http://en.wikipedia.org/wiki/Conditional_probability)
I think you know perfectly well that I understand the difference between marginal and conditional as we have been using these terms extensively. It often seems like you may be intentionally playing one-upmanship games where you snip out all the context of some question or statement I ask and make it sound like I was confused about something very trivial...in this case the context made clear exactly what I found ambiguous in your terms:
For example, take my example where the actual experiment was done by sampling patients whose treatments had not been assigned randomly, but had been assigned by their doctors. In this case there might be a systematic bias where doctors are more likely to assign treatment A to patients with large kidney stones (because these patients have more severe symptoms and A is seen as a stronger treatment) and more likely to assign treatment B to patients with small ones. If we imagine repeating this experiment a near-infinite number of times with the same experimental conditions, then those same experimental conditions would still involve the same set of doctors assigning treatments to a near-infinite number of patients, so the systematic bias of the doctors would influence the final probabilities, and thus the "marginal probability of recovery with treatment B" would be higher because patients who receive treatment B are more likely to have small kidney stones, not because treatment B is causally more effective. On the other hand, if we imagine repeating a different experiment that adequately controls for all other variables (in the limit as the sample size approaches infinity), like one where the patients are randomly assigned to treatment A or B, then in this case the "marginal probability of recovery with treatment A" would be higher. So in this specific experiment where treatment was determined by the doctor, which would you say was higher, the marginal probability of recovery with treatment A or the marginal probability of recovery with treatment B? Without knowing the answer to this question I can't really understand what your terminology is supposed to mean.
This scenario, where there is a systematic bias in how doctors assign treatment which influences the observed correlations in frequencies between treatment and recovery in the sample, is a perfectly well-defined one (in fact it's exactly the one assumed in the wikipedia page on Simpson's paradox), so if your terms are well-defined you should be able to answer the question about whether treatment A or treatment B has a higher "marginal probability of successful treatment" in this particular scenario. So please answer it if you want to continue using this type of terminology.
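To make the scenario concrete, here is a minimal Python sketch of the computation. The stone-size fraction, the assignment bias, and the per-stratum recovery rates are all illustrative assumptions (loosely modeled on the numbers in the Wikipedia Simpson's paradox example), not real data:

```python
# Illustrative assumptions, not real clinical numbers:
p_large = 0.5                            # fraction of patients with large stones
assign_A = {"large": 0.8, "small": 0.2}  # doctors' biased assignment of treatment A
recover = {("A", "large"): 0.73, ("A", "small"): 0.93,
           ("B", "large"): 0.69, ("B", "small"): 0.87}

def marginal_recovery(treatment):
    """P(recovery | treatment) under the doctors' biased assignment policy."""
    num = den = 0.0
    for stone, p_stone in (("large", p_large), ("small", 1 - p_large)):
        p_assign = assign_A[stone] if treatment == "A" else 1 - assign_A[stone]
        den += p_stone * p_assign
        num += p_stone * p_assign * recover[(treatment, stone)]
    return num / den

pA = marginal_recovery("A")  # = 0.770, dragged down by the large-stone patients
pB = marginal_recovery("B")  # = 0.834, boosted by the small-stone patients
```

Treatment A has the higher recovery rate within each stratum (0.73 > 0.69 and 0.93 > 0.87), yet the marginal recovery probability comes out higher for B (0.834 vs 0.770), purely because the doctors steer the harder cases toward A.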
In general I notice that you almost always refuse to answer simple questions I ask you about your position, or to address examples I give you, while you have no problem coming up with examples and commanding me to address them, or posing questions and then saying "answer yes or no". Again it seems like this may be a game of one-upmanship here, where you refuse to address anything I ask you to, but then forcefully demand that I address examples/questions of yours, perhaps to prove that you are in the "dominant" position and that I "can't tell you what to do". If you are playing this sort of macho game, count me out, I'm here to try to have an intellectual discussion which gets at the truth of these matters, not to prove what an alpha male I am by forcing everyone to submit to me. I will continue to make a good-faith effort to answer your questions and address your examples, as long as you will extend me the same courtesy (not asking you to answer every sentence of mine with a question mark, just the ones I specifically/repeatedly request that you address); but if you aren't willing to do this, I won't waste any more time on this discussion.
billschnieder said:
Probability means "Rational degree of belief" defined in the range from 0 to 1 such that 0 means uncertain and 1 means certain.
"Rational degree of belief" is a very ill-defined phrase. What procedure allows me to determine the degree to which it is rational to believe a particular outcome will occur in a given scenario?
billschnieder said:
Probability does not mean frequency, although probability can be calculated from frequencies.
You seem to be unaware of the debate surrounding the meaning of "probability", and of the fact that the "frequentist interpretation" is one of the most popular ways of defining its meaning. I already linked you to the wikipedia article on frequency probability, which starts out by saying:
Frequency probability is the interpretation of probability that defines an event's probability as the limit of its relative frequency in a large number of trials. The development of the frequentist account was motivated by the problems and paradoxes of the previously dominant viewpoint, the classical interpretation. The shift from the classical view to the frequentist view represents a paradigm shift in the progression of statistical thought.
Under the wikipedia article on the classical interpretation they say:
The classical definition of probability was called into question by several writers of the nineteenth century, including John Venn and George Boole. The frequentist definition of probability became widely accepted as a result of their criticism, and especially through the works of R.A. Fisher.
Aside from wikipedia you might look at the Interpretations of Probability article from the Stanford Encyclopedia of Philosophy. In the section on frequency interpretations they start by discussing "finite frequentism", which just defines probability in terms of frequency in some finite number of real trials, so if you flip a coin 10 times and get 7 heads that would automatically imply the "probability" of getting heads was 0.7. This interpretation has some obvious problems, so that leads them to the meaning I am using when I discuss "ideal probabilities", known as "infinite frequentism":
Some frequentists (notably Venn 1876, Reichenbach 1949, and von Mises 1957 among others), partly in response to some of the problems above, have gone on to consider infinite reference classes, identifying probabilities with limiting relative frequencies of events or attributes therein. Thus, we require an infinite sequence of trials in order to define such probabilities. But what if the actual world does not provide an infinite sequence of trials of a given experiment? Indeed, that appears to be the norm, and perhaps even the rule. In that case, we are to identify probability with a hypothetical or counterfactual limiting relative frequency. We are to imagine hypothetical infinite extensions of an actual sequence of trials; probabilities are then what the limiting relative frequencies would be if the sequence were so extended.
The article goes on to discuss the idea that this infinite series of trials should be defined as ones that all share some well-defined set of conditions, which Von Mises called "collectives — hypothetical infinite sequences of attributes (possible outcomes) of specified experiments that meet certain requirements ... The probability of an attribute A, relative to a collective ω, is then defined as the limiting relative frequency of A in ω."
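The "limiting relative frequency" idea is easy to see numerically. Here is a small sketch (the bias value 0.3 and the trial count are arbitrary choices for illustration):

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

p_true = 0.3        # the "ideal probability" the limit should recover
n_trials = 100_000  # stand-in for the hypothetical infinite sequence

# Count successes across the trials and form the relative frequency:
successes = sum(random.random() < p_true for _ in range(n_trials))
freq = successes / n_trials

# freq approaches p_true as n_trials -> infinity; on any finite run it
# merely hovers near it, which is why the definition needs the limit.
```

Running this with larger and larger `n_trials` shows `freq` settling ever closer to 0.3, which is exactly the sense in which the limiting frequency defines the probability.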
There are certainly other interpretations of probability, discussed in the article (you can find more extensive discussions of different interpretations in a book like Philosophical Theories of Probability; much of the chapter on the frequentist interpretation can be read on google books here). I think most of them would be difficult to apply to Bell's reasoning, though. The more subjective definitions would have the problem that you'd have trouble saying who is supposed to be the "subject" that defines probabilities dealing with λ (whose value on each trial, and even possible range of values, would be unknown to human experimenters). And the more "empirical" definitions, which deal only with frequencies in actual observed trials, would have the same sort of problem, since we don't actually observe the value of λ.
Anyway, do you think there is anything inherently incoherent about using the frequentist interpretation of probability when following Bell's reasoning? If so, what? And if you prefer a different interpretation of the meaning of "probability", can you give a definition less vague than "rational degree of belief", preferably by referring to some existing school of thought referred to in an article or book?
billschnieder said:
Probabilities can be assigned for many situations that can never be repeated.
But the frequentist interpretation is just about hypothetical repetitions, which can include purely hypothetical ideas like "turning back the clock" and running the same single experiment over again at the same moment (with observable conditions held the same but non-observed conditions, like the precise 'microstate' in a situation where we have only observed the 'macrostate', allowed to vary randomly) rather than actually repeating it at successively later times (which might be impossible because the original experiment destroyed the object we were experimenting on, say).
billschnieder said:
The domain of probability theory is to deal with uncertainty, indeterminacy and incomplete information.
Yes, and the idea is that we are considering a large set of trials in which the things we know are the same in every trial (like the 'macrostate' in statistical mechanics, which just tells us the state of macro-variables like temperature and pressure) but the things we don't know vary randomly (like the 'microstate' in statistical mechanics, which deals with facts like the precise position of every microscopic particle in the system).

In classical statistical mechanics the "probability" that a system with a given macrostate at t₀ will evolve to another given macrostate at t₁ is determined by considering every possible microstate consistent with the original macrostate at t₀ (the number of possible microstates for any human-scale system being astronomically large) and seeing what fraction will evolve into a microstate at t₁ which is consistent with the macrostate whose probability we want to know. So here we are considering a situation in which we only know some limited information about the system, and are figuring out the probabilities by considering a near-infinite number of possible trials in which the unknown information (the precise microstate) might take many possible values. Do you think this is an improper way of calculating probabilities? It does seem to be directly analogous to how Bell was calculating the probabilities of seeing different values of observable variables by summing over all possible values of the hidden variables.
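Here's a toy numerical version of that microstate-averaging recipe. The six-site lattice and the one-flip "dynamics" are stand-in assumptions, nothing like real statistical mechanics, but they show the counting:

```python
from itertools import combinations

# Toy "gas": N sites, each occupied (1) or empty (0). The macrostate is the
# total occupancy; the microstate is the full 0/1 configuration.
N = 6

def microstates(macro):
    """All configurations consistent with a given macrostate (occupancy count)."""
    for occ in combinations(range(N), macro):
        yield tuple(1 if i in occ else 0 for i in range(N))

def step(state, site):
    """Toy dynamics: flip one site (a stand-in for time evolution)."""
    s = list(state)
    s[site] ^= 1
    return tuple(s)

# Probability that macrostate 3 at t0 evolves to macrostate 4 at t1,
# averaging uniformly over every unknown microstate and flip choice:
hits = total = 0
for state in microstates(3):
    for site in range(N):
        total += 1
        hits += sum(step(state, site)) == 4

p = hits / total  # = (N - 3)/N = 0.5 for this toy model
```

We never observe which microstate the system is actually in; the probability falls out of uniform averaging over every microstate compatible with the known macrostate, which is the same move Bell makes when he sums over the possible values of λ.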
billschnieder said:
As such it makes not much sense to talk of "true probability".
It does in the frequentist interpretation.
JesseM said:
For example, take my example where the actual experiment was done by sampling patients whose treatments had not been assigned randomly, but had been assigned by their doctors. In this case there might be a systematic bias where doctors are more likely to assign treatment A to patients with large kidney stones (because these patients have more severe symptoms and A is seen as a stronger treatment) and more likely to assign treatment B to patients with small ones. If we imagine repeating this experiment a near-infinite number of times with the same experimental conditions, then those same experimental conditions would still involve the same set of doctors assigning treatments to a near-infinite number of patients, so the systematic bias of the doctors would influence the final probabilities, and thus the "marginal probability of recovery with treatment B" would be higher because patients who receive treatment B are more likely to have small kidney stones, not because treatment B is causally more effective.
billschnieder said:
So you agree that one man's marginal probability is another man's conditional probability.
The comment above says nothing of the sort. I'm just saying that to talk about "probability" in the frequentist interpretation you need to define the conditions that you are imagining being repeated in an arbitrarily large number of trials. And in the case above, the conditions include the fact that on every trial the treatment was assigned by a member of some set of doctors, which means that the marginal probability of (treatment B, recovery) is higher than the marginal probability of (treatment A, recovery) despite the fact that treatment B is not causally more effective (and I'm asking you whether in this scenario you'd say treatment B is 'marginally more effective', a question you haven't yet answered). Nowhere in the above am I saying anything about conditional probabilities.
Even if you don't want to think of probabilities in frequentist terms, would you agree that whenever we talk about "probabilities" we at least need to define a sample space (or probability space, which is just a sample space with probabilities on each element) which includes the conditions that could obtain on any possible trial in our experiment? If so, would you agree that when defining the sample space, we must define what process was used to assign treatments to patients, that a sample space where treatment was assigned by doctors would be a different one than a sample space where treatment was assigned by a random number generator on a computer?
billschnieder said:
Which is the point I've been pointing out to you ad nauseam. Comparing probabilities defined on different probability spaces is guaranteed to produce paradoxes and spooky business.
I'm not asking you to "compare probabilities defined on different probability spaces", and Bell's argument doesn't require you to do that either. I'm just asking, for the probability space I outlined where treatments would be decided by doctors, whether you would say treatment B was "marginally more effective" if it turned out that the probability (or frequency) of (treatment B, recovery) was higher than the probability of (treatment A, recovery).
billschnieder said:
This is the point you still have not understood. It is not possible to control for "all other variables" which you know nothing about, even if it were possible to repeat the experiment an infinite number of times.
Sure it would be. If treatment was assigned by a random number generator, then in the limit as the number of trials went to infinity the probability of any correlation between traits of patients prior to treatment (like large kidney stones) and the treatment they were assigned would approach 0. This is just because there isn't any way the traits of patients would causally influence the random number generator so that there would be a systematic difference in the likelihood that patients with different versions of a trait (say, large vs. small kidney stones) would be assigned treatment A vs. treatment B. Do you disagree?
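A quick simulation sketch of that claim (the 50/50 trait split and the sample size are arbitrary illustrative choices):

```python
import random

random.seed(1)  # fixed seed for reproducibility

n = 200_000
# Patient trait the experimenter knows nothing about:
large = [random.random() < 0.5 for _ in range(n)]
# Treatment assigned by a random number generator, blind to the trait:
treat_A = [random.random() < 0.5 for _ in range(n)]

# P(treatment A | large stones) vs P(treatment A | small stones):
n_large = sum(large)
pA_large = sum(a for a, l in zip(treat_A, large) if l) / n_large
pA_small = sum(a for a, l in zip(treat_A, large) if not l) / (n - n_large)

gap = abs(pA_large - pA_small)  # -> 0 as n -> infinity
```

The gap shrinks like 1/√n. If the assignment were made by the biased doctors instead of the RNG, `gap` would converge to the size of the doctors' bias rather than to zero.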
And again, if we are talking about Bell's argument it doesn't matter if there is such a correlation between the value of the hidden variable λ and the value of some measurable variable like A; you don't need to "control for" the value of the hidden variable in the sense you need to "control for" the value of a background variable like S={large kidney stones, small kidney stones} above. This is because the only need for that type of control is if you want to establish a causal relation between measurable variables like treatment and recovery, but Bell is not trying to establish a causal relation between spacelike-separated measurement outcomes, quite the opposite in fact. If you disagree it would help if you would respond to post #79 (you might not have even noticed that one because it was on an earlier page from my next post to you, #91, which you were responding to here), particularly the question I was asking there (which only requires a yes-or-no answer):
So, do you agree with my statement that of these two, only the second sense of "fair sample" is relevant to Bell's argument?
To make the question more precise, suppose all of the following are true:
1. We repeat some experiment with particle pairs N times and observe frequencies of different values for measurable variables like A and B
2. N is sufficiently large such that, by the law of large numbers, there is only a negligible probability that these observed frequencies differ by more than some small amount ε from the ideal probabilities for the same measurable variables (the 'ideal probabilities' being the ones that would be seen if the experiment was repeated under the same observable conditions an infinite number of times)
3. Bell's reasoning is sound, so he is correct in concluding that in a universe obeying local realist laws (or with laws obeying 'local causality' as Maaneli prefers it), the ideal probabilities for measurable variables like A and B should obey various Bell inequalities
...would you agree that if all of these are true (please grant them for the sake of the argument when answering this question, even though I know you would probably disagree with 3 and perhaps also doubt it is possible in practice to pick a sufficiently large N so that 2 is true), then the experiment constitutes a valid test of local realism/local causality, so if we see a sizeable violation of Bell inequalities in our observed frequencies there is a high probability that local realism is false? Please give me a yes-or-no answer to this question.
If you say yes, it would be a valid test if 1-3 were true but you don't actually believe 2 and/or 3 could be true in reality, then we can focus on your arguments for disbelieving either of them. For example, for 2 you might claim that if N is not large enough that the frequencies of hidden-variable states are likely to match the ideal probabilities for these states (because the number of hidden-variable states can be vastly larger than any achievable N), then that also means the frequencies of values of observable variables like A and B aren't likely to match the ideal probabilities for these variables either. I would say that argument is based on a misconception about statistics, and point you to the example of the coin-flip-simulator and the more formal textbook equation in post #51 to explain why. But again, I think it will help focus the discussion if you first address the hypothetical question about whether we would have a valid test of local realism if 1-3 were all true.
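The statistical point can be illustrated with a toy sketch. The 2³² hidden-variable space and the parity rule mapping λ to the observable A are made-up assumptions purely for illustration:

```python
import random

random.seed(2)  # fixed seed for reproducibility

n = 100_000
space = 2**32  # hidden-variable states vastly outnumber the sample size n

# Draw a hidden value lambda for each of the n trials:
lams = [random.randrange(space) for _ in range(n)]

# Nearly every lambda value appears 0 or 1 times, so the observed frequency
# of each hidden state is nowhere near its ideal probability 1/space:
distinct = len(set(lams))

# But a coarse-grained observable (here, the parity of lambda) still has an
# observed frequency very close to its ideal probability of 0.5:
freq_A = sum(lam % 2 for lam in lams) / n
```

With 100,000 draws from roughly 4.3 billion hidden states, almost every λ value is seen at most once, so the hidden-state frequencies are wildly off their ideal probabilities; yet the observed frequency of the observable A sits right on its ideal probability. That is the sense in which claim 2 can hold for observable variables even when N is far too small to sample the hidden-variable space.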