What is the relation between probability spaces and the binomial distribution?

Rasalhague
Here, to further test my understanding, is an attempt to apply the measure-theoretic definition of a probability space to the binomial distribution. All comments welcome!


Let (R,D,O) be a probability space:

R = \left \{ 0,1 \right \}

D = 2^R

O:D\rightarrow[0,1] \; | \; O(\left \{ 1 \right \})=p


Let (S,E,P) be another probability space:

S = \left \{ 0,1 \right \}^n

E = 2^S

P:E\rightarrow[0,1] \; | \; P(\left \{ s \right \})=p^{\sum_{i=1}^{n}s_i}(1-p)^{n-\sum_{i=1}^{n}s_i}


Let (T,F,Q) be a third probability space:

T=\left \{ 0,1,...,n \right \}

F = 2^T

Q:F\rightarrow[0,1] \; | \; Q(\left \{ t \right \})= \binom{n}{t}p^t(1-p)^{n-t}


Let X be a random variable:

X:S\rightarrow T \; | \; X(s)=\sum_{i=1}^{n}s_i


Then the probability measure Q belongs to a class of (probability) distributions called binomial distributions. The underlying sample space is S. The events are E. The probability is P. The observation space is T. The "observed events" are F. We can interpret p as the probability of success on one trial, n as the number of trials, and Q(\left \{ t \right \}) as the probability of exactly t successes in n trials.
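As a sanity check on the construction above, here is a minimal Python sketch that enumerates S, pushes P forward through X, and compares the result with the formula for Q. It assumes the n trials are independent, so that an outcome s containing k ones has probability p^k(1-p)^{n-k}; the function name pushforward_binomial and the values n=5, p=0.3 are made up for illustration.

```python
from itertools import product
from math import comb, isclose

def pushforward_binomial(n, p):
    """Return (induced, formula): two dicts keyed by t in T = {0, ..., n}."""
    induced = {t: 0.0 for t in range(n + 1)}
    for s in product((0, 1), repeat=n):          # s ranges over S = {0,1}^n
        t = sum(s)                               # X(s) = s_1 + ... + s_n
        # P({s}) under the independence assumption stated above
        induced[t] += p**t * (1 - p)**(n - t)
    # Q({t}) from the closed-form binomial expression
    formula = {t: comb(n, t) * p**t * (1 - p)**(n - t) for t in range(n + 1)}
    return induced, formula

induced, formula = pushforward_binomial(n=5, p=0.3)
assert all(isclose(induced[t], formula[t]) for t in induced)
assert isclose(sum(induced.values()), 1.0)       # both are probability measures
```

The binomial coefficient appears simply because exactly \binom{n}{t} points of S are mapped to t by X, and under the assumption they all carry the same probability.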

The components of (R,D,O) have no special name, but if we define another random variable, W, such that W is the identity function on R, then O becomes the Bernoulli distribution. Equivalently, the Bernoulli distribution is a binomial distribution with n = 1.

Footnote: I think there's a more subtle variant of this idea, which I hope to get to eventually, where the observation space is taken to be the real numbers, and F the Borel algebra (smallest sigma algebra generated by the open sets), allowing one to use general formulas for defining expectation, and so forth, that apply both to continuous and discrete cases.
 
Yes, all you said is correct! :smile:

Maybe I should comment a bit on the general theory. In general, we'll indeed have a probability space (S,E,P). The random variables are functions X:S\rightarrow \mathbb{R}. Such a function can be discrete or continuous or anything else. We only want the function to be measurable, that is: X^{-1}(B)\in E for every Borel set B. Measurability is a technical condition that is almost always satisfied in practice.

Because of measurability, we can define a distribution by

P_X(B)=P(X^{-1}(B))

Now, the mean is defined in general as

\int{XdP}=\int{xdP_X}

So we are actually integrating with respect to a probability measure! If the measure is discrete, then the integral is a sum; if the measure is "continuous", then the integral is the integral like we all know it! :smile:
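For the binomial example, both sides of that identity are finite sums, so they can be checked directly. The following is only a sketch: it assumes independent trials (an outcome s with k ones has probability p^k(1-p)^{n-k}), and the function names and the values n=6, p=0.4 are chosen purely for illustration.

```python
from itertools import product
from math import comb, isclose

def mean_over_sample_space(n, p):
    """Integral of X dP: sum of X(s) * P({s}) over all s in S = {0,1}^n."""
    total = 0.0
    for s in product((0, 1), repeat=n):
        k = sum(s)                               # X(s)
        total += k * p**k * (1 - p)**(n - k)     # X(s) * P({s})
    return total

def mean_over_distribution(n, p):
    """Integral of x dP_X: sum of t * P_X({t}) over t in {0, ..., n}."""
    return sum(t * comb(n, t) * p**t * (1 - p)**(n - t) for t in range(n + 1))

n, p = 6, 0.4
assert isclose(mean_over_sample_space(n, p), mean_over_distribution(n, p))
assert isclose(mean_over_distribution(n, p), n * p)   # the familiar mean np
```

The first function sums over 2^n outcomes and the second over only n+1 values, which is the practical reason for working with the distribution P_X rather than with P itself.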
 
Phew, that's a relief! Thanks for all the help and encouragement, micromass.

By the way, what exactly is the formal definition of discrete? Of a random variable to R, does it perhaps mean one whose range contains a (finite or infinitely) countable set of nonzero values, or is it defined in terms of continuity? Is there a definition of equal generality to the definition of a random variable itself, i.e. a definition which works for any random variable, no matter what its domain and codomain are? And how is discrete defined for a measure? Does a discrete random variable necessarily induce a discrete distribution, and is a discrete distribution necessarily induced by a discrete random variable?

I guess those scare quotes around "continuous" are because there isn't, in general, a topology defined on either of the sigma algebras, E or F. Is there a more technically correct way of expressing this difference between models that involve things like binomial distributions, and models that involve things like normal distributions?
 
Rasalhague said:
Phew, that's a relief! Thanks for all the help and encouragement, micromass.

By the way, what exactly is the formal definition of discrete? Of a random variable to R, does it perhaps mean one whose range contains a (finite or infinitely) countable set of nonzero values, or is it defined in terms of continuity? Is there a definition of equal generality to the definition of a random variable itself, i.e. a definition which works for any random variable, no matter what its domain and codomain are?

A random variable is discrete if its range is finite or countably infinite. It's as easy as that.

And how is discrete defined for a measure?

It isn't. I should not have written that. I should have said "if X is discrete, then it is a sum. If X is "continuous", then it is an integral"

Does a discrete random variable necessarily induce a discrete distribution, and is a discrete distribution necessarily induced by a discrete random variable?

Yes. And I dare say that it is so by definition. But this depends on the definition.

I guess those scare quotes around "continuous" are because there isn't, in general, a topology defined on either of the sigma algebras, E or F. Is there a more technically correct way of expressing this difference between models that involve things like binomial distributions, and models that involve things like normal distributions?

There are different answers to this. One convention says that X is continuous if the cdf

F(x)=P\{X\leq x\}

is continuous.

A (stronger) convention says that X is continuous if the distribution P_X is an "absolutely continuous measure" (with respect to Lebesgue measure). All that means is that there exists an integrable function f (called the probability density function or pdf) such that

P_X(A)=\int_A{f(x)dx}

where the integral above is the ordinary integral that you're used to (in more technical terms: the integral with respect to Lebesgue measure).
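To make the density condition concrete, here is a rough numerical sketch for the standard normal distribution, which is my choice of example and not something fixed by the discussion above. It integrates the pdf over A=[a,b] with a simple midpoint rule and compares the result with F(b)-F(a), where the cdf is written in terms of math.erf. The helper names are hypothetical.

```python
from math import erf, exp, pi, sqrt, isclose

def normal_pdf(x):
    """Density f of the standard normal distribution."""
    return exp(-x * x / 2) / sqrt(2 * pi)

def normal_cdf(x):
    """F(x) = P{X <= x} for the standard normal, via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def integrate_midpoint(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the ordinary integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

a, b = -1.0, 2.0
via_density = integrate_midpoint(normal_pdf, a, b)   # P_X(A) as an integral of f
via_cdf = normal_cdf(b) - normal_cdf(a)              # P_X(A) as F(b) - F(a)
assert isclose(via_density, via_cdf, rel_tol=1e-6)
```

By contrast, the cdf of a binomial distribution jumps at each of 0, 1, ..., n, so a binomial random variable is continuous under neither convention.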
 
micromass said:
One convention says [...]

When S and T are topological spaces, do these notions of "continuity" coincide with the usual one? Is the need for a convention only because these are extensions of the usage of the word to cases where S and T are not both topological spaces, or is there ever a case where X can be continuous in the usual sense but not in the probability sense, or vice-versa?
 
Rasalhague said:
When S and T are topological spaces, do these notions of "continuity" coincide with the usual one? Is the need for a convention only because these are extensions of the usage of the word to cases where S and T are not both topological spaces, or is there ever a case where X can be continuous in the usual sense but not in the probability sense, or vice-versa?

No, saying that a random variable is continuous has nothing to do with the topology on S or T. I can find a lot of topologies on S, but that doesn't mean that X is necessarily continuous in the usual sense. The term "continuous random variable" was introduced historically to mean that the cdf is continuous, and that is still what it means today.
 