What is the relation between probability spaces and the binomial distribution?

In summary: One convention says that X is continuous if its cdf is continuous. A stronger convention says that X is continuous if its distribution is absolutely continuous, i.e. it has a probability density function.
  • #1
Rasalhague
Here, to further test my understanding, is an attempt to apply the measure-theoretic definition of a probability space to the binomial distribution. All comments welcome!


Let (R,D,O) be a probability space:

[tex]R = \left \{ 0,1 \right \}[/tex]

[tex]D = 2^R[/tex]

[tex]O:D\rightarrow[0,1] \; | \; O(\left \{ 1 \right \})=p[/tex]


Let (S,E,P) be another probability space:

[tex]S = \left \{ 0,1 \right \}^n[/tex]

[tex]E = 2^S[/tex]

[tex]P:E\rightarrow[0,1] \; | \; P(\left \{ s \right \})=p^{\sum_{i=1}^{n}s_i}(1-p)^{n-\sum_{i=1}^{n}s_i}[/tex]


Let (T,F,Q) be a third probability space:

[tex]T=\left \{ 0,1,...,n \right \}[/tex]

[tex]F = 2^T[/tex]

[tex]Q:F\rightarrow[0,1] \; | \; Q(\left \{ t \right \})= \binom{n}{t}p^t(1-p)^{n-t}[/tex]


Let X be a random variable:

[tex]X:S\rightarrow T \; | \; X(s)=\sum_{i=1}^{n}s_i[/tex]


Then the probability measure Q belongs to a class of (probability) distributions called binomial distributions. The sample space is S. The events are E. The probability is P. The observation space is T. The "observed events" are F. We can interpret p as the probability of success on one trial, n as the number of trials, t as a number of successes, and Q({t}) as the probability of exactly t successes in n trials.
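As a sanity check on the construction above, here is a minimal Python sketch that builds P on S, pushes it forward through X, and compares the result with Q. It assumes the n trials are independent, so that P({s}) = p^k (1-p)^(n-k) where k is the number of ones in s. The function names (product_measure, pushforward, binomial_pmf) and the values n = 5, p = 0.3 are illustrative choices, not part of the original post.

```python
from itertools import product
from math import comb, isclose


def product_measure(n, p):
    """P on S = {0,1}^n, assuming independent trials:
    P({s}) = p^k (1-p)^(n-k), where k is the number of ones in s."""
    return {s: p ** sum(s) * (1 - p) ** (n - sum(s))
            for s in product((0, 1), repeat=n)}


def pushforward(P, X):
    """Distribution induced by X: Q({t}) = P(X^{-1}({t}))."""
    Q = {}
    for s, mass in P.items():
        Q[X(s)] = Q.get(X(s), 0.0) + mass
    return Q


def binomial_pmf(n, p):
    """Q({t}) = C(n, t) p^t (1-p)^(n-t) for t in T = {0, ..., n}."""
    return {t: comb(n, t) * p ** t * (1 - p) ** (n - t) for t in range(n + 1)}


n, p = 5, 0.3                      # illustrative values
P = product_measure(n, p)          # measure on the sample space S
Q = pushforward(P, sum)            # X(s) = s_1 + ... + s_n
target = binomial_pmf(n, p)

# The induced measure agrees with the binomial formula on every point of T.
agrees = all(isclose(Q[t], target[t]) for t in range(n + 1))
```

Each value t in T collects the mass of the C(n, t) outcomes s with exactly t ones, which is where the binomial coefficient in Q comes from.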

The components of (R,D,O) have no special name, but if we define another random variable, W, such that W is the identity function on R, then O becomes the Bernoulli distribution. Equivalently, the Bernoulli distribution is a binomial distribution with n = 1.

Footnote: I think there's a more subtle variant of this idea, which I hope to get to eventually, where the observation space is taken to be the real numbers, and F the Borel algebra (the smallest sigma algebra containing the open sets), allowing one to use general formulas for defining expectation, and so forth, that apply both to continuous and discrete cases.
 
  • #2
Yes, all you said is correct! :smile:

Maybe I should comment a bit on the general theory. In general, we'll indeed have a probability space (S,E,P). The random variables are functions [itex]X:S\rightarrow \mathbb{R}[/itex]. This function can be discrete or continuous or it can be anything. We only want the function to be measurable, that is: [itex]X^{-1}(B)\in E[/itex] for every Borel set B. Measurability is a technical condition that is almost always satisfied in practice.

Because of measurability, we can define a distribution by

[tex]P_X(B)=P(X^{-1}(B))[/tex]

Now, the mean is defined in general as

[tex]\int{XdP}=\int{xdP_X}[/tex]

So we are actually integrating with respect to a probability measure! If the measure is discrete, then the integral is a sum; if the measure is "continuous", then the integral is the ordinary integral we all know! :smile:
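For the binomial example, both sides of that identity are finite sums, so they can be compared directly. The sketch below is only an illustration: it assumes independent trials (so P is the product measure on S = {0,1}^n), and the values n = 4, p = 0.25 and the variable names are made up for the example.

```python
from itertools import product
from math import isclose

n, p = 4, 0.25  # illustrative values


def X(s):
    """The random variable X(s) = s_1 + ... + s_n."""
    return sum(s)


# Assumed product measure on S = {0,1}^n (independent trials).
P = {s: p ** X(s) * (1 - p) ** (n - X(s)) for s in product((0, 1), repeat=n)}

# Left-hand side: integrate X over the sample space S with respect to P.
mean_on_S = sum(X(s) * mass for s, mass in P.items())

# Right-hand side: build P_X(B) = P(X^{-1}(B)) on singletons B = {x},
# then integrate the identity function x with respect to P_X.
P_X = {}
for s, mass in P.items():
    P_X[X(s)] = P_X.get(X(s), 0.0) + mass
mean_on_T = sum(x * mass for x, mass in P_X.items())
```

The first sum runs over all 2^n outcomes, the second over only n + 1 values, which is the practical reason for working with the distribution rather than the underlying sample space.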
 
  • #3
Phew, that's a relief! Thanks for all the help and encouragement, micromass.

By the way, what exactly is the formal definition of discrete? Of a random variable to R, does it perhaps mean one whose range contains a (finite or infinitely) countable set of nonzero values, or is it defined in terms of continuity? Is there a definition of equal generality to the definition of a random variable itself, i.e. a definition which works for any random variable, no matter what its domain and codomain are? And how is discrete defined for a measure? Does a discrete random variable necessarily induce a discrete distribution, and is a discrete distribution necessarily induced by a discrete random variable?

I guess those scare quotes around "continuous" are because there isn't, in general, a topology defined on either of the sigma algebras, E or F. Is there a more technically correct way of expressing this difference between models that involve things like binomial distributions, and models that involve things like normal distributions?
 
  • #4
Rasalhague said:
Phew, that's a relief! Thanks for all the help and encouragement, micromass.

By the way, what exactly is the formal definition of discrete? Of a random variable to R, does it perhaps mean one whose range contains a (finite or infinitely) countable set of nonzero values, or is it defined in terms of continuity? Is there a definition of equal generality to the definition of a random variable itself, i.e. a definition which works for any random variable, no matter what its domain and codomain are?

A random variable is discrete if its range is finite or countably infinite. It's as easy as that.

And how is discrete defined for a measure?

It isn't. I should not have written that. I should have said "if X is discrete, then it is a sum. If X is "continuous", then it is an integral"

Does a discrete random variable necessarily induce a discrete distribution, and is a discrete distribution necessarily induced by a discrete random variable?

Yes. And I dare to say that it is by definition. But this depends on the definition.

I guess those scare quotes around "continuous" are because there isn't, in general, a topology defined on either of the sigma algebras, E or F. Is there a more technically correct way of expressing this difference between models that involve things like binomial distributions, and models that involve things like normal distributions?

There are different answers to this. One convention says that X is continuous if the cdf

[tex]F(x)=P\{X\leq x\}[/tex]

is continuous.

A (stronger) convention says that X is continuous if the distribution PX is an "absolutely continuous measure". All that means is that there exists a nonnegative integrable function f (called the probability density function or pdf) such that

[tex]P_X(A)=\int_A{f(x)dx}[/tex]

where the integral above is the ordinary integral that you're used to (in more technical terms: the integral with respect to Lebesgue measure).
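To make the first convention concrete, the following sketch compares the jump of the cdf at a point for a binomial and for a standard normal random variable. It is a numerical illustration, not a proof: the left limit F(t-) is approximated by F(t - eps) with a small eps, and the parameter values and function names are assumptions chosen for the example.

```python
from math import comb, erf, floor, isclose, sqrt


def binomial_cdf(x, n, p):
    """F(x) = P{X <= x} for X ~ Binomial(n, p)."""
    if x < 0:
        return 0.0
    upper = min(floor(x), n)
    return sum(comb(n, t) * p ** t * (1 - p) ** (n - t) for t in range(upper + 1))


def normal_cdf(x):
    """F(x) = P{X <= x} for a standard normal X, via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


n, p, t, eps = 10, 0.5, 3, 1e-9  # illustrative values

# Approximate jump F(t) - F(t-) at the point t.
binomial_jump = binomial_cdf(t, n, p) - binomial_cdf(t - eps, n, p)
normal_jump = normal_cdf(t) - normal_cdf(t - eps)

# For the binomial, the jump equals the point mass P{X = t}.
point_mass = comb(n, t) * p ** t * (1 - p) ** (n - t)
```

The binomial cdf jumps by exactly the probability of the point t, so it is not continuous, while the normal cdf changes by a negligible amount over the same tiny interval.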
 
  • #5
micromass said:
One convention says [...]

When S and T are topological spaces, do these notions of "continuity" coincide with the usual one? Is the need for a convention only because these are extensions of the usage of the word to cases where S and T are not both topological spaces, or is there ever a case where X can be continuous in the usual sense but not in the probability sense, or vice-versa?
 
  • #6
Rasalhague said:
When S and T are topological spaces, do these notions of "continuity" coincide with the usual one? Is the need for a convention only because these are extensions of the usage of the word to cases where S and T are not both topological spaces, or is there ever a case where X can be continuous in the usual sense but not in the probability sense, or vice-versa?

No, saying that a random variable is continuous has nothing to do with the topology on S or T. I can find a lot of topologies on S, but that doesn't mean that X is necessarily continuous in the usual sense. The terminology of saying that X is continuous is historically invented to mean that the cdf is continuous, and that's still the meaning of it today.
 

1. What is the binomial distribution?

The binomial distribution is a probability distribution that describes the number of successes in a fixed number of independent trials, where each trial has only two possible outcomes (usually labeled as success or failure) and the probability of success remains constant throughout all trials.

2. What are the assumptions of the binomial distribution?

The assumptions of the binomial distribution are a fixed number of trials, only two possible outcomes for each trial, a constant probability of success, and independence between the trials.

3. How is the binomial distribution calculated?

The binomial distribution is calculated using the formula [itex]P(x) = \binom{n}{x}p^x(1-p)^{n-x}[/itex], where n is the number of trials, x is the number of successes, and p is the probability of success in each trial.

4. What are the mean and variance of the binomial distribution?

The mean of the binomial distribution is equal to np, where n is the number of trials and p is the probability of success. The variance is equal to np(1-p).
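These two formulas can be checked numerically from the probability mass function. The sketch below is a small illustration using only the Python standard library. The function name binomial_pmf and the values n = 12, p = 0.4 are arbitrary choices for the example.

```python
from math import comb, isclose


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 12, 0.4  # illustrative values
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Mean and variance computed directly from their definitions as sums.
mean = sum(x * q for x, q in enumerate(pmf))
variance = sum((x - mean) ** 2 * q for x, q in enumerate(pmf))

# Compare with the closed forms np and np(1-p).
mean_matches = isclose(mean, n * p)
variance_matches = isclose(variance, n * p * (1 - p))
```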

5. What is the significance of the binomial distribution in statistics?

The binomial distribution is significant in statistics because it allows us to calculate the probability of obtaining a certain number of successes in a fixed number of trials, which is useful in various real-life scenarios such as in hypothesis testing, quality control, and decision making.
