Probability function for discrete random variables

In summary, the textbook states that for a discrete random variable, knowing the probability function is sufficient to determine the distribution function. The reason is that the set of values with positive probability is countable, so the probability that ##X## lands in a set ##A## equals the sum of the probabilities of the points of ##A## with positive probability; the remaining points of ##A##, each having probability zero, contribute nothing.
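For a concrete illustration of the result: if ##X## is the outcome of a fair die roll, then ##S = \{1,\dots,6\}## with ##p(x) = 1/6##, and for the Borel set ##A = [2.5, 4.5] \cup \{7\}## the formula gives $$\mathbb{P}\{X \in A\} = \sum_{x \in S \cap A} p(x) = p(3) + p(4) = \tfrac{1}{3},$$ with the points of ##A## outside ##S## (such as ##7##) contributing nothing.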
  • #1
member 587159
My textbook says that if ##X: \Omega \to \mathbb{R}## is a discrete random variable (i.e., there are only countably many values that get reached), then it suffices to know the probability function ##p(x) = \mathbb{P}\{X =x\}## in order to know the distribution function ##\mathbb{P}_X: \mathcal{R} \to \mathbb{R}: A \mapsto \mathbb{P}\{X \in A\} = \mathbb{P}(X^{-1}A)##. Indeed, if ##S:= \{x : p(x) > 0\}##, then for ##A \in \mathcal{R}##, it follows that $$\mathbb{P}\{X \in A\} = \sum_{x \in S \cap A}p(x)$$

But how do they get this formula?

I tried the following:

$$\mathbb{P}\{X \in A\} = \mathbb{P}\left(X^{-1}\left(\bigcup_{a \in S\cap A}\{a\} \cup\bigcup_{a \in A\setminus S}\{a\}\right)\right) $$

$$= \mathbb{P}\left(X^{-1}\left(\bigcup_{a \in S \cap A}\{a\}\right)\right) + \mathbb{P}\left(X^{-1}\left(\bigcup_{a \in A\setminus S}\{a\}\right)\right) $$
$$=\sum_{a \in S \cap A}p(a) + \mathbb{P}\left(X^{-1}\left(\bigcup_{a \in A\setminus S}\{a\}\right)\right)$$

But how do I show that the probability on the right is zero? I can't use ## \sigma##-additivity on uncountable disjoint unions.
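To make the obstacle concrete with a toy example: if ##X## is a fair die roll and ##A = [0,10]##, then ##A \setminus S = [0,10] \setminus \{1,\dots,6\}## is uncountable, so there is no hope of writing ##\mathbb{P}\left(X^{-1}\left(\bigcup_{a \in A\setminus S}\{a\}\right)\right)## as a sum over singletons, even though in this example the preimage ##X^{-1}(A \setminus S)## happens to be empty.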

EDIT: ##\mathcal{R}## is the smallest sigma algebra that contains the usual topology on the real numbers, i.e. the Borel sets of the reals.
 
  • #2
Math_QED said:
$$=\sum_{a \in S \cap A}p(a) + \mathbb{P}\left(X^{-1}\left(\bigcup_{a \in A\setminus S}\{a\}\right)\right)$$

But how do I show that the probability on the right is zero? I can't use ## \sigma##-additivity on uncountable disjoint unions.

I'll take a shot in the dark here-- I don't totally follow your notation or even what a "distribution function" is (generally means CDF but...). The problem should be easy if you can find a way to focus your attention on countable sets.

In any case, your problem reminded me of a quote from Kolmogorov that I like: "Behind every theorem lies an inequality."

- - - -

You have some special structure in that probabilities are always real non-negative and sum to one. The axioms immediately give that

##\Pr\{B\} + \Pr\{B^c\} = 1##

supposing you are allowed to use that, perhaps you can try re-running your argument over something complementary?

The goal would be to combine the two via linearity, show that your probabilities sum to one, and you end up with

##0\leq \mathbb{P}\left(X^{-1}\left(\bigcup_{a \in A\setminus S}\{a\}\right)\right) \leq 0##

(or equivalently: take advantage of positive definiteness after subtracting 1 from each side of your equation -- but the idea is to show that the thing in question is bounded above and below by zero, and hence is zero.)
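Spelling this out (a sketch, under the assumption that ##\sum_{x \in S} p(x) = 1##, which itself follows from countable additivity applied to the countable image of ##X##): the sets ##X^{-1}(S \cap A)##, ##X^{-1}(S \cap A^c)##, ##X^{-1}(A \setminus S)## and ##X^{-1}(A^c \setminus S)## partition ##\Omega##, so

$$1 = \sum_{x \in S \cap A} p(x) + \sum_{x \in S \cap A^c} p(x) + \mathbb{P}\left(X^{-1}(A \setminus S)\right) + \mathbb{P}\left(X^{-1}(A^c \setminus S)\right) = 1 + \mathbb{P}\left(X^{-1}(A \setminus S)\right) + \mathbb{P}\left(X^{-1}(A^c \setminus S)\right),$$

and since the two leftover terms are non-negative, both must be zero.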

- - - -
Hopefully this helps or at least provides some inspiration toward a way to get the result you want via an inequality.
 
  • #3
StoneTemplePython said:
I'll take a shot in the dark here-- I don't totally follow your notation or even what a "distribution function" is (generally means CDF but...). The problem should be easy if you can find a way to focus your attention on countable sets.

Thanks. I defined the probability distribution at the beginning of the post. I will take some time to digest your answer.
 
  • #4
The statement looks obvious. Since the elements with positive probability are countable (they are the elements of S), to get P(A) you add up the probabilities of those elements of A which have positive probability.
 
  • #5
mathman said:
The statement looks obvious. Since the positive probability elements are countable (elements of S), to get P(A), add up all the probabilities of those elements in A which have positive probabilities.

I don't quite understand what you mean. Can you elaborate?
 
  • #6
Math_QED said:
I don't quite understand what you mean. Can you elaborate?
I am not sure what else I need to say. S is a countable subset of R containing all the points with positive probability. A is a subset of R. The probability of A is the sum of the probabilities of all points of A which have positive probability. Since S consists of all points of positive probability, the intersection of S and A contains all the points of A needed to determine the probability of A.
 
  • #7
mathman said:
I am not sure what I need to say. S is a subset (countable) of R containing all the points with positive probability. A is a subset of R. The probability of A is the sum of the probabilities of all points of A which have positive probability. Since S consists of all points of probability, the intersection of S and A contains all the points of A needed to define the probability of A.

I think I understand what you mean. Let me write it out:

$$P(X \in A) = P(X\in A\cap X(\Omega))$$
$$= \sum_{x \in A \cap X(\Omega)} P(X=x)$$
$$=\sum_{x\in A \cap S} P(X=x) + \sum_{x \in A \cap X(\Omega) \setminus S} P(X=x)$$

And the last sum is 0. Does this make sense? In the second equality I used that the image of ##X## is at most countable, and for the last equality that ##S \subseteq X(\Omega)##.
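For completeness, a sketch of why the first two steps are justified: ##X## only takes values in ##X(\Omega)##, so ##X^{-1}(A) = X^{-1}(A \cap X(\Omega))##, which gives the first equality; and since ##A \cap X(\Omega)## is at most countable, ##\sigma##-additivity can be applied to the disjoint union of singletons:

$$P(X \in A \cap X(\Omega)) = P\left(\bigcup_{x \in A \cap X(\Omega)} \{X = x\}\right) = \sum_{x \in A \cap X(\Omega)} P(X = x).$$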
 
  • #8
Your notation throws me. That's why I prefer words. What is ##X(\Omega)##? X does not have to be countable - only the subset of X consisting of points of positive probability.
 
  • #9
mathman said:
Your notation throws me. That's why I prefer words. [tex]What\ is\ X(\Omega)?[/tex] X does not have to be countable - only the subset of X consisting of points of positive probability.

##X(\Omega) = \mathrm{Im}(X)## is the image of the function ##X##, i.e. all the values that ##X## attains. This is countable by assumption (the variable is discrete).
 
  • #10
Math_QED said:
I think I understand what you mean. Let me write it out:

$$P(X \in A) = P(X\in A\cap X(\Omega))$$
$$= \sum_{x \in A \cap X(\Omega)} P(X=x)$$
$$=\sum_{x\in A \cap S} P(X=x) + \sum_{x \in A \cap X(\Omega) \setminus S} P(X=x)$$

And the last sum is 0. Does this make sense? In the second equality I used that the image of X is at most countable and for the last equality that ##S \subseteq X(\Omega)##
In your original definition of X, it appears that X takes only countably many values, so there are only a countable number of points (if any) in the second sum, each with P(X=x)=0.
 
  • #11
mathman said:
In your original definition of X, it appears that X is countable, so there are only a countable number (if any) points in the second sum where each point has P(X=x)=0.

Every term in the last sum is zero, because the sum runs over elements not in S (so, by definition, points with probability 0).
 
  • #12
Math_QED said:
Every term in the last sum is zero, because the sum runs over elements not in S (so by definitions points with probability 0)
The point I was making is that if the set of values of X were not countable, the last term would be problematic. Summations have meaning only if the number of terms is countable.
 

What is a probability function for discrete random variables?

A probability function for a discrete random variable is a mathematical tool used to assign probabilities to the different possible outcomes of the variable. It maps each possible outcome to a numerical value between 0 and 1, representing the likelihood of that outcome occurring.
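For instance, for a single roll of a fair die the probability function is $$p(x) = \begin{cases} 1/6 & \text{if } x \in \{1,2,3,4,5,6\}, \\ 0 & \text{otherwise}, \end{cases}$$ and these probabilities sum to 1.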

How is a probability function different from a probability distribution?

A probability function gives the probability of each individual value, ##p(x) = P(X = x)##, while the probability distribution of a random variable assigns a probability ##P(X \in A)## to every (measurable) set of values ##A##. For a discrete random variable the distribution is completely determined by the probability function, as discussed in the thread above; the probability function is often visualised as a histogram or bar chart of the values ##p(x)##.

What is the difference between a discrete and continuous random variable?

A discrete random variable is one that can only take on a finite or countably infinite number of values, while a continuous random variable can take on any value within a certain range. For example, the number of children in a family is a discrete random variable, while the height of a person is a continuous random variable.

How do you calculate the expected value of a discrete random variable using a probability function?

The expected value of a discrete random variable is calculated by multiplying each possible outcome by its corresponding probability and then summing all of these values. In other words, it is the weighted average of all possible outcomes based on their probabilities. This can be expressed mathematically as ##E(X) = \sum_{x} x \, P(X=x)##, where ##X## is the discrete random variable and ##P(X=x)## is its probability function.
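For example, for a fair six-sided die with ##P(X=x) = 1/6## for ##x = 1, \dots, 6##: $$E(X) = \sum_{x=1}^{6} x \cdot \tfrac{1}{6} = \frac{1+2+3+4+5+6}{6} = 3.5.$$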

Can a probability function be used to predict the exact outcome of a single trial?

No, a probability function cannot be used to predict the exact outcome of a single trial. It only provides the likelihood of each possible outcome occurring, not the actual outcome of a single trial. However, over multiple trials, the probability function can be used to make predictions about the overall frequency of each outcome.
