# Confused about formal definitions of probability theory

I think the first thing that is confusing me is the terminology. There are too many similar terms (e.g. probability measure, probability distribution, probability density function, probability mass function).

What are the general concepts and what are the instances of those concepts? Like, are probability density functions and probability mass functions instances of probability distributions?

Also, where does the study of probability begin? It seems to me that it begins with the notion of a sample space, which I believe to be just a set whose elements I, as a human being, associate with the possible outcomes of an experiment. It basically just has to have the "right" cardinality, since math doesn't really know about HEADS/TAILS or the existence of a die.

Then an event is any subset of a sample space and a probability measure maps events to the interval [0,1] of the real numbers. But then what are probability distributions?

Random variables are functions that map from a sample space to a subset of the real numbers. They can be continuous or discrete. But the definition of continuity of functions requires the domain and image of a function to have topologies on them. Is that the case for a continuous random variable? What are the topologies on the sample space and the subset of the real numbers?

The fundamental concepts are those of sample space, events and probability measure. Those are at the beginning of probability theory.

Sadly enough, the sample space and its associated structure (events and measure) carry more detail than we are usually interested in. For example, let's say we throw a coin 10 times and record the sequence we get. An element of the sample space is then H,H,H,H,T,T,T,H,H,T, and this element has a specific probability. However, we are usually not interested in the sample space itself, but rather in some other kind of information. For example, we might be interested in the number of tails. The previous element would then give us 4 tails. So we are really just interested in the number 4 and not the actual outcome.

This is where random variables come in. Random variables represent a kind of "compression of information": a random variable takes the huge amount of information in the sample space and reduces it to something we are truly interested in. The sample space is fundamental in theory, but is often hard to write down in practice. That's why we usually just accept that some sample space exists and work with the random variables. We rarely want to describe the sample space completely.

So the essential objects are the random variables. These are (measurable) functions from the sample space to ##\mathbb{R}## (other definitions are possible, for example, the codomain can be any set). The random variables have some associated information:

One of these is a probability measure. So let's say that ##(\Omega,\mathcal{A},\mathbb{P})## is our original probability space, and ##X:\Omega\rightarrow \mathbb{R}## is our random variable. On ##\mathbb{R}## we can then put a probability structure: we choose our events to be the Borel sets, and we choose the probability measure on ##\mathbb{R}## to be
$$\mathbb{P}_X(A) = \mathbb{P}(X^{-1}(A))$$
This probability measure ##\mathbb{P}_X## is the probability distribution associated to ##X##. For example, in our coin example, the probability distribution can be used to answer questions like "what is the probability of getting 5 tails?". This is just ##\mathbb{P}_X(\{5\})##.
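A minimal Python sketch of this pushforward construction (my own illustration, not from the post; the names `omega`, `X`, and `P_X` are just for this example). It enumerates the full sample space of 10 flips and computes ##\mathbb{P}_X(\{5\})## by pulling the set back through ##X##:

```python
from itertools import product

# Sample space: all 2^10 sequences of 10 fair coin flips,
# each with probability 1/2^10.
omega = list(product("HT", repeat=10))

# Random variable X: the number of tails in a sequence.
def X(outcome):
    return outcome.count("T")

# Pushforward measure P_X(A) = P(X^{-1}(A)): add up the probabilities
# of every outcome that X maps into A.
def P_X(A):
    return sum(1 for w in omega if X(w) in A) / len(omega)

print(P_X({5}))  # C(10,5) / 2^10 = 252/1024 = 0.24609375
```

Enumerating ##\Omega## is only feasible here because this sample space is tiny; in practice one works with ##\mathbb{P}_X## directly, which is exactly the point being made above.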

Another thing associated to our random variable is the cumulative distribution function or cdf. This is a function ##F_X:\mathbb{R}\rightarrow \mathbb{R}## and is defined by ##F_X(t) = \mathbb{P}_X((-\infty,t])##.
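Continuing in the same spirit (again a sketch of my own, not from the post), the cdf of a discrete random variable like the tail count is a step function obtained by summing the pmf:

```python
from math import comb

# pmf of X = number of tails in 10 fair flips: g(k) = C(10, k) / 2^10.
def g(k):
    return comb(10, k) / 2**10

# cdf F_X(t) = P_X((-inf, t]): sum the pmf over every value <= t.
def F_X(t):
    return sum(g(k) for k in range(11) if k <= t)

print(F_X(4))    # P(at most 4 tails) = 386/1024 = 0.376953125
print(F_X(4.5))  # same value: F_X is a step function, constant between jumps
```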

When we say that a distribution is continuous or discrete, we are really talking about ##\mathbb{P}_X##. We say that ##X## is discrete if there is a countable set ##A## such that ##\mathbb{P}_X(\mathbb{R}\setminus A) = 0##. In this case, we define the function ##g:\mathbb{R}\rightarrow \mathbb{R}## by ##g(x) = \mathbb{P}_X(\{x\})##. This is called the probability mass function associated to the random variable. We can use it to describe the entire measure ##\mathbb{P}_X## by
$$\mathbb{P}_X(A) = \sum_{x\in A} g(x)$$
For a distribution to be continuous, we require that there is a nonnegative function ##f:\mathbb{R}\rightarrow \mathbb{R}## such that
$$\mathbb{P}_X(A) = \int_A f(x)dx$$
At first sight, this has nothing to do with continuity in the usual sense, but there are connections. The connection is with the cdf ##F_X##, which is continuous if ##X## is continuous. In fact, ##F_X## is even more than continuous: it is absolutely continuous.

Note that distributions don't need to be continuous or discrete. There are other possibilities which do arise very naturally.

So to summarize, the fundamental concepts are the sample space, events and probability measure.
If we are then given a random variable, we can associate with it a lot of interesting things, such as a probability distribution, a cdf, a pdf, a pmf, etc.

The fundamental concepts are those of sample space, events and probability measure. Those are at the beginning of probability theory.
Okay, so if I completely abstract these concepts, the beginning of probability theory is about a set ##\Omega## (the sample space), a σ-algebra ##\Sigma## on ##\Omega## (aka the collection of events) and a measure ##\mu## that maps the σ-algebra to the interval ##[0,1]\subset\mathbb{R}##. I guess you could abstract this even further by letting the probability measure be an arbitrary measure, but this is fine.

So the essential objects are the random variables. These are (measurable) functions from the sample space to ##\mathbb{R}## (other definitions are possible, for example, the codomain can be any set).
So a random variable ##X## is a measurable function from the measurable space ##(\Omega,\Sigma)## to ##(\mathbb{R},\mathcal{B})## where ##\mathcal{B}## is the Borel σ-algebra on ##\mathbb{R}##.

Or like you said, a random variable could be any ##X:(\Omega,\Sigma)\rightarrow\mathcal{M}## where ##\mathcal{M}## is a measurable space. Right?

Okay, so I could have defined a probability measure on the σ-algebra ##\Sigma## without the notion of a random variable. But now, since all that is required for a probability measure is that it be a measure from some σ-algebra to the interval [0,1], I can define a probability measure on ##\mathcal{B}##, the Borel σ-algebra on ##\mathbb{R}##. And all concepts (such as: probability distributions, CDFs, PDFs and PMFs) are instances of this probability measure on ##\mathcal{B}##. Is that right?

Obs: I am still going through your post and I may have other questions. But if my thinking in this post is wrong, they may not even make sense. So I am going to wait :)

Thanks!

Okay, so if I completely abstract these concepts, the beginning of probability theory is about a set ##\Omega## (the sample space), a σ-algebra ##\Sigma## on ##\Omega## (aka the collection of events) and a measure ##\mu## that maps the σ-algebra to the interval ##[0,1]\subset\mathbb{R}##.
Yes.

I guess you could abstract this even further by letting the probability measure be an arbitrary measure, but this is fine.
Sure, but then you're doing measure theory instead of probability.

So a random variable ##X## is a measurable function from the measurable space ##(\Omega,\Sigma)## to ##(\mathbb{R},\mathcal{B})## where ##\mathcal{B}## is the Borel σ-algebra on ##\mathbb{R}##.

Or like you said, a random variable could be any ##X:(\Omega,\Sigma)\rightarrow\mathcal{M}## where ##\mathcal{M}## is a measurable space. Right?
Correct.

Okay, so I could have defined a probability measure on the σ-algebra ##\Sigma## without the notion of a random variable. But now, since all that is required for a probability measure is that it be a measure from some σ-algebra to the interval [0,1], I can define a probability measure on ##\mathcal{B}##, the Borel σ-algebra on ##\mathbb{R}##. And all concepts (such as: probability distributions, CDFs, PDFs and PMFs) are instances of this probability measure on ##\mathcal{B}##. Is that right?
Right. The essential point is the measure we eventually get on ##\mathcal{B}##. That allows us to develop most of the concepts in probability.

All right, thanks! I guess my "first next question" is what are the topologies defined on ##\Omega## and ##\mathbb{R}## that allow us to talk about continuity of random variables? I know that by talking about Borel sets of ##\mathbb{R}## we are assuming a topology on ##\mathbb{R}##, mainly because Borel sets are defined to be the countable unions and intersections as well as relative complements of open sets.

If the definition that you used of continuity is equivalent to the "usual" definition of continuity then it should induce some topologies on those sets. Do you know what they are?

Also, say I have a probability space ##(\Omega,\Sigma,\mu)## where ##\Omega## is a sample space, ##\Sigma## is the collection of events and ##\mu## the probability measure. Then I define a random variable ##X## that essentially maps from the measurable space ##(\Omega,\Sigma)## to some measurable space ##(\mathbb{R},\mathcal{B})##. I guess it's a natural step then to define a probability measure ##\mu_X## on ##\mathcal{B}## such that
$$\mu_X(B)=\mu(X^{-1}(B))$$
which is to say that ##\mu_X## assigns to ##B\in\mathcal{B}## the probability that ##\mu## assigns to the set of outcomes that ##X## maps into ##B##.

What is this probability measure called? I guess it is the same as the probability measure you defined as ##\mathbb{P}_X(A)=\mathbb{P}(X^{-1}(A))##.

All right, thanks! I guess my "first next question" is what are the topologies defined on ##\Omega## and ##\mathbb{R}## that allow us to talk about continuity of random variables? I know that by talking about Borel sets of ##\mathbb{R}## we are assuming a topology on ##\mathbb{R}##, mainly because Borel sets are defined to be the countable unions and intersections as well as relative complements of open sets.

If the definition that you used of continuity is equivalent to the "usual" definition of continuity then it should induce some topologies on those sets. Do you know what they are?
There is no topology on ##\Omega##. Saying that ##X## is a continuous random variable is not saying that ##X:\Omega\rightarrow \mathbb{R}## is continuous with respect to some topologies. It has a very different definition that does not coincide with the topological one.

Also, say I have a probability space ##(\Omega,\Sigma,\mu)## where ##\Omega## is a sample space, ##\Sigma## is the collection of events and ##\mu## the probability measure. Then I define a random variable ##X## that essentially maps from the measurable space ##(\Omega,\Sigma)## to some measurable space ##(\mathbb{R},\mathcal{B})##. I guess it's a natural step then to define a probability measure ##\mu_X## on ##\mathcal{B}## such that
$$\mu_X(B)=\mu(X^{-1}(B))$$
which is to say that ##\mu_X## assigns to ##B\in\mathcal{B}## the probability that ##\mu## assigns to the set of outcomes that ##X## maps into ##B##.

What is this probability measure called? I guess it is the same as the probability measure you defined as ##\mathbb{P}_X(A)=\mathbb{P}(X^{-1}(A))##.
I would call it the probability distribution of ##X##.

There is no topology on ##\Omega##. Saying that ##X## is a continuous random variable is not saying that ##X:\Omega\rightarrow \mathbb{R}## is continuous with respect to some topologies. It has a very different definition that does not coincide with the topological one.
Then what is the formal definition of continuity for random variables?
I would call it the probability distribution of ##X##.
I'm confused about terminology again... A probability measure is a map from the σ-algebra on the sample space in question to the real numbers. What is a probability distribution exactly? Is it a type of probability measure?

Then what is the formal definition of continuity for random variables?
One possible definition is to demand that the cdf ##F_X## is continuous. This is equivalent to saying that ##\mathbb{P}_X(\{a\}) = 0## for each ##a\in \mathbb{R}##.

Another definition, which I think is more popular, is to say that ##X## is continuous if there exists a nonnegative function ##f:\mathbb{R}\rightarrow \mathbb{R}## such that
$$\mathbb{P}_X(A) = \int_A f(x)dx$$
for each Borel set ##A##. This is sometimes called absolutely continuous.

So these are two (not equivalent) definitions of continuous random variables. You should check the different texts to see which one they use. But in practice, they usually take the second definition. There are random variables that satisfy the first condition but not the second, but they are very pathological; they shouldn't arise at all in practice. So this is why most people take the second definition.

I'm confused about terminology again... A probability measure is a map from the σ-algebra on the sample space in question to the real numbers. What is a probability distribution exactly? Is it a type of probability measure?
A probability distribution is synonymous with probability measure.

There are random variables that satisfy the first condition but not the second, but they are very pathological; they shouldn't arise at all in practice.
A tidbit that may not be very useful for V0ODO0CH1LD's questions...

There are random variables with continuous CDF that aren't absolutely continuous which arise naturally in econ, for instance in the theory of "discounted repeated games". A lot of the time, players' lifetime payoffs will have such a distribution.

A tidbit that may not be very useful for V0ODO0CH1LD's questions...

There are random variables with continuous CDF that aren't absolutely continuous which arise naturally in econ, for instance in the theory of "discounted repeated games". A lot of the time, players' lifetime payoffs will have such a distribution.
Interesting. Do you have a reference for such distributions and how economics deals with them?

What I had in mind has the following flavor: a person faces a sequence of outcomes ##(u_t)_{t=0}^\infty##, each from ##\{0,1\}##; she has a discount factor of ##\beta\in(0,1)##, and we compute her lifetime payoff as ##U=(1-\beta)\sum_{t=0}^\infty \beta^t u_t \in[0,1]##. Then the distribution of ##U## can be as described for a lot of distributions on ##(u_t)_{t=0}^\infty##. For instance, if ##\beta = \frac13## and the ##u_t## are independent and uniform on ##\{0,1\}##, then ##U## has the usual Cantor distribution.

In practice, economists still don't need to think too hard about the distribution. Typically, all the information we need is very finitary, e.g. things like the expectation of ##U_T=(1-\beta)\sum_{t=0}^\infty \beta^t u_{T+t} \in[0,1]## conditional on ##(u_0,...,u_{T-1})##.
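To illustrate (a rough simulation of my own, under the assumptions stated above: ##\beta = \frac13## and ##u_t## iid uniform on ##\{0,1\}##), one can check the Cantor-like behavior numerically:

```python
import random

# Lifetime payoff U = (1 - beta) * sum_t beta^t u_t with beta = 1/3 and
# u_t iid uniform on {0, 1}; truncating the series at T terms
# approximates U to within beta^T.
def sample_U(beta=1/3, T=40):
    return (1 - beta) * sum(beta**t * random.randint(0, 1) for t in range(T))

random.seed(0)
draws = [sample_U() for _ in range(10_000)]

# Cantor signature: no draw lands in the removed middle third (1/3, 2/3),
# since u_0 = 0 forces U <= 1/3 and u_0 = 1 forces U >= 2/3.
print(all(not (1/3 < u < 2/3) for u in draws))  # True
```

The same argument repeats at every scale (conditioning on ##u_0## and looking at the tail of the series), which is why the cdf of ##U## is the continuous but not absolutely continuous Cantor function.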

I am still confused with the concepts... I'm gonna go in steps, so it's easier to correct me at any given point if I'm wrong.

A probability measure maps from a σ-algebra on some sample space to the real numbers.

If the sample space happens to be the real numbers (or some subset of the real numbers), that σ-algebra will usually be the Borel σ-algebra on the reals.

I don't know what the function that generates the bell curve is called (I hope it's not the probability distribution), but whatever it is, it can't be a probability measure, because it maps from the reals to the reals, and a probability measure has to map from the Borel σ-algebra to the reals.

A probability density function IS a probability measure, because it maps any ##A\subset\mathbb{R}## in the Borel σ-algebra to
$$\mathbb{P}(A)=\int_Af(x)dx$$
where ##f(x)## is that bell curve that I don't know the name of... And I hope it isn't the probability distribution, because you said that is synonymous with probability measure, which ##f(x)## clearly isn't.

FactChecker
I am still confused with the concepts.. I'm gonna go in steps, so it's easier to correct me at any given point if I'm wrong.

A probability measure maps from a σ-algebra on some sample space to the real numbers.

If the sample space happens to be the real numbers (or some subset of the real numbers), that σ-algebra will usually be the Borel σ-algebra on the reals.

I don't know what the function that generates the bell curve is called (I hope it's not the probability distribution), but whatever it is, it can't be a probability measure, because it maps from the reals to the reals, and a probability measure has to map from the Borel σ-algebra to the reals.

A probability density function IS a probability measure,
No, it is not. Its values can exceed 1.0, so it cannot be a probability measure.
You need to consider all the properties, not just the domain of the map.
Distribution function values are probabilities and must be between 0 and 1. Density function values are not probabilities; they can be much greater than 1.0. The bell curve is a density function. The distribution function is the integral of the density function.
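A quick numerical check of this point (my own sketch; the name `normal_pdf` is just for the example), using the bell curve with the small ##\sigma = 0.05## that appears later in the thread:

```python
from math import sqrt, pi, exp

# Normal pdf: with a small sigma the peak value is far above 1,
# so a density value cannot itself be a probability.
def normal_pdf(x, mu, sigma):
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

peak = normal_pdf(1.2, mu=1.2, sigma=0.05)
print(peak)  # 1 / (0.05 * sqrt(2*pi)) ≈ 7.98, far greater than 1
```

The *integral* of this function over any Borel set is still between 0 and 1; it is only the pointwise values of ##f## that can exceed 1.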

I am still confused with the concepts.. I'm gonna go in steps, so it's easier to correct me at any given point if I'm wrong.

A probability measure maps from a σ-algebra on some sample space to the real numbers.

If the sample space happens to be the real numbers (or some subset of the real numbers), that σ-algebra will usually be the Borel σ-algebra on the reals.
Both correct.

I don't know what the function that generates the bell curve is called (I hope it's not the probability distribution), but whatever it is, it can't be a probability measure, because it maps from the reals to the reals, and a probability measure has to map from the Borel σ-algebra to the reals.

A probability density function IS a probability measure, because it maps any ##A\subset\mathbb{R}## in the Borel σ-algebra to
$$\mathbb{P}(A)=\int_Af(x)dx$$
where ##f(x)## is that bell curve that I don't know the name of... And I hope it isn't the probability distribution, because you said that is synonymous with probability measure, which ##f(x)## clearly isn't.
I said that a probability distribution is synonymous with probability measure; the probability density function (pdf) is not synonymous with probability measure. I know there are many similar-sounding names, which is confusing at first.

A pdf is the name of the nonnegative function ##f:\mathbb{R}\rightarrow \mathbb{R}##. Given a pdf, we can define a probability measure by putting
$$\mathbb{P}(A) = \int_A f(x)dx$$
So given any pdf, we can associate with it a probability measure. The converse is not true: some probability measures do not have an associated pdf.
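A tiny sketch of the converse failing (my own illustration; representing a Borel set by an indicator function is just a convenience for the example): a Dirac point mass is a perfectly good probability measure, but no pdf can produce it.

```python
# A Dirac point mass at 0, written as a map from (indicator functions of)
# Borel sets to [0, 1].
def dirac_at_zero(indicator):
    return 1.0 if indicator(0.0) else 0.0

print(dirac_at_zero(lambda x: x == 0.0))       # 1.0: all mass on the point {0}
print(dirac_at_zero(lambda x: 0.0 < x < 1.0))  # 0.0: no mass anywhere else
# No pdf f can reproduce this measure: the integral of any f over the
# single point {0} is 0, yet the measure assigns {0} probability 1.
```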

The bell shaped curve that you are referring to is the pdf of the normal distribution. It is defined as
$$f(x) = \frac{1}{\sqrt{2\pi \sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
for some fixed numbers ##\mu\in \mathbb{R}## and ##\sigma>0##. This function generates a probability measure by
$$\mathbb{P}(A) = \frac{1}{\sqrt{2\pi \sigma^2}} \int_A e^{-\frac{(x-\mu)^2}{2\sigma^2}}dx$$

Okay, I think I got it! Let me work through an example to see if I really did.

If I'm given a sample space (##\mathbb{R}##) with mean (##\mu##) of 1.2 and a standard deviation (##\sigma##) of 0.05 I can then calculate something called the probability density function, which looks like this:
$$f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} = \frac{1}{0.05\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-1.2}{0.05}\right)^2}.$$
This is technically a map from the sample space to the reals, but since our sample space IS the reals, the probability density function is a map ##f:\mathbb{R}\rightarrow\mathbb{R}##.

From this probability density function I can calculate the probability measure (aka probability distribution) on the Borel σ-algebra (##\mathcal{A}##) on ##\mathbb{R}##, which in turn looks like this:
$$\mathbb{P}(A)=\int_Af(x)dx=\int_A\frac{1}{0.05\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-1.2}{0.05}\right)^2}dx.$$
This is a map ##\mathbb{P}:\mathcal{A}\rightarrow\mathbb{R}##. So now I can calculate the probability that, if I choose an element from the sample space at random, it will lie in the Borel set ##A##?

For instance, what does the integral
$$\mathbb{P}([1.05,\infty))=\int_{1.05}^\infty\frac{1}{0.05\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-1.2}{0.05}\right)^2}dx$$
mean?

Okay, I think I got it! Let me work through an example to see if I really did.

If I'm given a sample space (##\mathbb{R}##) with mean (##\mu##) of 1.2 and a standard deviation (##\sigma##) of 0.05 I can then calculate something called the probability density function, which looks like this:
$$f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} = \frac{1}{0.05\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-1.2}{0.05}\right)^2}.$$
Which is technically a map from the sample space to the reals,
The pdf is always a map ##\mathbb{R}\rightarrow \mathbb{R}##.

but since our sample space IS the reals the probability density function is a map ##f:\mathbb{R}\rightarrow\mathbb{R}##.

From this probability density function I can calculate the probability measure (aka probability distribution) on the Borel σ-algebra (##\mathcal{A}##) on ##\mathbb{R}##, which in turn looks like this:
$$\mathbb{P}(A)=\int_Af(x)dx=\int_A\frac{1}{0.05\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-1.2}{0.05}\right)^2}dx.$$
This is a map ##\mathbb{P}:\mathcal{A}\rightarrow\mathbb{R}##. So now I can calculate the probability that, if I choose an element from the sample space at random, it will lie in the Borel set ##A##?

For instance, what does the integral
$$\mathbb{P}([1.05,\infty))=\int_{1.05}^\infty\frac{1}{0.05\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-1.2}{0.05}\right)^2}dx$$
mean?
Your interpretation is correct. It means that if you choose an element at random according to this distribution, then ##\mathbb{P}([1.05,+\infty))## is the probability that the element is larger than ##1.05##.
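For a concrete check (my own sketch; `normal_cdf` is just an illustrative name), the standard-library error function gives this probability in closed form:

```python
from math import erf, sqrt

# Normal cdf via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2.
def normal_cdf(x, mu, sigma):
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# mu = 1.2, sigma = 0.05: the point 1.05 sits 3 standard deviations
# below the mean, so almost all of the mass lies above it.
p = 1 - normal_cdf(1.05, mu=1.2, sigma=0.05)
print(p)  # ≈ 0.99865
```

This is the familiar "3-sigma" number: an element drawn according to this distribution lies in ##[1.05,\infty)## about 99.865% of the time.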

Thanks!