# Rigorous mathematical definition of sampling

#### justinmir

Please could you help me find a rigorous mathematical definition of sampling as it is used in mathematical statistics?

Let ##X:\Omega\rightarrow\mathbb{R}## be a random variable and let ##X_{1},\dots,X_{n}## be a statistical sample. What is its mathematical meaning in probability theory? Are we talking about elements ##(\omega_{1},\dots,\omega_{n})## of ##\Omega^{n}##?

We always say that samples are equally probable and independent. But in order for ##\omega_{i}## to be equally probable we would need a uniform discrete distribution on ##\Omega## that assigns equal probability to every element ##\omega##. This means that ##\Omega## must be finite and the probability measure defined on it must be uniform. But what if the original measure on ##\Omega## was not uniform and assigned different probabilities to different elements? Then we cannot have samples that are equally probable.

Let me reiterate: a discrete uniform distribution cannot be defined on an infinite set, so if we sample such a set we cannot say that all the samples we might draw are equally probable. This is why, for instance, we cannot have a notion of a completely random real or even integer number: on the set of reals we cannot define a distribution that assigns equal probability to individual points. If we were to draw samples that are not equally probable, what would be the point? The basic formulas of statistics would not apply. Take, for instance, the definition of the sample mean or variance and see whether it works on samples that don't have equal probability, or on samples of probability zero. What would the mathematical meaning of such sampling be?

Does it mean that the definition of sampling is restricted to a finite ##\Omega## with a uniform distribution? Does it mean that sampling does not make sense in other scenarios?

I don't suppose this is the case; sampling must mean something in those cases.

Please could you help me find a rigorous definition and recommend me some literature?

But in order for ##\omega_{i}## to be equally probable we would need a uniform discrete distribution on ##\Omega## that assigns equal probability to every element ##\omega##.
This is wrong. You can have a sample from any distribution, including continuous ones.
The ##\omega_i## values obtained do not have equal probabilities; the ##\omega_i## PDFs are identical. That is not the same thing.
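This point can be illustrated numerically. The following is a minimal Python sketch (the choice of the exponential distribution and the sample size are arbitrary): every individual value drawn from a continuous distribution has probability zero, yet the draws are i.i.d. and the sample mean still estimates the expectation.

```python
import random
import statistics

# Draw n i.i.d. samples from a continuous distribution, Exp(1), whose mean is 1.
random.seed(0)
n = 100_000
sample = [random.expovariate(1.0) for _ in range(n)]

# Each individual value had probability zero of occurring, yet the draws
# share one distribution, and the sample mean still estimates E[X] = 1.
print(statistics.mean(sample))  # close to 1.0
```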

This is wrong. You can have a sample from any distribution, including continuous ones.
The ##\omega_i## values obtained do not have equal probabilities; the ##\omega_i## PDFs are identical. That is not the same thing.

Thank you. Please note that the ##\omega_i## are elements of the set ##\Omega##, on which we have a field of subsets and a probability measure. They are not random variables with values in the reals, therefore the ##\omega_i## don't have PDFs.

My point is that if every ##\omega_i## does not have the same probability of being chosen, then using the ##X_i## we won't be able to estimate the expectation and variance of ##X## with the standard definitions; we would need to apply weights to equalise their probabilities. Think about sampling from a population of people where the probability of selecting each person is not the same: it is necessary to apply weights in order to make such sampling useful for estimating the distribution of ##X##. And this argument applies only to a finite sample space; if it is infinite, we cannot level out our ##\omega_i## at all.

I think you might have meant that the ##X_i## have the same probability distribution as ##X##. However, mathematically, sampling seems to be about selecting elements from the sample space rather than values from the range of the function ##X##, and that seems to require either equal probability of the elements or the possibility of adjusting them to be equal using weights. But this suggests that the sample space must be finite and the probability measure on it discrete, which is not what we feel about the matter. Before I started thinking about it I also assumed that useful sampling, allowing us to estimate the distribution of a random variable, can be done on an infinite sample space. If such sampling exists there must be a somewhat subtle definition of it, which is what I am looking for.

A situation where everything works, as far as I can see, is as follows: if ##X:\Omega\rightarrow\mathbb{R}## is a random variable defined on a finite ##\Omega##, then sampling means constructing ##\Omega^{n}## with the probability measure induced by the measure on ##\Omega## and drawing ##\overline{\omega}=(\omega_{1},\dots,\omega_{n})## from it. In order for the samples to be equally probable, which as far as I know is a necessary requirement for sampling, either the initial probability on ##\Omega## must be uniform or we need to use weights to make it so. But I would not expect this to be the full story; I think I am missing something.
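The weighting idea mentioned above can be simulated directly. Here is a minimal Python sketch (the population values and selection probabilities are toy numbers chosen for illustration): elements of a finite sample space are drawn with unequal probabilities, and inverse-probability weights in the spirit of a Horvitz-Thompson estimator recover the population mean.

```python
import random

# Hypothetical finite sample space Omega = {0, 1, 2, 3}: a value X(omega)
# for each element, with a non-uniform selection probability p(omega).
values = [10.0, 20.0, 30.0, 40.0]   # X(omega)
probs = [0.4, 0.3, 0.2, 0.1]        # unequal selection probabilities
N = len(values)
true_mean = sum(values) / N          # 25.0

random.seed(1)
n = 200_000
draws = random.choices(range(N), weights=probs, k=n)

# Inverse-probability weights 1 / (N * p_i) "equalise" the unequal
# selection probabilities, making the estimator unbiased for the
# population mean despite the biased draws.
est = sum(values[i] / (N * probs[i]) for i in draws) / n
print(est)  # close to 25.0
```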

Here is where you went wrong:
We always say that samples are equally probable and independent.
We don't say that at all: we say that samples are independent and identically distributed.

I think you might have meant that the ##X_i## have the same probability distribution as ##X##.
You are right. I stand corrected.
If you are arguing that it is not possible to draw a sample from a general continuous probability distribution, then your argument is doomed to failure. It is done all the time. Your theory should take that into account.
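One standard way such sampling is done in practice is inverse-transform sampling; a minimal Python sketch (using the exponential distribution because its inverse CDF has a closed form):

```python
import math
import random

def sample_exponential(lam: float) -> float:
    """Inverse-transform sampling: if U ~ Uniform(0, 1), then
    -ln(1 - U) / lam has the Exp(lam) distribution."""
    u = random.random()
    return -math.log(1.0 - u) / lam

# Drawing from a continuous distribution "is done all the time":
random.seed(2)
draws = [sample_exponential(2.0) for _ in range(100_000)]
print(sum(draws) / len(draws))  # near the true mean 1 / 2.0
```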

Here is where you went wrong:

We don't say that at all: we say that samples are independent and identically distributed.
Thank you. I meant the elements ##\omega_i## of the sample space ##\Omega## rather than the sampled values of ##X##.

When we try to understand what a sample is mathematically, we have two options:

1. the samples are ##n## i.i.d. random variables ##X_i## defined on the same sample space ##\Omega##, evaluated at the same element of the sample space: ##X_i(\omega)##

2. the samples are ##n## elements ##\omega_i## of the sample space ##\Omega##, on which we evaluate the random variable ##X##, getting ##n## sampled values ##X(\omega_i)##

I am talking about interpretation 2 because it is mathematically tighter: it does not introduce ##n## random variables about which we know nothing. There is a lot of evidence in favour of this interpretation, which I can go into if you are interested.

The sampled values in 2 are identically distributed because in this interpretation they are just ##X## evaluated on the samples ##\omega_i## taken from ##\Omega##.

Both options bring us to an unexpected preliminary conclusion: in order for sampling to be useful, that is, for it to allow us to estimate the distribution of ##X##, the sample space ##\Omega## in our model needs to be finite, or so it seems.
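Interpretation 2 can be made concrete for a finite ##\Omega##: build ##\Omega^{n}## with the product measure and draw one point ##\overline{\omega}=(\omega_{1},\dots,\omega_{n})## from it. A Python sketch (the space, the measure, and the values of ##X## are toy choices for illustration):

```python
import itertools
import math
import random

# Toy finite sample space with a non-uniform probability measure.
Omega = ["a", "b", "c"]
P = {"a": 0.5, "b": 0.3, "c": 0.2}
X = {"a": 1.0, "b": 2.0, "c": 5.0}   # a random variable X: Omega -> R

n = 3

# The induced product measure on Omega^n:
# P_n(omega_1, ..., omega_n) = P(omega_1) * ... * P(omega_n).
P_n = {
    w: math.prod(P[wi] for wi in w)
    for w in itertools.product(Omega, repeat=n)
}
assert abs(sum(P_n.values()) - 1.0) < 1e-9  # P_n is a probability measure

# Drawing one omega-bar from Omega^n under P_n is the same as drawing n
# independent elements of Omega, so X(omega_1), ..., X(omega_n) are
# i.i.d. copies of X.
random.seed(3)
points = list(P_n)
omega_bar = random.choices(points, weights=[P_n[w] for w in points], k=1)[0]
print(omega_bar, [X[wi] for wi in omega_bar])
```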

From a practical point of view this is not a problem, because everything we measure is discretised, every population we study is finite, and so on. But I am asking about a theoretical definition; that is why we are talking about ##\Omega## and other abstract concepts. It is a bit like looking at Shannon's definition of information to get to the bottom of things, rather than just practically storing and retrieving data without any concern for information-theoretic entropy.

It's all very subtle.

I would appreciate it if you could recommend any good publications concerned with the theoretical foundations of sampling.

This subject is a very large and active research area; your questions here are just the tip of the iceberg. Political polling is the most obvious application. There are techniques of stratified sampling, importance sampling, variance reduction, analyses of sources of bias, etc. I suggest that you do some literature review before you get too deep into any one area, question, or reference.

I think you are confused, and your sloppy notation is not helping: for instance
I meant elements of sample space ##\omega_i##.
you presumably mean elements ##\omega_i## of a sample space ## \Omega ##.
My point is that if every ##\omega_i## does not have the same probability of being chosen, then using the ##X_i##
Well, every ## \omega_i ## does have an equal probability of being chosen: it has an a priori probability equal to 0 and an ex post probability equal to 1.

Samples are necessarily finite in number. This does not mean that statistical results that apply to infinite sample spaces are invalid, nor does it prevent us from proving limiting results as ## n \to \infty ##.

As a lifelong user of probability theory, I've never been concerned with the ##\omega_i##, only with the ##X_i##. The sample space is background; events are the subject of analysis.

Please could you help me find a rigorous mathematical definition of sampling as it is used in mathematical statistics?
In the rigorous mathematical theory of probability (based on measure theory) there are no definitions or assumptions that deal with taking samples, in the sense of measuring an "actual" value of a random variable from among its "possible" values. There is no assumption that says you can (or can't) take random samples.

What probability theory deals with are probability measures. One may define "sampling distributions" but in a rigorous approach this is done without introducing the metaphysical concepts of "actual" and "possible".

One could formulate a rigorous theory of statistics based only on rigorous probability theory, but this is seldom done, because statistics is above all an application of mathematics. When mathematics is applied, the theoretical mathematical structure must be given an interpretation, and the interpretation is specific to the practical problem being considered.

Probability theory gives no answers to questions such as
1) Is it really possible to take a sample from a uniform distribution on [0,1]?
2) Is it impossible for an event with probability 0 to happen?
3) Will an event with probability 1 always happen?

In applied math, the concepts of "possible" and "impossible" depend on the science that applies to the problem being solved, not on the axioms and theorems of measure theory.

Probability theory gives no answers to questions such as
1) Is it really possible to take a sample from a uniform distribution on [0,1]?
2) Is it impossible for an event with probability 0 to happen?
3) Will an event with probability 1 always happen?
1) Yes, in theory; implementation may be a problem.
2) No - for example, choosing from the uniform distribution on [0,1] gives a number which had probability zero of being chosen.
3) No - this is the complement of question 2).
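Answers 2) and 3) can be illustrated numerically; a small Python sketch (the interval and sample size are arbitrary): every value drawn from Uniform(0, 1) was, in the continuous model, an event of probability zero, yet one of them occurs on every draw, while interval probabilities behave exactly as the measure prescribes.

```python
import random

random.seed(4)
n = 100_000
draws = [random.random() for _ in range(n)]

# Every exact value drawn was, in the continuous model, an event of
# probability zero -- yet one such event happens on every single draw.
# Interval probabilities, by contrast, match the measure: P([0.2, 0.5]) = 0.3.
inside = sum(0.2 <= u <= 0.5 for u in draws)
print(inside / n)  # close to 0.3
```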

1) Yes, in theory; implementation may be a problem.
2) No - for example, choosing from the uniform distribution on [0,1] gives a number which had probability zero of being chosen.
3) No - this is the complement of question 2).
Those answers are typical interpretations, but my point is they are not assumptions or theorems of measure theory.

If you want a rigorous treatment then it is necessary to use measure theory, which I learned in graduate school: Lebesgue integration and so forth. I personally found it easy to learn, just abstract. The text we used was Rudin, which I liked.

Those answers are typical interpretations, but my point is they are not assumptions or theorems of measure theory.
I find the original question rather thought-provoking. What are the odds of picking a particular exact number at random from a set of real numbers? In one respect it should be zero, I think? Now I'm thinking of the axiom of choice, and a little about the continuum.