# Understanding different Monte Carlo approximation notations

• I

## Summary:

Understanding different notations of Monte Carlo approximations.
Currently working on a project involving Monte Carlo integrals. I haven't had any prior studies of this method, so hence the following question.

Consider the following expectation:

$$E[f(X)]=\int_A f(x)g(x)dx.$$

Let ##X## be a random variable taking values in ##A\subseteq\mathbb{R}^n##. Let ##g:A\to\mathbb{R}_+## be the probability density (pdf) of ##X##, and ##f:A\to\mathbb{R}## a function such that the expectation above is finite.

If ##X_1,X_2,...X_N## be independent random variables with pdf ##g##, then by the law of large numbers,

$$E[f(X)]\to \frac{1}{N} \sum_{i=1}^N f(X_i) \quad \text{as N} \to \infty.$$

As far I understand, the sum above is a general Monte Carlo approximation of the integral.

Does the above approximation make any assumption on the pdf, i.e. uniformity and normalisation? If it is a general approximation, it should hold for any pdf, but I have seen different approximations such as ##V\frac{1}{N}\sum_{i=1}^N f(X_i)## and ##\frac{1}{N}\sum_{i=1}^N \frac{f(X_i)}{g(X_i)}##, where in the former ##V## denotes the definite integral over the pdf. How are these related and derived?

Last edited:

Related Set Theory, Logic, Probability, Statistics News on Phys.org
mathman
(First question) - The only assumption is that the mean exists. By definition pdf is normalized. Uniformity (?) is irrelevant.

Different approximations - I have never seen the first one ##(V\times)##. The second looks like what is needed when ##X_i## is sampled from a uniform distribution, but it should have ##f(X_i)g(X_i)##.

The form ##\frac{f(X_i)}{g(X_i)}## would be used when sampling ##X_i## using ##g(x)## when the distribution of ##X_i## is actually uniform. Both here and the above notes, it is assumed that ##X## is distributed over a finite interval.

Last edited:
schniefen
Thanks for the reply. The formula ##V\frac{1}{N}\sum_{i=1}^N f(X_i)## appears in the Wikipedia article on Monte Carlo integration. Using the mean value theorem, I can see how this formula could easily be derived. However, when using a probabilistic approach, the integral ##I## in the article could be written as ##I=\int_{\Omega} f(x)g(x)dx## with ##g(x)=1##. Wouldn't then the law of large numbers simply yield ##\frac{1}{N} \sum_{i=1}^N f(X_i)## for this integral too? Where does the factor ##V## come from?

Stephen Tashi
Where does the factor ##V## come from?
Monte-carlo integration can be used:
1) To estimate the value of a (deterministic) function over a volume.
2) To estimate a function (e.g. the mean) of a random variable when the function is defined by an integral.

The Wikipedia article refers to integrating a (deterministic) function ##f## over a volume ##V## , not to integrating a multidimensional random variable whose domain is that volume. The estimate is done by introducing a random variable ##\bar{x}## that is uniformly distributed over the volume.

The mean of a set of samples drawn from various places in a volume is used to estimate the mean value of ##f## per unit volume. The estimated value of the integral is the volume ##V## times the estimate of the mean value of ##f## per unit volume.

How do we explain that sampling a function defined on volume using a uniform density is an estimate of the "density" of that function per unit volume? Justifying that thought is the basic conceptual difficulty.

schniefen
The Wikipedia article refers to integrating a (deterministic) function ##f## over a volume ##V## , not to integrating a multidimensional random variable whose domain is that volume.
What is the difference? If the multidimensional random variable ##f(X)## "looks like" the deterministic function ##f##, wouldn't it be the same (with the pdf of ##f(X)## being equal to 1, i.e. ##E[f(X)]=I=\int_{\Omega=A}f(x)dx##)?

How do we explain that sampling a function defined on volume using a uniform density is an estimate of the "density" of that function per unit volume?
Wouldn't it be the "density" of that function per that volume (from which we sample the function values)? Why is it unit volume?

Monte-carlo integration can be used:
1) To estimate the value of a (deterministic) function over a volume.
2) To estimate a function (e.g. the mean) of a random variable when the function is defined by an integral.
Is there a defining link between the two areas of applications? Somewhere where 1) would be equivalent to 2)? Or any other relation? Currently struggling with the presentation of Monte Carlo integration in the language of probability theory versus plain calculus language.

Stephen Tashi
Is there a defining link between the two areas of applications?
There is a general way of defining a problem that includes both major types of problems:
1) Integration of deterministic functions by introducing random variables
2) Computation of integrals involving probability distributions - i.e. computing certain functions of random variables.

In either case, a Monte-Carlo method involves random variables. In both cases, the random variables may or may not be uniformly distributed.

In both cases, the distribution of the random variable may or may not determine how samples of the random variable are selected. For example, in statistics there a sampling methods such as "importance sampling" and "stratified sampling" that do not draw samples from uniform distributions or from the distribution of the population in question.

So a general formulation is that we are estimating an integral ##\int_{\Omega} f(\bar{x}) g(\bar{x}) d\bar{x}## where ##g(\bar{x})## is a probability density (which may have the constant value 1). We select samples of ##\bar{x}## by using some distribution ##s(\bar{x})## which may or may not be equal to ##g(\bar{x})##. If necessary, in computing our estimate of the integral we use formulae that compensate for the fact that we sampled using ##s(\bar{x})## instead of ##g(\bar{x})##.

A further generalization is the "Markov Chain Monte-Carlo" type of estimation where the location of samples is determined by a Markov Chain instead of by taking independent random samples from some given distribution.

Wouldn't it be the "density" of that function per that volume (from which we sample the function values)? Why is it unit volume?
It depends on how we define the "density"of a function - which I deliberately neglected to do! If I think of the function ##f(\bar{x})## itself as the physical density of something per unit volume (e.g. kg/ m^3 ) then ##\frac{1}{N} ( f(\bar{x_1}) + f(\bar{x_2}) + ...+f(\bar{x_N}))## is an estimate of the mean density per unit volume and ##V## times that is an estimate of the total stuff (whatever ##f## is a density for) in the volume ##V##. The exact total stuff is given by ##\int_V f(\bar{x}) d\bar{x}##.

In some physical applications it's unnatural to think of ##f(\bar{x}) ## as a density. (For example, in 1-D it's unnatural to think of velocity as the density of position in time even though it has the appropriate physical units) However, the above way of looking at things makes the general method of estimation plausible.

Last edited:
Stephen Tashi
If ##X_1,X_2,...X_N## be independent random variables with pdf ##g##, then by the law of large numbers,

$$E[f(X)]\to \frac{1}{N} \sum_{i=1}^N f(X_i) \quad \text{as N} \to \infty.$$

As far I understand, the sum above is a general Monte Carlo approximation of the integral.
The above is not the general formula for an unbiased estmator. ##\hat{I} = \frac{1}{N}\sum_{i=1}^N f(X_i)## is an unbiased estimator of ##\int_V f(X_i)## in the special case where we integrate over a volume ##V = 1## and when the ##X_i## are uniformly distributed over the volume. Discussions of Monte-Carlo integration often treat integration over a hypercube and you may have seen the above formula in that context.

but I have seen different approximations such as ##V\frac{1}{N}\sum_{i=1}^N f(X_i)## and ##\frac{1}{N}\sum_{i=1}^N \frac{f(X_i)}{g(X_i)}##, where in the former ##V## denotes the definite integral over the pdf. How are these related and derived?
##V\frac{1}{N}\sum_{i=1}^N f(X_i)## is a special case of ##\frac{1}{N}\sum_{i=1}^N \frac{f(X_i)}{g(X_i)}## where the probability density ##g(X_i)## has constant value ##1/V## (The ##X_i## are assumed to be independently and uniformly distributed over the volume. So ##g## is the uniform probability distribution over the volume ##V##, which has constant value ##1/V##. )

Here is an intuitive way to unify the two formulae.

Begin with the first formula. Think of the function ##f## as a density of "stuff" per unit volume, even if the physical interpretation of ##f## isn't naturally that way. Then ##\int_V f(\bar{x}) d\bar{x}## is the total stuff in a volume and ##\mu_f = (1/V) \int_V f(\bar{x}) d\bar{x} ## is average volume density of stuff in the volume.

Thinking of ##f## as density per unit volume, if we estimate the average ##f## by taking ##N## random samples of ##f## in the volume ##V##, it intuitive to use ##\hat{\mu}_ f = \frac{1}{N} \sum_{i=1}^N f(\bar{x}_i) ## as an estimator of the average density of stuff per unit volume and ##(V)(\hat{\mu}_f)## as the estimate of the total stuff in ##V##.

Now consider a case where we sample the volume in a non-uniform manner. Suppose the volume ##V## is composed of unequal sub-volumes ##V_1## and##V_2##. Suppose we take ##N_1## samples from ##V_1## and ##N_2## samples from ##V_2## To compensate for the bias in sampling and the inequality of the volumes, it is intuitive to use a weighted average as the estimator for the average volume density.

##\hat{\mu}_f = \frac{V_1}{V} \frac{1}{N_1} \sum_{\bar{x}_i \in V_1} f(\bar{x}_i) + \frac{V_2}{V} \frac{1}{N_2} \sum_{\bar{x}_i \in V_2} f(\bar{x}_i)##

The estimate for the total stuff in the volume is

##\hat{I} = (V)(\hat{\mu}_f) = \frac{V_1}{N_1} \sum_{\bar{x}_i \in V_1} f(\bar{x}_i) + \frac{V_2}{N_2} \sum_{\bar{x}_i \in V_2} f(\bar{x}_i) ##

Now, instead of considering ##N_1## and ##N_2## to have been deterministically selected, suppose we conduct random sampling with probability ##g_1## per unit volume that a sample is taken from ##V_1## and probability ##g_2## per unit volume that a sample is taken from ##V_2##.

[Note: Post has been edited from first version to add the "per unit volume" aspect to ##g_1, g_2## ]

"On the average" from a total of ##N## samples we take ## g_1 V_1 N ## samples from ##V_1## and ##g_2 V_2 N## samples from ##V_2##. Proceeding by superstition and ignorance, this suggests setting ##N_1 = g_1 V_1 N ## and ##N_2 = g_2 V_2 N## in the above formula.

We obtain:
##\hat{I} = \frac{V_1}{g_1 V_1 N} \sum_{\bar{x}_i \in V_1} f(\bar{x}_i) + \frac{V_2}{g_2 V_2 N} \sum_{\bar{x}_i \in V_2} f(\bar{x}_i) ##

##= \frac{1}{N} ( \sum_{\bar{x}_i \in V_1} \frac{f(\bar{x}_i)}{g_1} + \sum_{\bar{x}_i \in V_2} \frac{f(\bar{x}_i)}{g_2} ) ##

Generalizing from two volumes to many volumes, this suggests that if the volume ##V## is divided into many subvolumes, that a good estimator for the total stuff in ##V## is
##\hat{I} = \frac{1}{N} \sum_{\bar{x}_i \in V} \frac{ f(\bar{x}_i)}{g(\bar{x}_i)}##

Last edited: