# Question about discrete Monte Carlo Summation

1. Jun 7, 2012

### mgamito

Hello all,

I'm aware of the Monte Carlo Summation method in discrete spaces, where you can approximate a very long summation over the entire space by a shorter one with only a few randomly selected terms from the original summation (weighted by the inverse probability density of them being chosen).

My question is: should the random selection of summation terms include replacement or not? That is, once one term is selected can it go back into the pool to be selected again? Or, stated yet another way: can one term from the original summation be selected more than once?

If both techniques are possible (with or without replacement), are there any known advantages or disadvantages to each one?

Thank you all,
manuel

2. Jun 7, 2012

### mathman

Either is allowed, but you will get better statistics without replacement. It is similar to quota sampling.

3. Jun 15, 2012

### mgamito

Hello mathman, I've been thinking about your reply but it seems to me that randomly choosing samples from the discrete set without replacement is going to skew the statistics.

Let's say I have a discrete random variable X, taking values in a discrete set χ with N elements, with probability density f_X(x). The expected value of a function g of X is

E(g(X)) = Σ_{x ∈ χ} g(x) f_X(x)

If we take n random samples x_i of X, according to the probability density f_X(x), then we can approximate the expected value with the mean:

g_n(X) = (1/n) Σ_{i=1}^{n} g(x_i)

If, however, we don't allow chosen samples to go back into the set to be possibly chosen again (which is what I mean by not allowing replacement), then in the limiting case n = N the entire set χ gets chosen *irrespective* of f_X(x). In that case g_n(X) becomes g_N(X), the plain arithmetic mean of g over χ. This arithmetic mean equals the E(g(X)) given above only in the particular case where f_X(x) is a uniform probability density. So it does seem that, when using Monte Carlo over discrete domains, one has to allow samples to be chosen more than once if the estimate of E(g(X)) is to remain unbiased.
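The argument above can be checked numerically. This is a minimal sketch with a made-up four-element set and a non-uniform density; the names `chi`, `f`, and `g` are illustrative stand-ins, not anything from a real application:

```python
import random

# Hypothetical discrete example: chi is the set, f a non-uniform
# probability density on chi, g the function whose expectation we want.
chi = [0, 1, 2, 3]
f = [0.1, 0.2, 0.3, 0.4]
g = lambda x: x * x

# True expected value: E[g(X)] = sum over chi of g(x) f_X(x)
true_ev = sum(g(x) * fx for x, fx in zip(chi, f))

# Monte Carlo with replacement: draw x_i ~ f_X, average g(x_i).
random.seed(0)
n = 200_000
samples = random.choices(chi, weights=f, k=n)
mc_with_replacement = sum(g(x) for x in samples) / n

# Sampling without replacement with n = N picks every element exactly
# once, so the estimate collapses to the arithmetic mean of g over chi,
# ignoring f_X entirely.
arithmetic_mean = sum(g(x) for x in chi) / len(chi)

print(true_ev, mc_with_replacement, arithmetic_mean)
```

With this density the true expectation is 5.0 while the arithmetic mean is 3.5, so the naive without-replacement estimate at n = N is visibly biased, whereas the with-replacement average converges to the right value.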

4. Jun 15, 2012

### chiro

Hey mgamito and welcome to the forums.

Just to add context to your question, did you have a specific application or problem in mind in relation to this thread?

5. Jun 15, 2012

### mgamito

Hello chiro,

Yes, this is for a computer graphics application. I'm tracing a ray through an opaque volume and integrating the lighting effects along the way. I march along the ray in fixed size steps so this boils down to a very long summation where each term in that summation may be quite expensive to compute. My idea then is that I can apply Monte Carlo summation in the discrete domain of all the steps for each ray. I randomly select only the most relevant steps and perform a reduced summation. There are several tricks I can use to assign a probability density to the collection of steps so that I preferentially choose those that are more likely to contribute to the final pixel result.
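The scheme described above might be sketched like this. Everything here is a stand-in: `step_term` plays the role of the expensive per-step lighting computation, and the decaying importance heuristic is just an assumed example of the "tricks" for assigning a density to the steps:

```python
import random

def step_term(i):
    """Expensive per-step lighting contribution (illustrative stand-in)."""
    return 1.0 / (1 + i)

num_steps = 1000

# Cheap importance estimate per step (hypothetical decaying heuristic),
# normalised into a probability density p over the steps.
importance = [1.0 / (1 + i) for i in range(num_steps)]
total = sum(importance)
p = [w / total for w in importance]

# Draw n steps with replacement according to p and weight each term by
# 1/(n * p_i), so the reduced sum is an unbiased estimate of the full one.
random.seed(0)
n = 64
chosen = random.choices(range(num_steps), weights=p, k=n)
estimate = sum(step_term(i) / (n * p[i]) for i in chosen)

full_sum = sum(step_term(i) for i in range(num_steps))
print(full_sum, estimate)
```

In this toy case the importance heuristic happens to be exactly proportional to the true term, so the estimator has zero variance and the 64-term estimate reproduces the 1000-term sum exactly; with a real renderer the heuristic only approximates the term, and the estimate is noisy but still unbiased.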

Hope that was clear,
manuel

6. Jun 15, 2012

### chiro

In terms of computation, one thing that helps with speed is look-up tables.

Back in the Quake days, we needed look-up tables for sine and cosine (cosine was derived from the sine table) because doing the calculation on the fly was too costly, so the values were pre-computed and stored in RAM.

One suggestion might be to take the same approach: generate a random vector at init time, then use vector-processing routines (SIMD instructions such as SSE2) to read the values from memory rather than computing them on the fly.

You can generate a huge vector and, if it's big enough, simply cycle through to the end and then loop back around to the start; the randomness should still be acceptable.

If you haven't looked into SSE2, I would recommend doing so, or using libraries that implement vector routines in an optimized way.
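The precomputed-table idea can be sketched in a few lines (the class name and sizes are illustrative; a real renderer would use a much larger table and SIMD reads):

```python
import random

class RandomTable:
    """Precompute a table of random numbers at init time, then serve
    them by cycling through the table with wraparound instead of
    calling the generator per sample."""

    def __init__(self, size, seed=0):
        rng = random.Random(seed)
        self.table = [rng.random() for _ in range(size)]
        self.pos = 0

    def next(self):
        value = self.table[self.pos]
        self.pos = (self.pos + 1) % len(self.table)  # loop back to the start
        return value

rt = RandomTable(size=4, seed=42)
first_cycle = [rt.next() for _ in range(4)]
second_cycle = [rt.next() for _ in range(4)]
print(first_cycle == second_cycle)  # sequence repeats after wraparound -> True
```

The trade-off is that the stream repeats with period equal to the table size, which is why the table has to be large relative to the number of samples drawn per frame.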

7. Jun 15, 2012

### mathman

In my original statement I was assuming selection from a uniform distribution. If the distribution is not uniform, you need to weight each sampled term to compensate when you sample without replacement.
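One standard way to do this reweighting is a Horvitz-Thompson-style estimator: divide each sampled term by its inclusion probability. This sketch reuses the same illustrative four-element example as before and, rather than simulating, enumerates every equally likely subset to show the estimator is exactly unbiased:

```python
from itertools import combinations

# Hypothetical example: non-uniform density f over a four-element set chi.
chi = [0, 1, 2, 3]
f = [0.1, 0.2, 0.3, 0.4]
g = lambda x: x * x

N = len(chi)
n = 2
pi = n / N  # with simple random sampling without replacement (all subsets
            # equally likely), each element's inclusion probability is n/N

true_ev = sum(g(x) * fx for x, fx in zip(chi, f))

# Horvitz-Thompson estimate for each possible subset: divide each sampled
# term g(x) f_X(x) by its inclusion probability.
estimates = [
    sum(g(chi[i]) * f[i] / pi for i in subset)
    for subset in combinations(range(N), n)
]

# Averaging over all equally likely subsets recovers the true expectation
# exactly, i.e. the reweighted without-replacement estimator is unbiased.
avg = sum(estimates) / len(estimates)
print(true_ev, avg)
```

Individual estimates still vary from subset to subset, so the bias is gone but not the variance; with a non-uniform selection scheme each element gets its own inclusion probability and the same division applies per element.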