Question about discrete Monte Carlo Summation

Discussion Overview

The discussion centers on the Monte Carlo summation method applied to discrete spaces, specifically addressing whether the random selection of summation terms should be done with or without replacement. Participants explore the implications of both techniques for statistical accuracy and bias in the resulting estimates.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • Some participants suggest that selecting samples without replacement can yield better statistics, comparing it to quota sampling.
  • Others argue that not allowing replacement may skew the statistics, particularly if the entire discrete set is chosen, which could lead to biased estimates unless the probability density is uniform.
  • One participant elaborates on the expected value of a function of a discrete random variable, indicating that the arithmetic mean will only match the expected value under specific conditions.
  • Another participant shares a practical application in computer graphics, discussing the integration of lighting effects along a ray and the potential for using Monte Carlo summation to optimize computations.
  • There are suggestions regarding the use of look-up tables and vector processing techniques to enhance computational efficiency in the context of Monte Carlo methods.

Areas of Agreement / Disagreement

Participants express differing views on the advantages and disadvantages of sampling with and without replacement, indicating that the discussion remains unresolved with multiple competing perspectives.

Contextual Notes

Participants note that the effectiveness of sampling methods may depend on the underlying probability density function and the specific application context, such as in computer graphics.

mgamito
Hello all,

I'm aware of the Monte Carlo summation method in discrete spaces, where you can approximate a very long summation over the entire space by a shorter one containing only a few randomly selected terms from the original summation (each weighted by the inverse probability of its being chosen).
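For concreteness, here is a minimal sketch of that estimator in Python (the name `mc_sum` and the uniform example are illustrative, not from the thread): draw n indices with replacement according to a selection probability p, and weight each sampled term by 1/p(k).

```python
import random

def mc_sum(terms, probs, n, rng=random):
    """Estimate sum(terms) from n randomly selected terms.

    Indices are drawn with replacement, index k with probability
    probs[k]; each sampled term is weighted by 1/probs[k], which
    keeps the estimate unbiased for any choice of probs."""
    idx = rng.choices(range(len(terms)), weights=probs, k=n)
    return sum(terms[k] / probs[k] for k in idx) / n

# Illustrative use: estimate the sum 0 + 1 + ... + 999 (exactly 499500)
# from only 1000 weighted draws with uniform selection probabilities.
terms = list(range(1000))
probs = [1.0 / len(terms)] * len(terms)
estimate = mc_sum(terms, probs, n=1000)
```

Non-uniform `probs` works the same way; the 1/probs[k] weight is what compensates for the skewed selection.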

My question is: should the random selection of summation terms include replacement or not? That is, once one term is selected can it go back into the pool to be selected again? Or, stated yet another way: can one term from the original summation be selected more than once?

If both techniques are possible (with or without replacement), are there any known advantages or disadvantages to each one?

Thank you all,
manuel
 
mathman

Either is allowed, but you will get better statistics without replacement. It is similar to quota sampling.
 
Hello mathman, I've been thinking about your reply but it seems to me that randomly choosing samples from the discrete set without replacement is going to skew the statistics.

Let's say I have a discrete random variable X with probability density f_X(x) that exists in a discrete set χ with N elements. The expected value of a function g of X is

E(g(X)) = Σ_{x ∈ χ} g(x) f_X(x)

If we take n random samples x_i of X, according to the probability density f_X(x), then we can approximate the expected value with the mean:

g_n(X) = (1/n) Σ_{i=1}^{n} g(x_i)

If, however, we don't allow chosen samples to go back into the set to be possibly chosen again (which is what I mean by not allowing replacement), then if we happen to choose n = N the entire set χ will be selected *irrespective* of f_X(x). In this case g_n(X) becomes g_N(X), the arithmetic mean of g over χ. This arithmetic mean equals the E(g(X)) given above only in the particular case where f_X(x) is a uniform probability density. So it does seem that, when using Monte Carlo over discrete domains, one has to allow samples to be chosen possibly more than once if the estimate of E(g(X)) is to remain unbiased.
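A quick numerical check of this point, with a made-up four-element set and pmf (the values are illustrative only):

```python
import random

xs = [0, 1, 2, 3]            # a toy discrete set χ, N = 4
f  = [0.1, 0.2, 0.3, 0.4]    # a non-uniform pmf f_X(x)
g  = lambda x: x * x

# True expected value: E(g(X)) = Σ g(x) f_X(x) = 5.0
exact = sum(g(x) * p for x, p in zip(xs, f))

# Without replacement with n = N: the plain arithmetic mean, 3.5 -- biased.
arith = sum(g(x) for x in xs) / len(xs)

# With replacement, sampling according to f_X: converges to 5.0.
rng = random.Random(1)
samples = rng.choices(xs, weights=f, k=100_000)
mc = sum(g(x) for x in samples) / len(samples)
```

The gap between 3.5 and 5.0 is exactly the bias described above, and it vanishes only when f_X is uniform.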
 
chiro

Hey mgamito and welcome to the forums.

Just to add context to your question: did you have a specific application or problem in mind in relation to this thread?
 
Hello chiro,

Yes, this is for a computer graphics application. I'm tracing a ray through an opaque volume and integrating the lighting effects along the way. I march along the ray in fixed-size steps, so this boils down to a very long summation where each term may be quite expensive to compute. My idea is that I can apply Monte Carlo summation in the discrete domain of all the steps for each ray: I randomly select only the most relevant steps and perform a reduced summation. There are several tricks I can use to assign a probability density to the collection of steps so that I preferentially choose those that are more likely to contribute to the final pixel result.

Hope that was clear,
manuel
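
Putting the idea together, a hedged sketch of what this might look like for one ray (all names here are hypothetical: `contrib(k)` stands for the expensive lighting term at step k, and `weight_hint` for whatever cheap importance guess is available, e.g. accumulated opacity):

```python
import random

def estimate_ray_sum(contrib, weight_hint, n, rng=random):
    """Estimate the full per-step lighting sum along one ray by
    evaluating only n randomly chosen steps (with replacement).

    contrib(k)      -- the expensive lighting term at step k
    weight_hint[k]  -- cheap nonnegative guess of step k's importance
    Each sampled term is weighted by 1/p(k), where p(k) is
    weight_hint[k] normalized to sum to 1, keeping the estimate
    unbiased regardless of how rough the hint is."""
    total = sum(weight_hint)
    probs = [w / total for w in weight_hint]
    idx = rng.choices(range(len(weight_hint)), weights=probs, k=n)
    return sum(contrib(k) / probs[k] for k in idx) / n
```

Because steps are drawn with replacement and reweighted, the estimate stays unbiased as long as every step with a nonzero contribution gets a nonzero selection probability; a better hint only lowers the variance.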
 
mgamito said:
Yes, this is for a computer graphics application. I'm tracing a ray through an opaque volume and integrating the lighting effects along the way. [...]

chiro

In terms of computation, one thing that helps with speed is look-up tables.

Back in the Quake days, we needed look-up tables for sine and cosine (cosine was calculated from the sine table) because doing the calculation on the fly was too costly. Because of this, the values were pre-computed and stored in RAM.

One suggestion might be to take the same approach: generate a random vector at init time, then use vector processing routines (SIMD, SSE2, etc.) and read the values from memory rather than computing them on the fly.

You can generate a huge vector and, if it's big enough, just keep cycling to the end and then loop back around to the start; that should be OK with regard to randomness.

If you haven't looked into SSE2, I would recommend looking at it, or getting libraries that implement vector routines in an optimized way.
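
As a rough illustration of that look-up-table idea (a sketch only; a real renderer would use an SSE/SIMD-friendly flat array rather than a Python list):

```python
import random

class RandomTable:
    """Pre-compute a large table of pseudo-random floats at init time
    and cycle through it at run time, wrapping back to the start,
    trading a little RAM for a much cheaper per-sample cost."""

    def __init__(self, size=1 << 16, seed=0):
        rng = random.Random(seed)
        self.table = [rng.random() for _ in range(size)]
        self.pos = 0

    def next(self):
        value = self.table[self.pos]
        self.pos = (self.pos + 1) % len(self.table)  # loop back around
        return value
```

The sequence repeats after `size` draws, so the table has to be large enough that the repetition doesn't correlate with the structure of the scene being sampled.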
 
mgamito said:
Hello mathman, I've been thinking about your reply but it seems to me that randomly choosing samples from the discrete set without replacement is going to skew the statistics. [...]
mathman

In my original statement I was assuming a selection from a uniform distribution. If the distribution is not uniform, weighting is needed to compensate when you don't replace.
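
One standard way to make that compensating weight precise is the Horvitz-Thompson estimator: sample without replacement, record each term's inclusion probability π_k, and weight each selected term by 1/π_k. A minimal sketch using the simplest such design, where each term is included independently with probability π_k (sometimes called Poisson sampling; the names below are illustrative):

```python
import random

def ht_sample_sum(terms, pi, rng=random):
    """Unbiased estimate of sum(terms) without replacement.

    Each term k is included independently with probability pi[k]
    (so no term can appear twice) and, if included, is weighted by
    1/pi[k] -- the compensating weight for non-uniform selection."""
    return sum(t / p for t, p in zip(terms, pi) if rng.random() < p)
```

Averaged over many runs this recovers the exact sum for any π_k > 0, uniform or not; the trade-off is that the number of evaluated terms now varies from run to run.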
 
