Question about discrete Monte Carlo Summation

SUMMARY

The forum discussion centers on the application of the Monte Carlo Summation method in discrete spaces, specifically addressing whether to sample with or without replacement. It is established that sampling without replacement can skew statistics, particularly when the probability density function is not uniform. The discussion highlights that for unbiased estimates of expected values in Monte Carlo methods, allowing for replacement is essential. Additionally, techniques such as look-up tables and vector processing routines like SIMD and SSE2 are recommended for optimizing computations in graphics applications.

PREREQUISITES
  • Understanding of Monte Carlo methods in statistics
  • Familiarity with discrete probability distributions
  • Knowledge of computer graphics principles, particularly ray tracing
  • Experience with SIMD and SSE2 for performance optimization
NEXT STEPS
  • Research Monte Carlo methods for discrete probability distributions
  • Explore techniques for implementing ray tracing in computer graphics
  • Learn about the use of look-up tables for optimizing mathematical computations
  • Investigate SIMD and SSE2 programming for enhanced performance in graphics applications
USEFUL FOR

Computer graphics developers, data scientists utilizing Monte Carlo methods, and software engineers focused on performance optimization in computational tasks.

mgamito
Hello all,

I'm aware of the Monte Carlo Summation method in discrete spaces, where you can approximate a very long summation over the entire space by a shorter one with only a few randomly selected terms from the original summation (weighted by the inverse probability density of them being chosen).

My question is: should the random selection of summation terms include replacement or not? That is, once one term is selected can it go back into the pool to be selected again? Or, stated yet another way: can one term from the original summation be selected more than once?

If both techniques are possible (with or without replacement), are there any known advantages or disadvantages to each one?
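The long-sum approximation described above can be checked on a toy example (all values here are illustrative, not from the thread): sample n indices with some selection probability p_k, weight each sampled term by 1/p_k, and average.

```python
import math
import random

# Toy demonstration of Monte Carlo summation: approximate a long sum
# S = a_1 + ... + a_N by averaging n randomly selected terms, each
# weighted by the inverse of its selection probability (uniform here).
random.seed(0)
N = 100_000
a = [math.sin(k) ** 2 for k in range(1, N + 1)]  # the full summation terms
exact = sum(a)

n = 5_000
p = 1.0 / N                                      # uniform selection probability
idx = [random.randrange(N) for _ in range(n)]    # sampled WITH replacement
estimate = sum(a[i] / p for i in idx) / n        # inverse-probability weighting
```

With 5,000 of 100,000 terms, the estimate typically lands within a percent or so of the exact sum for well-behaved terms like these.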

Thank you all,
manuel
 
Either is allowed, but you will get better statistics without replacement. It is similar to quota sampling.
 
Hello mathman, I've been thinking about your reply, but it seems to me that randomly choosing samples from the discrete set without replacement is going to skew the statistics.

Let's say I have a discrete random variable X with probability density f_X(x), taking values in a discrete set χ with N elements. The expected value of a function g of X is

E(g(X)) = \sum_{x \in χ} g(x) f_X(x)

If we take n random samples x_i of X, according to the probability density f_X(x), then we can approximate the expected value with the mean:

g_n(X) = \frac{1}{n} \sum_{i=1}^{n} g(x_i)

If, however, we don't allow chosen samples to go back into the set to possibly be chosen again (which is what I mean by not allowing replacement), then if we happen to choose n = N the entire set χ gets selected *irrespective* of f_X(x). In that case g_n(X) becomes g_N(X), the arithmetic mean of g over χ. This arithmetic mean equals the E(g(X)) given above only in the particular case where f_X(x) is a uniform probability density. So it does seem that, when using Monte Carlo over discrete domains, one has to allow samples to be chosen more than once if the estimate of E(g(X)) is to remain unbiased.
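The bias being described can be checked numerically with a tiny example (toy values, not from the thread): sampling with replacement according to f_X converges to E(g(X)), while exhausting the set without replacement (n = N) collapses to the plain arithmetic mean of g.

```python
import random

random.seed(1)
xs = [0, 1, 2, 3]                 # the discrete set chi, N = 4
f = [0.1, 0.2, 0.3, 0.4]          # non-uniform probability mass f_X
g = lambda x: x * x

exact = sum(g(x) * p for x, p in zip(xs, f))     # E(g(X)) = 5.0

# With replacement: draw x_i ~ f_X, average g(x_i) -> unbiased estimate
n = 200_000
samples = random.choices(xs, weights=f, k=n)
with_rep = sum(g(x) for x in samples) / n

# Without replacement and n = N: every element is drawn exactly once,
# so the estimate collapses to the arithmetic mean of g, ignoring f_X
without_rep = sum(g(x) for x in xs) / len(xs)    # (0 + 1 + 4 + 9) / 4 = 3.5
```

Here the with-replacement average approaches 5.0, while the exhaustive without-replacement average is pinned at 3.5, biased unless f_X is uniform.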
 
Hey mgamito and welcome to the forums.

Just to add context to your question, did you have a specific application or problem in mind in relation to this thread?
 
Hello chiro,

Yes, this is for a computer graphics application. I'm tracing a ray through an opaque volume and integrating the lighting effects along the way. I march along the ray in fixed size steps so this boils down to a very long summation where each term in that summation may be quite expensive to compute. My idea then is that I can apply Monte Carlo summation in the discrete domain of all the steps for each ray. I randomly select only the most relevant steps and perform a reduced summation. There are several tricks I can use to assign a probability density to the collection of steps so that I preferentially choose those that are more likely to contribute to the final pixel result.

Hope that was clear,
manuel
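A minimal sketch of the importance-sampled ray march manuel describes, with hypothetical names: `cheap_weight` stands in for an inexpensive proxy of each step's contribution (assumed to exist), `expensive_term` for the costly per-step lighting computation. Steps are drawn with replacement and weighted by the inverse of their selection probability.

```python
import random

# Hypothetical sketch: Monte Carlo summation over the steps of one ray.
# 'cheap_weight(k)' is an assumed inexpensive proxy for step k's contribution;
# 'expensive_term(k)' is the costly lighting computation for step k.
def mc_ray_sum(num_steps, n_samples, cheap_weight, expensive_term, rng):
    w = [cheap_weight(k) for k in range(num_steps)]
    total = sum(w)
    p = [wk / total for wk in w]                       # probability mass per step
    ks = rng.choices(range(num_steps), weights=p, k=n_samples)  # WITH replacement
    return sum(expensive_term(k) / p[k] for k in ks) / n_samples
```

A useful property: if the cheap proxy happens to be exactly proportional to the true per-step contribution, every weighted sample equals the full sum and the estimator has zero variance, so a better proxy density directly buys lower noise per sample.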
 
mgamito said:
Yes, this is for a computer graphics application. I'm tracing a ray through an opaque volume and integrating the lighting effects along the way.

In terms of computation, one thing that helps with speed is look-up tables.

Back in the Quake days, we needed look-up tables for sine and cosine (cosine was derived from the sine table) because doing the calculation on the fly was too costly, so the values were pre-computed and stored in RAM.

One suggestion would be to take the same approach: generate a random vector at init time, then use vector-processing routines (SIMD, SSE2, etc.) and read the values from memory rather than computing them.

You can generate a huge vector and, if it's big enough, just keep cycling through to the end and then loop back around to the start; that should be OK with regards to randomness.

If you haven't looked into SSE2, I'd recommend looking at it, or getting libraries that implement vector routines in an optimized way.
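The precomputed-table idea above can be sketched in a few lines (scalar Python for clarity; a real graphics implementation would use SIMD as suggested, and the table size here is an arbitrary choice):

```python
import random

# Sketch of a precomputed random-number table: fill it once at init time,
# then cycle through it in the inner loop instead of calling the generator,
# wrapping back to the start when the end is reached.
TABLE_SIZE = 1 << 16
_rng = random.Random(42)
_table = [_rng.random() for _ in range(TABLE_SIZE)]
_cursor = 0

def next_rand():
    global _cursor
    v = _table[_cursor]
    _cursor = (_cursor + 1) % TABLE_SIZE   # loop back around to the start
    return v
```

If the table is large relative to the number of draws per pixel, the cyclic reuse is usually an acceptable trade of statistical quality for speed.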
 
mgamito said:
Hello mathman, I've been thinking about your reply but it seems to me that randomly choosing samples from the discrete set without replacement is going to skew the statistics.

In my original statement I was assuming selection from a uniform distribution. If the distribution is not uniform, weighting is needed to compensate when you don't replace.
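A sketch of the compensating weights for the simplest without-replacement scheme (toy values assumed): draw n of the N terms uniformly without replacement, so each term's inclusion probability is n/N, and weight every sampled term by N/n, a Horvitz-Thompson-style estimator. Averaged over many trials, the estimate matches E(g(X)) even for a non-uniform f_X.

```python
import random

xs = [0, 1, 2, 3]
f = [0.1, 0.2, 0.3, 0.4]          # non-uniform probability mass f_X
g = lambda x: x * x
exact = sum(g(x) * p for x, p in zip(xs, f))     # E(g(X)) = 5.0

# Draw n of the N terms uniformly WITHOUT replacement; each term's
# inclusion probability is n/N, so weight each sampled term by N/n.
N, n = len(xs), 2
def estimate_once(rng):
    idx = rng.sample(range(N), n)                # uniform, without replacement
    return (N / n) * sum(g(xs[i]) * f[i] for i in idx)

rng = random.Random(2)
trials = 200_000
mean = sum(estimate_once(rng) for _ in range(trials)) / trials
```

Any single draw is still noisy, but the estimator's mean over many trials converges to the exact expectation, which is the compensation being described.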
 
