Monte Carlo to compute uncertainties, why report the mean and SD?

fluidistic · Mar 16, 2021

I am learning about the use of Monte Carlo to calculate/estimate uncertainties. If I take the example of a single measurement (i.e. I measure several quantities required to get an estimate of some physical property, but I do this only once), I can use Monte Carlo and some common sense for the inputs of the method. I provide the probability distribution of each variable (most of them Gaussian, for example, some uniform over some interval, some triangular, etc.), and I get a histogram of the physical property I want to "measure", i.e. calculate from the measured data. Generally this histogram is not Gaussian; it could be more like a log-normal, for example.
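
A minimal sketch of the propagation step described above, in Python using only the standard library. The measurement model ##Y = X_1 X_2 / X_3##, the three input distributions and their parameters, the seed and the number of draws are all hypothetical, chosen only to mix a Gaussian, a uniform and a triangular input; they are not taken from the GUM.

```python
import random
import statistics

def measurement_model(x1, x2, x3):
    """Hypothetical measurement model Y = X1 * X2 / X3 (illustrative only)."""
    return x1 * x2 / x3

def propagate(n_draws=100_000, seed=1):
    """Draw each input from its assigned PDF and push every draw through the model."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n_draws):
        x1 = rng.gauss(10.0, 0.5)           # Gaussian input: mean 10, SD 0.5 (assumed)
        x2 = rng.uniform(1.9, 2.1)          # uniform input on [1.9, 2.1] (assumed)
        x3 = rng.triangular(0.8, 1.2, 1.0)  # triangular input on [0.8, 1.2], mode 1.0 (assumed)
        ys.append(measurement_model(x1, x2, x3))
    return ys

# `samples` approximates the distribution of Y; its histogram is the one discussed above.
samples = propagate()
print(f"mean = {statistics.mean(samples):.3f}, SD = {statistics.stdev(samples):.3f}")
```

Plotting a histogram of `samples` gives the kind of output distribution the question refers to; the summary statistics reported from it are the subject of the rest of the thread.
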
According to the GUM (more precisely its Supplement 1, JCGM 101:2008, section 7.6, page 29 https://www.bipm.org/utils/common/documents/jcgm/JCGM_101_2008_E.pdf), one should report the mean and the standard deviation of the obtained histogram as the estimate of the physical property I want to "measure"/calculate and its associated uncertainty. I fail to understand why one would report the mean and the standard deviation instead of the median and the 68.3% (1 sigma for a Gaussian) confidence interval. Wouldn't the latter give much more information about the "true" value of the physical property I am looking for? More precisely, wouldn't it give me a more probable value than the mean and SD would?
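
To see how the two summaries differ on a skewed output, the sketch below computes both the mean and SD and the median with a central 68.27% interval (the 15.865% and 84.135% quantiles). A log-normal sample with ##\mu = 0##, ##\sigma = 0.5## stands in for the Monte Carlo output; that choice, the seed and the linear-interpolation quantile rule are assumptions for illustration only.

```python
import math
import random
import statistics

def quantile(sorted_vals, p):
    """Empirical quantile by linear interpolation between order statistics."""
    pos = p * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])

def summarize(samples, coverage=0.6827):
    """Return the mean/SD summary and the median/central-interval summary together."""
    s = sorted(samples)
    tail = (1.0 - coverage) / 2.0
    return {
        "mean": statistics.fmean(s),
        "sd": statistics.stdev(s),
        "median": quantile(s, 0.5),
        "low": quantile(s, tail),
        "high": quantile(s, 1.0 - tail),
    }

rng = random.Random(2)
# Stand-in for a skewed Monte Carlo output (assumed log-normal, mu = 0, sigma = 0.5).
ys = [rng.lognormvariate(0.0, 0.5) for _ in range(100_000)]
summary = summarize(ys)
print(summary)
```

For this log-normal the exact values are median 1, mean ##e^{1/8} \approx 1.133##, SD ##\approx 0.604## and central interval ##[e^{-1/2}, e^{1/2}] \approx [0.607, 1.649]##. The interval is visibly asymmetric about the median, whereas mean ± SD is symmetric by construction.
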

I have seen some papers (including one published in Nature) where they do not explicitly state what they report, but it looks to me like the median and the 68.3% interval, which would make more sense than what the GUM suggests. But I am almost sure the GUM has the last word, so the mean and the SD should be reported. These give no information about the skewness, and I am not sure what their advantage(s) over the median and the interval would be.

Dale · Mar 16, 2021

fluidistic said:

I fail to understand why one would report the mean and the standard deviation instead of the median and the 68.3 (1 sigma for a Gaussian) confidence level interval. Wouldn't the latter give much more information about the "true" value of the physical property I am looking for?

Certainly. And even more information would be provided by simply plotting the histogram, or providing the Monte Carlo data, or explicitly providing the source data and the analysis code. There is a growing movement towards the latter in particular.

The goal of the GUM is to provide a standardized and minimal approach. The standardization is important so that readers can immediately understand the meaning. Reporting more will certainly convey more information, but it requires more explanation by the writer. I would see the GUM as a minimal standard: you should report that as a minimum, and you may certainly report more if you wish.

Stephen Tashi · Mar 16, 2021

fluidistic said:

I provide the probability distribution of each variable (most of them Gaussian, for example, some uniform over some interval, some triangular, etc.), and I get a histogram of the physical property I want to "measure", i.e. calculate from the measured data. Generally this histogram is not Gaussian; it could be more like a log-normal, for example.

We have to define the type of problem that is being attacked by the Monte-Carlo method.

Suppose we have a probability model ##Y = F(X_1,X_2,X_3)## where ##Y## is a random variable and the ##X_i## are random variables. One type of problem is to estimate the "uncertainty" (standard deviation) of ##Y##. A different type of problem is to estimate the mean value of ##Y## and to estimate the standard deviation of the estimator of the mean value.

There is a technical distinction between the distribution of a random variable ##Y## and the distribution of an estimator of the mean value of that random variable. (The estimator is also a random variable since it is a function of data that is assumed to be randomly generated.) The phrase "the uncertainty in ##Y##" could refer to either quantity. Which does the JCGM document assume is being reported?
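
The distinction can be made concrete with a small simulation. In the sketch below ##Y## is assumed Gaussian with mean 5 and SD ##\sigma = 2## (hypothetical values); each simulated experiment draws ##n = 25## values of ##Y## and computes their sample mean. The SD of ##Y## stays ##\sigma##, while the SD of the estimator of the mean shrinks to about ##\sigma/\sqrt{n}##.

```python
import math
import random
import statistics

def sd_of_sample_mean(n=25, n_experiments=20_000, mu=5.0, sigma=2.0, seed=3):
    """Simulate many experiments and return the SD of the sample-mean estimator.

    Y is assumed Gaussian with mean `mu` and SD `sigma`; each experiment
    draws n values of Y and computes their mean.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(n_experiments)
    ]
    return statistics.stdev(means)

sigma, n = 2.0, 25
sd_mean = sd_of_sample_mean(n=n, sigma=sigma)
print("SD of Y itself:             ", sigma)
print("SD of the estimator of E[Y]:", sd_mean)
print("sigma / sqrt(n):            ", sigma / math.sqrt(n))
```

Both numbers are legitimately "an uncertainty", which is why a report needs to say which one it gives.
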

My understanding of a typical scenario for "error propagation" calculations is the case where the distributions of the ##X_i## are defined by unknown parameters. We define some criteria for a "best" fit of a model of the form ##Y = F(X_1,X_2,X_3)## to given data for ##Y,X_1,X_2,X_3##. Solving for the best fit, we obtain (estimated) values for parameters of the random variables involved. If we want to publish a number that describes "the uncertainty" in one of the estimated parameters, we are in the interesting position of having only one value of the estimated parameter to work with. So how do we publish an estimate for its standard deviation?
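
One common way to attach an SD to a single estimate ##\hat{c}## is a parametric bootstrap. This is a technique swapped in here for illustration, not the method described in the post. The straight-line-through-origin model ##y = c x##, the Gaussian noise, the data and all function names are hypothetical. The sketch fits ##\hat{c}## once, simulates new data sets from the fitted model, refits each one, and takes the SD of the refitted values; for this linear model the result can be checked against the textbook formula ##s/\sqrt{\sum x_i^2}##.

```python
import math
import random
import statistics

def fit_slope(xs, ys):
    """Least-squares slope for the hypothetical model y = c * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def parametric_bootstrap_sd(xs, ys, n_boot=5_000, seed=4):
    """Estimate SD(c_hat) from one data set by refitting simulated data sets."""
    rng = random.Random(seed)
    c_hat = fit_slope(xs, ys)
    residuals = [y - c_hat * x for x, y in zip(xs, ys)]
    # Residual SD with one fitted parameter (n - 1 degrees of freedom).
    s = math.sqrt(sum(r * r for r in residuals) / (len(xs) - 1))
    refits = []
    for _ in range(n_boot):
        fake_ys = [c_hat * x + rng.gauss(0.0, s) for x in xs]
        refits.append(fit_slope(xs, fake_ys))
    analytic = s / math.sqrt(sum(x * x for x in xs))
    return c_hat, statistics.stdev(refits), analytic

# One hypothetical data set: true c = 1.5, Gaussian noise with SD 0.3.
rng = random.Random(0)
xs = [float(i) for i in range(1, 21)]
ys = [1.5 * x + rng.gauss(0.0, 0.3) for x in xs]
c_hat, boot_sd, analytic_sd = parametric_bootstrap_sd(xs, ys)
print(f"c_hat = {c_hat:.4f}, bootstrap SD = {boot_sd:.5f}, formula SD = {analytic_sd:.5f}")
```
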

Under a fitting method that gives a unique fit, the estimator ##\hat{c}## of the parameter ##c## is a function of the sample data. In the pencil-and-paper approach we approximate this function by a function that involves only expectations and other moments of the data. We assume the (true) expectations and moments of the random variables involved are equal to the expectations and moments we estimate from the sample data. This gives us the distribution of ##\hat{c}##. Since ##\hat{c}## now has a known distribution, we can find its standard deviation.
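
The pencil-and-paper step can be illustrated with the delta method (first-order linearization) for a hypothetical estimator ##\hat{c} = 1/\bar{x}## of ##c = 1/\mu##. From a single data set, the sample mean ##\bar{x}## and sample SD ##s## are plugged in as if they were the true moments, giving ##\mathrm{SD}(\hat{c}) \approx s / (\bar{x}^2 \sqrt{n})##. The sketch compares that with the SD of ##\hat{c}## over many simulated data sets; the Gaussian data model (##\mu = 4##, ##\sigma = 1##, ##n = 30##) and the seeds are assumptions.

```python
import math
import random
import statistics

def delta_method_sd(data):
    """First-order (delta-method) SD of c_hat = 1 / mean(data).

    Linearize g(m) = 1/m about the sample mean and treat the sample
    moments as the true ones: SD(c_hat) ~ |g'(m)| * s / sqrt(n).
    """
    n = len(data)
    m = statistics.fmean(data)
    s = statistics.stdev(data)
    return (1.0 / m**2) * s / math.sqrt(n)

def simulated_sd(mu=4.0, sigma=1.0, n=30, n_experiments=20_000, seed=5):
    """SD of c_hat over many simulated experiments (assumed Gaussian data)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_experiments):
        data = [rng.gauss(mu, sigma) for _ in range(n)]
        estimates.append(1.0 / statistics.fmean(data))
    return statistics.stdev(estimates)

rng = random.Random(6)
one_data_set = [rng.gauss(4.0, 1.0) for _ in range(30)]
delta = delta_method_sd(one_data_set)
sim = simulated_sd()
print("delta method (one data set):  ", delta)
print("simulated over many data sets:", sim)
```

The two agree only approximately: the delta-method value depends on the particular ##\bar{x}## and ##s## of the one data set at hand.
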

The goal in the above method is to find the uncertainty (standard deviation) of ##\hat{c}##. This is not a Bayesian scenario. We are not assuming the (true) parameter ##c## has a probability distribution. The quantity that has a probability distribution is the estimator for ##c##. The only "uncertainty" in ##c## that it makes sense to report is the standard deviation of the estimator for ##c##.
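
To make this frequentist reading concrete, in the sketch below ##c## is a fixed constant and only ##\hat{c}## (here the sample mean of ##n = 50## hypothetical Gaussian measurements) and its estimated SD change from one simulated experiment to the next. The fraction of experiments whose interval ##\hat{c} \pm \mathrm{SD}(\hat{c})## contains the true ##c## should come out near 68%; the expected value is slightly below 68.27% because the SD is itself estimated from the data. The model, parameters and seed are assumptions.

```python
import math
import random
import statistics

def coverage_of_one_sd_interval(c_true=3.0, sigma=1.0, n=50, n_experiments=10_000, seed=7):
    """Fraction of repeated experiments whose interval c_hat +/- SD(c_hat) contains c_true.

    c_true is a fixed, non-random constant; only c_hat (the sample mean of
    n assumed-Gaussian measurements) and its estimated SD vary between experiments.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        data = [rng.gauss(c_true, sigma) for _ in range(n)]
        c_hat = statistics.fmean(data)
        sd_hat = statistics.stdev(data) / math.sqrt(n)
        if abs(c_hat - c_true) <= sd_hat:
            hits += 1
    return hits / n_experiments

coverage = coverage_of_one_sd_interval()
print(coverage)
```
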
