Using Monte Carlo to compute uncertainties: why report the mean and SD?

In summary, using the Monte Carlo method to estimate uncertainties involves providing probability distributions for each input variable and obtaining a histogram of the physical property to be measured. According to the GUM, the mean of this histogram should be reported as the estimate of the physical property and its standard deviation as the associated uncertainty. However, some argue that reporting the median and a 68.3 % confidence interval would provide more information. The GUM is seen as a minimal standard, and more information can be reported if desired. There is also a distinction between the distribution of a random variable and the distribution of an estimator, and the goal is to find the uncertainty of the estimator.
  • #1
fluidistic
I am learning about the use of Monte Carlo to calculate/estimate uncertainties. If I take the example of a single measurement (i.e. I measure several quantities required to get an estimate of some physical property, but I do this only once), I can use Monte Carlo and some common sense for the inputs of the method. I provide the probability distributions of each variable (most of them Gaussian, for example, some uniform over some interval, some triangular, etc.), and I get a histogram of the physical property I want to "measure" or calculate from the measured data. Generally this histogram is not Gaussian; it could look more like a log-normal, for example.
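A minimal sketch of the procedure described above, in Python with only the standard library. The model `Y = X1 * X2 / X3` and every input distribution (Gaussian, uniform, triangular) are hypothetical, chosen only so that the output histogram comes out right-skewed, as in the log-normal-like case mentioned; they are not taken from any real measurement.

```python
import random
import statistics


def model(x1, x2, x3):
    # Hypothetical measurement model Y = F(X1, X2, X3), for illustration only.
    return x1 * x2 / x3


def propagate(n_trials=100_000, seed=1):
    """Sample every input from its assumed distribution and evaluate the model."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n_trials):
        x1 = rng.gauss(10.0, 0.5)           # Gaussian: mean 10, SD 0.5 (assumed)
        x2 = rng.uniform(0.9, 1.1)          # uniform on [0.9, 1.1] (assumed)
        x3 = rng.triangular(0.5, 1.5, 1.0)  # triangular on [0.5, 1.5], mode 1 (assumed)
        ys.append(model(x1, x2, x3))
    return ys


def sample_skewness(xs):
    """Moment coefficient of skewness; positive for a long right tail."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs, m)
    return statistics.fmean([((x - m) / s) ** 3 for x in xs])


ys = propagate()
print(f"mean     = {statistics.fmean(ys):.3f}")
print(f"median   = {statistics.median(ys):.3f}")
print(f"SD       = {statistics.stdev(ys):.3f}")
print(f"skewness = {sample_skewness(ys):.2f}")
```

Because this hypothetical model divides by `X3`, the output has a long right tail and its mean lies above its median, which is exactly the situation in which the choice of summary statistic matters.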
According to the GUM (section 7.6, page 29 https://www.bipm.org/utils/common/documents/jcgm/JCGM_101_2008_E.pdf), one should report the mean and the standard deviation of the obtained histogram as the estimate of, and the uncertainty associated with, the physical property which I want to "measure"/calculate. I fail to understand why one would report the mean and the standard deviation instead of the median and the 68.3 % (1 sigma for a Gaussian) confidence level interval. Wouldn't the latter give much more information about the "true" value of the physical property I am looking for? More precisely, wouldn't it give me a more probable value than the mean and SD would?

I have seen some papers (including one published in Nature) where they do not explicitly state what they report, but it looks like the median and the 68.3 % CLI to me, which would make more sense than what the GUM suggests. But I am almost sure the GUM has the last word, so the mean and the SD should be reported. These give no information about the skewness, and I am not sure what advantage(s) they have over the median and the CLI.
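To see how the two kinds of summary differ on a skewed histogram, the sketch below computes the mean and SD, the median, the probabilistically symmetric 68.3 % interval (cutting 15.85 % of the samples from each tail) and the shortest interval containing the same number of samples. JCGM 101 itself also describes coverage intervals obtained from the Monte Carlo samples, including probabilistically symmetric and shortest ones. The log-normal samples are an assumed stand-in for the Monte Carlo output, and `summarize` is a hypothetical helper, not part of any library.

```python
import random
import statistics


def summarize(samples, p=0.683):
    """Hypothetical helper: several summaries of a Monte Carlo output sample."""
    xs = sorted(samples)
    n = len(xs)
    k = round(n * (1 - p) / 2)  # samples cut from each tail
    m = n - 2 * k               # samples inside each interval
    # Probabilistically symmetric interval: equal probability (1 - p)/2 in each tail.
    symmetric = (xs[k], xs[n - 1 - k])
    # Shortest interval containing the same number m of samples.
    j = min(range(n - m + 1), key=lambda i: xs[i + m - 1] - xs[i])
    shortest = (xs[j], xs[j + m - 1])
    return {
        "mean": statistics.fmean(xs),
        "sd": statistics.stdev(xs),
        "median": statistics.median(xs),
        "symmetric": symmetric,
        "shortest": shortest,
    }


rng = random.Random(2)
ys = [rng.lognormvariate(0.0, 0.5) for _ in range(100_000)]  # stand-in skewed output
s = summarize(ys)
print(f"mean +/- SD      : {s['mean']:.3f} +/- {s['sd']:.3f}")
print(f"median           : {s['median']:.3f}")
print(f"68.3 % symmetric : [{s['symmetric'][0]:.3f}, {s['symmetric'][1]:.3f}]")
print(f"68.3 % shortest  : [{s['shortest'][0]:.3f}, {s['shortest'][1]:.3f}]")
```

For this right-skewed case the symmetric interval reaches further above the median than below it, and the shortest interval is shifted towards lower values, where the density is highest; a single ± SD cannot express either feature.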
 
  • #2
fluidistic said:
I fail to understand why one would report the mean and the standard deviation instead of the median and the 68.3 % (1 sigma for a Gaussian) confidence level interval. Wouldn't the latter give much more information about the "true" value of the physical property I am looking for?
Certainly. And even more information would be provided by simply plotting the histogram, or providing the Monte Carlo data, or explicitly providing the source data and the analysis code. There is a growing movement towards the latter in particular.

The goal of the GUM is to provide a standardized and minimal approach. Standardization is important so that readers can immediately understand what is meant. Reporting more will certainly convey more information, but it also requires more explanation from the writer. I would see the GUM as a minimal standard: you should report what it asks for as a minimum, and you may certainly report more if you wish.
 
  • #3
fluidistic said:
I provide the probability distributions of each variable (most of them Gaussian, for example, some uniform over some interval, some triangular, etc.), and I get a histogram of the physical property I want to "measure" or calculate from the measured data. Generally this histogram is not Gaussian; it could look more like a log-normal, for example.

We have to define the type of problem that is being attacked by the Monte Carlo method.

Suppose we have a probability model ##Y = F(X_1,X_2,X_3)## where ##Y## is a random variable and the ##X_i## are random variables. One type of problem is to estimate the "uncertainty" (standard deviation) of ##Y##. A different type of problem is to estimate the mean value of ##Y## and to estimate the standard deviation of the estimator of the mean value.

There is a technical distinction between the distribution of a random variable ##Y## and the distribution of an estimator of the mean value of that random variable. (The estimator is also a random variable since it is a function of data that is assumed to be randomly generated.) The phrase "the uncertainty in ##Y##" could refer to either quantity. Which does the JCGM document assume is being reported?
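The distinction can be made concrete with a small simulation. In the sketch below, the model `F` and its input distributions are hypothetical. It compares the standard deviation of `Y` itself with the standard deviation of the estimator `Ybar`, the mean of `n` draws of `Y`, taken over many repeated experiments. The second shrinks like `sd(Y) / sqrt(n)`, while the first does not depend on `n` at all.

```python
import math
import random
import statistics


def F(x1, x2, x3):
    # Hypothetical model Y = F(X1, X2, X3), for illustration only.
    return x1 + x2 * x3


def draw_y(rng):
    # One realization of Y from assumed input distributions.
    return F(rng.gauss(1.0, 0.1), rng.gauss(2.0, 0.2), rng.uniform(0.5, 1.5))


def compare(n=25, n_experiments=4000, seed=3):
    rng = random.Random(seed)
    # Spread of the random variable Y itself.
    sd_y = statistics.stdev([draw_y(rng) for _ in range(20_000)])
    # Spread of the estimator Ybar (mean of n draws) over repeated experiments.
    ybars = [statistics.fmean([draw_y(rng) for _ in range(n)]) for _ in range(n_experiments)]
    sd_ybar = statistics.stdev(ybars)
    return sd_y, sd_ybar, sd_y / math.sqrt(n)


sd_y, sd_ybar, predicted = compare()
print(f"SD of Y                      : {sd_y:.4f}")
print(f"SD of Ybar (simulated)       : {sd_ybar:.4f}")
print(f"SD of Y / sqrt(n) (predicted): {predicted:.4f}")
```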

My understanding of a typical scenario for "error propagation" calculations is the case where the distributions of the ##X_i## are defined by unknown parameters. We define some criteria for a "best" fit of a model of the form ##Y = F(X_1,X_2,X_3)## to given data for ##Y,X_1,X_2,X_3##. Solving for the best fit, we obtain (estimated) values for parameters of the random variables involved. If we want to publish a number that describes "the uncertainty" in one of the estimated parameters, we are in the interesting position of having only one value of the estimated parameter to work with. So how do we publish an estimate for its standard deviation?

Under a fitting method that gives a unique fit, the estimator ##\hat{c}## of the parameter ##c## is a function of the sample data. In the pencil-and-paper approach we approximate this function by a function that involves only expectations and other moments of the data. We assume the (true) expectations and moments of the random variables involved are the expectations and moments we estimate from the sample data. This gives us the distribution of ##\hat{c}##. Since ##\hat{c}## now has a known distribution, we can find its standard deviation.
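A Monte Carlo counterpart of this reasoning, for the question raised above (only one data set, hence only one value of the estimated parameter), is the parametric bootstrap: plug the fitted parameters in as if they were the true ones, simulate many new data sets, refit each, and take the standard deviation of the refitted values. This technique is swapped in for illustration; it is not described in the post. The sketch assumes a hypothetical line through the origin, `y = c * x`, with Gaussian noise. It compares the bootstrap result with the pencil-and-paper standard error `s / sqrt(sum(x**2))`, where `s` is the residual standard deviation estimated from the single data set.

```python
import math
import random
import statistics


def fit_slope(xs, ys):
    """Least-squares estimate c_hat for the line through the origin y = c * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def residual_sd(xs, ys, c):
    """Noise SD estimated from the residuals (one parameter fitted, so n - 1 dof)."""
    return math.sqrt(sum((y - c * x) ** 2 for x, y in zip(xs, ys)) / (len(xs) - 1))


def analytic_se(xs, ys):
    # Pencil-and-paper: Var(c_hat) = sigma**2 / sum(x**2), with sigma replaced by s.
    c_hat = fit_slope(xs, ys)
    s = residual_sd(xs, ys, c_hat)
    return c_hat, s / math.sqrt(sum(x * x for x in xs))


def bootstrap_se(xs, ys, n_boot=5000, seed=4):
    # Parametric bootstrap: treat (c_hat, s) as the true parameters, simulate, refit.
    rng = random.Random(seed)
    c_hat = fit_slope(xs, ys)
    s = residual_sd(xs, ys, c_hat)
    refits = [fit_slope(xs, [c_hat * x + rng.gauss(0.0, s) for x in xs]) for _ in range(n_boot)]
    return statistics.stdev(refits)


# A single hypothetical data set (true c = 2, noise SD 0.3; both assumed).
rng = random.Random(5)
xs = [float(i) for i in range(1, 11)]
ys = [2.0 * x + rng.gauss(0.0, 0.3) for x in xs]

c_hat, se = analytic_se(xs, ys)
print(f"c_hat        = {c_hat:.4f}")
print(f"analytic SE  = {se:.5f}")
print(f"bootstrap SE = {bootstrap_se(xs, ys):.5f}")
```

Because this estimator is linear in the data, both routes should agree closely; for nonlinear models the bootstrap avoids the moment approximation at the price of computing time.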

The goal in the above method is to find the uncertainty (standard deviation) of ##\hat{c}##. This is not a Bayesian scenario. We are not assuming the (true) parameter ##c## has a probability distribution. The quantity that has a probability distribution is the estimator for ##c##. The only "uncertainty" in ##c## that it makes sense to report is the standard deviation of the estimator for ##c##.
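This frequentist reading can be checked numerically: keep the true `c` fixed, repeat the whole (hypothetical) experiment many times, and count how often `c_hat ± SE` contains `c`. With the noise level `sigma` assumed known, so that `SE = sigma / sqrt(sum(x**2))` exactly, the fraction should be close to 68.3 %, and the scatter of the `c_hat` values should match `SE`. The true value never varies; only the estimator does. All numbers are assumptions chosen for illustration. (If `sigma` were instead estimated from each small data set, the coverage of ±1 SE would come out somewhat lower, following Student's t distribution.)

```python
import math
import random
import statistics


def coverage(c_true=2.0, sigma=0.3, n_rep=20_000, seed=6):
    """Repeat a hypothetical experiment; count how often c_hat +/- SE contains c_true."""
    rng = random.Random(seed)
    xs = [float(i) for i in range(1, 11)]
    sxx = sum(x * x for x in xs)
    se = sigma / math.sqrt(sxx)  # exact SD of c_hat when sigma is known
    c_hats = []
    hits = 0
    for _ in range(n_rep):
        ys = [c_true * x + rng.gauss(0.0, sigma) for x in xs]
        c_hat = sum(x * y for x, y in zip(xs, ys)) / sxx
        c_hats.append(c_hat)
        if abs(c_hat - c_true) <= se:
            hits += 1
    return hits / n_rep, statistics.stdev(c_hats), se


frac, sd_chat, se = coverage()
print(f"fraction of intervals containing c: {frac:.3f}")
print(f"scatter of c_hat: {sd_chat:.5f}  vs  SE: {se:.5f}")
```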
 

1. What is Monte Carlo simulation?

Monte Carlo simulation is a computational method that uses random sampling to model and analyze complex systems or processes. It is commonly used in scientific research and engineering to simulate and predict the behavior of systems with uncertain inputs.

2. How does Monte Carlo simulation compute uncertainties?

Monte Carlo simulation uses random sampling to generate a large number of possible outcomes for a system or process. These outcomes are then analyzed statistically to estimate the range of possible values and the likelihood of different outcomes, providing a measure of uncertainty.

3. Why is it important to report the mean and standard deviation in Monte Carlo simulations?

The mean and standard deviation provide a measure of central tendency and spread, respectively, for the simulated outcomes. Reporting them gives a compact, standardized summary of the results. For a skewed output distribution, however, they say nothing about its shape; quantiles, a coverage interval, or the histogram itself convey more about which values are likely.

4. Can Monte Carlo simulation be used for any type of system or process?

Yes, Monte Carlo simulation can be used for a wide range of systems and processes, including physical, biological, and social systems. It is particularly useful for systems with complex interactions and uncertain inputs.

5. Are there any limitations to using Monte Carlo simulation for uncertainty analysis?

While Monte Carlo simulation can provide valuable insights into the uncertainties of a system or process, it does have some limitations. For example, it relies on the accuracy of the input data and assumptions made about the system, and it can be computationally intensive for complex systems with many variables.
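One way to see the computational cost: the Monte Carlo standard error of an estimated mean falls only like `1/sqrt(M)` with the number of trials `M`, so each halving of that error needs four times as many trials. The sketch below uses a hypothetical log-normal output and prints the estimate with its Monte Carlo standard error for increasing `M`. (JCGM 101 also describes an adaptive procedure that keeps adding trials until the results are numerically stable; this code does not implement it.)

```python
import math
import random
import statistics


def mc_mean_with_error(n_trials, seed=7):
    """Estimate E[Y] for a hypothetical log-normal output, with its Monte Carlo SE."""
    rng = random.Random(seed)
    ys = [rng.lognormvariate(0.0, 0.5) for _ in range(n_trials)]
    mean = statistics.fmean(ys)
    se = statistics.stdev(ys) / math.sqrt(n_trials)  # falls like 1/sqrt(n_trials)
    return mean, se


results = {}
for m in (1_000, 4_000, 16_000, 64_000):
    results[m] = mc_mean_with_error(m)
    mean, se = results[m]
    print(f"M = {m:>6}: mean = {mean:.4f} +/- {se:.4f}")
```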
