Combining model uncertainties with analytical uncertainties

SUMMARY

The discussion focuses on calculating uncertainties for a parameter 'b' derived from a linear least-squares regression model, specifically Y = bX + e, where 'e' represents a mean-zero error term. The user seeks to determine the mean and standard deviation of 'b' from multiple Monte Carlo samples, each producing a normally distributed variable N(bi, SE(bi)). The challenge arises from the fact that the samples are independent but not identically distributed, complicating the calculation of standard deviation. The conversation emphasizes the need for precise statistical vocabulary and understanding of the underlying probability models.

PREREQUISITES
  • Linear least-squares regression analysis
  • Monte Carlo simulation techniques
  • Understanding of normal distributions and their properties
  • Statistical terminology, including "mean," "standard deviation," and "estimator"
NEXT STEPS
  • Study the calculation of mean and variance for independent but non-identically distributed random variables
  • Explore the properties of multivariate normal distributions
  • Learn about statistical estimators and their properties in the context of probability models
  • Investigate advanced Monte Carlo methods for uncertainty quantification
USEFUL FOR

Statisticians, data analysts, researchers in quantitative fields, and anyone involved in uncertainty quantification using Monte Carlo methods.

imsolost
This is my problem: it all starts with some very basic linear least-squares regression to find a single parameter 'b':
Y = b X + e
where X, Y are observations and 'e' is a mean-zero error term.
I use it, I find 'b', done.

But I also need to calculate uncertainties on 'b'. The so-called "model uncertainty" is needed, and I've looked in the reference "https://link.springer.com/content/pdf/10.1007/978-1-4614-7138-7.pdf" where they define the "standard error" SE(b) as:
##SE(b)^2 = \dfrac{\sigma^2}{\sum_{i=1}^n (x_i - \bar{x})^2}##

where ##\sigma^2 = \operatorname{var}(e)##.
So I have 'b' and var(b). Done.
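As a minimal sketch of the computation described above (the function name is hypothetical, and the with-intercept form of the SE formula from the cited reference is assumed, which differs slightly from the intercept-free model Y = bX + e):

```python
import numpy as np

def fit_slope_with_se(x, y):
    """Least-squares slope of Y = a + b*X and its standard error,
    SE(b)^2 = sigma^2 / sum((x - xbar)^2), with sigma^2 estimated
    from the residuals. Hypothetical helper, not from the thread."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    b = np.sum(xc * (y - y.mean())) / np.sum(xc**2)  # slope estimate
    a = y.mean() - b * x.mean()                      # intercept
    resid = y - (a + b * x)
    sigma2 = np.sum(resid**2) / (len(x) - 2)         # estimate of var(e)
    se_b = np.sqrt(sigma2 / np.sum(xc**2))           # standard error of b
    return b, se_b
```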

Now here is the funny part:
I have multiple sets of observations, coming from Monte Carlo samplings on X and Y to account for analytical uncertainties on these observations. So basically, I can do the above stuff for each sample, and I have like 1000 samples:
In the end, I have bi and [var(b)]i for i = 1,...,1000.
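One way the sampling step described above might look (the perturbation scheme and the uncertainties `sigma_x`, `sigma_y` are my assumptions; the thread does not specify how the Monte Carlo samples are drawn):

```python
import numpy as np

rng = np.random.default_rng(0)

def monte_carlo_slopes(x, y, sigma_x, sigma_y, n_samples=1000):
    """Hypothetical sketch: perturb the observed (X, Y) pairs within
    assumed analytical uncertainties, refit the slope for each Monte
    Carlo sample, and collect (b_i, SE(b_i)) for each sample i."""
    b_list, se_list = [], []
    for _ in range(n_samples):
        xs = x + rng.normal(0.0, sigma_x, size=len(x))
        ys = y + rng.normal(0.0, sigma_y, size=len(y))
        xc = xs - xs.mean()
        b = np.sum(xc * (ys - ys.mean())) / np.sum(xc**2)  # slope b_i
        resid = ys - ys.mean() - b * xc
        sigma2 = np.sum(resid**2) / (len(xs) - 2)          # residual variance
        b_list.append(b)
        se_list.append(np.sqrt(sigma2 / np.sum(xc**2)))    # SE(b_i)
    return np.array(b_list), np.array(se_list)
```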

Now the question : what is the mean and the standard deviation ?

What I've done :

1) I have looked at articles that do the same. I found this: "https://people.earth.yale.edu/sites/default/files/yang-li-etal-2018-montecarlo.pdf"
where (§2.3) they face a similar problem; they phrase it very well, but don't explain how they've done it...

2) Imagine you only account for analytical uncertainties. You would get 1000 values of 'b'. It would be easy to calculate a mean and a standard deviation with the common formulas for the mean and sample variance. But here I don't have 1000 values! I have 1000 random variables, normally distributed following N(bi, SE(bi)).

3) I looked at the multivariate normal distribution. When you have a list of random variables X1, X2,..., here is the expression for the mean and sample variance (basically what I'm looking for, I think):
##\overline{X}_n = \dfrac{1}{n}\sum_{i=1}^n X_i, \qquad S^2_n = \dfrac{1}{n-1}\sum_{i=1}^n (X_i - \overline{X}_n)^2##

But I can't find how to calculate this if the X1, X2, ... are not iid (independent and identically distributed), because in my problem they are independent but NOT identically distributed, since N(bi, SE(bi)) is different for each 'i'.

4) For the mean it seems to be pretty straightforward: it should just be the mean of the bi. The standard deviation is the real problem to me. Please help :(
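For what it's worth, one common combination rule, not the thread's settled answer: if the 1000 results are read as an equal-weight mixture of the normals N(bi, SE(bi)), the mixture mean is the mean of the bi, and the mixture variance is the mean of the SE(bi)² plus the spread of the bi (law of total variance). A sketch under that assumption:

```python
import numpy as np

def combine_mixture(b, se):
    """Mean and standard deviation of an equal-weight mixture of
    normals N(b_i, se_i), via the law of total variance. Shown as
    one possible combination rule, not the thread's conclusion."""
    b = np.asarray(b, dtype=float)
    se = np.asarray(se, dtype=float)
    mean = b.mean()
    var = np.mean(se**2) + np.mean((b - mean) ** 2)  # within + between
    return mean, np.sqrt(var)
```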
 
We need to use the vocabulary of mathematical statistics precisely.

In that vocabulary, a "statistic" is defined by specifying a function of the data in a sample.

imsolost said:
3) I looked at the multivariate normal distribution. When you have a list of random variables X1, X2,..., here is the expression for the mean and sample variance (basically what I'm looking for, I think):
##\overline{X}_n = \dfrac{1}{n}\sum_{i=1}^n X_i, \qquad S^2_n = \dfrac{1}{n-1}\sum_{i=1}^n (X_i - \overline{X}_n)^2##
But I can't find how to calculate this if the X1, X2, ... are not iid (independent and identically distributed),

The expression for ##\overline{X}_n## is the definition of the statistic called the "sample mean" of a sample with values ##X_1,X_2,...X_n##. The expression you give for ##S^2_n## is one possible definition for the "sample variance" of such data. These definitions do not specify any conditions on how the values ##X_1,X_2,...X_n## are generated. In particular, they do not say that each ##X_i## must be an independent realization of a random variable, or that each ##X_i## must be an independent realization drawn from the distribution of some single random variable ##X##.

Speaking literally, there is no problem calculating those statistics if you have a specific set of data. They don't depend on how the data was generated.

A function of random variables is itself a random variable. So if we are in a situation where the values ##x_1,x_2,..,x_n## are regarded as realizations of ##n## random variables, then a statistic ##s(x_1,x_2,...,x_n)## is a random variable. As a random variable, ##s## has a distribution. That distribution has parameters. For example, we can speak of "the mean of ##s##" in the sense of the mean of that distribution. If we have ##m## vectors of data of the form ##(x_1, x_2,...x_n)## realized from ##n## random variables, we can compute ##m## values ##s_1, s_2,...s_m## of the random variable ##s##. We can define statistics on such samples. For example, on a sample of ##m## values of ##s##, we can define the statistic called the sample mean.
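The "statistic of a statistic" idea in the paragraph above can be made concrete with a small simulation (the specific distribution N(10, 2) and the choice of the sample mean as the statistic ##s## are my illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Draw m data vectors of length n, compute the statistic s (here the
# sample mean) on each, then treat the m values of s as a new sample
# and compute statistics OF s: its sample mean and sample s.d.
m, n = 200, 50
s_values = np.array([rng.normal(10.0, 2.0, size=n).mean() for _ in range(m)])
mean_of_s = s_values.mean()           # sample mean of the statistic s
spread_of_s = s_values.std(ddof=1)    # sample s.d. of the statistic s
```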

This shows why discussing even the simplest problems in statistics becomes conceptually complicated. A statistic has its own statistics - in the sense of statistics defined on samples of the statistic. A statistic can be regarded as a random variable and a random variable has a distribution, so a statistic has parameters - in the sense of the parameters of its distribution. Parameters of a distribution are constants, not statistics. But statistics are often used to estimate parameters.

When that is the goal, we call the statistic an "estimator" of the parameter. This is the context where we begin to care about how sets of data ##x_1, x_2,...x_n## are generated - whether they are independent realizations of the same or different random variables, whether the random variables have particular types of distributions - normal distributions or uniform distributions etc.
If we know particular things about the distributions that generate the data, we can say particular things about the distribution of a statistic that is a function of the data.

imsolost said:
Now the question : what is the mean and the standard deviation ?

From the above discussion, it should be clear that this question is not precise.

I think you want to know "What statistics are good estimators for the population mean and population standard deviation (of some random variable)?".

What random variable are we asking about? That's a subject for another post. Saying the random variable is "##b##" doesn't define it precisely. We need a statement of the probability model for the entire situation.

imsolost said:
I have multiple sets of observations, coming from Monte Carlo samplings on X and Y to account for analytical uncertainties on these observations.

Glancing at https://people.earth.yale.edu/sites/default/files/yang-li-etal-2018-montecarlo.pdf , I see the term "analytical" has a definition in terms of the physics of a situation. A reader unfamiliar with physics vocabulary could mistake "analytical" for a mathematical term - meaning something based on a particular mathematical formula. What does the term "analytical" indicate about a probability model in your case?
 
imsolost said:
it all starts with some very basic linear least-squares to find a single parameter 'b' :
Y = b X + e
where X,Y are observations and 'e' is a mean-zero error term.

I have multiple sets of observations, coming from Monte Carlo samplings on X and Y to account for analytical uncertainties on these observations.

Are you generating all the observations from a simulation? Or does "monte-carlo" samplings refer to randomly selecting the values of X to measure and then measuring the Y data with lab instruments? Is the Y data also from a simulation?
 
