Combining model uncertainties with analytical uncertainties

In summary, the problem involves finding the uncertainty on a parameter 'b' estimated by linear least squares, while also accounting for analytical uncertainties through Monte Carlo sampling. The question is how to calculate the mean and standard deviation of 'b' given 1000 samples, where each sample yields a different distribution for 'b'. This requires a precise understanding of the vocabulary of mathematical statistics, where a statistic is defined as a function of the data in a sample and can be regarded as a random variable with its own distribution and parameters. To determine the mean and standard deviation of 'b', a clear definition of the probability model for the situation is needed.
  • #1
imsolost
This is my problem: it all starts with some very basic linear least squares to find a single parameter 'b':
Y = bX + e
where X, Y are observations and 'e' is a mean-zero error term.
I use it, I find 'b', done.

But I also need to calculate the uncertainty on 'b'. The so-called "model uncertainty" is needed, and I've looked in the reference "https://link.springer.com/content/pdf/10.1007/978-1-4614-7138-7.pdf", where they define the "standard error" SE(b) as:
##SE(\hat{b})^2 = \dfrac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}##

where ##\sigma^2 = \mathrm{var}(e)##.
So I have 'b' and var(b). Done.
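For concreteness, here is a minimal sketch in Python of the computation described above (assuming the usual simple regression with an intercept, as in the cited reference; the function and variable names are illustrative, not from this thread):

```python
# Minimal sketch of the fit described above: ordinary least-squares slope and
# its standard error, SE(b)^2 = sigma^2 / sum((x_i - x_bar)^2), assuming the
# usual simple regression with an intercept as in the cited reference.
import numpy as np

def slope_and_se(x, y):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    x_bar, y_bar = x.mean(), y.mean()
    sxx = np.sum((x - x_bar) ** 2)
    b_hat = np.sum((x - x_bar) * (y - y_bar)) / sxx   # least-squares slope
    a_hat = y_bar - b_hat * x_bar                     # intercept
    resid = y - (a_hat + b_hat * x)
    sigma2_hat = np.sum(resid ** 2) / (n - 2)         # estimate of var(e)
    se_b = np.sqrt(sigma2_hat / sxx)                  # standard error of the slope
    return b_hat, se_b
```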

Now here is the funny part:
I have multiple sets of observations, coming from Monte Carlo sampling on X and Y to account for analytical uncertainties on these observations. So basically, I can do the above for each sample, and I have about 1000 samples.
In the end, I have ##b_i## and ##[var(b)]_i## for i = 1, ..., 1000.
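Schematically, the Monte Carlo part could look like the sketch below, under the assumption that the analytical uncertainties on X and Y are Gaussian with known standard deviations sx and sy (the names and the Gaussian assumption are illustrative, not stated in the thread):

```python
# Sketch of the Monte Carlo loop described above: perturb the observed X and Y
# by their (assumed Gaussian) analytical uncertainties and refit the slope and
# its standard error for each of the 1000 samples.
import numpy as np

def fit_slope(x, y):
    # Same simple-regression slope and standard error as in the earlier sketch.
    x_bar, y_bar = x.mean(), y.mean()
    sxx = np.sum((x - x_bar) ** 2)
    b = np.sum((x - x_bar) * (y - y_bar)) / sxx
    resid = y - (y_bar - b * x_bar) - b * x
    return b, np.sqrt(np.sum(resid ** 2) / (len(x) - 2) / sxx)

def monte_carlo_slopes(x_obs, y_obs, sx, sy, n_samples=1000, seed=0):
    x_obs = np.asarray(x_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    rng = np.random.default_rng(seed)
    b_i = np.empty(n_samples)
    se_i = np.empty(n_samples)
    for i in range(n_samples):
        x_pert = x_obs + rng.normal(0.0, sx, size=x_obs.shape)
        y_pert = y_obs + rng.normal(0.0, sy, size=y_obs.shape)
        b_i[i], se_i[i] = fit_slope(x_pert, y_pert)
    return b_i, se_i   # the b_i and SE(b_i) = sqrt([var(b)]_i), i = 1..1000
```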

Now the question: what is the mean and the standard deviation?

What I've done :

1) I have looked at articles that do the same thing. I found this: "https://people.earth.yale.edu/sites/default/files/yang-li-etal-2018-montecarlo.pdf",
where (§2.3) they face a similar problem; they phrase it very well, but don't explain how they did it...

2) Imagine you only account for analytical uncertainties. You would get 1000 values of 'b'. It would be easy to calculate a mean and a standard deviation with the common formulas for the mean and sample variance. But here I don't have 1000 values! I have 1000 random variables, normally distributed following ##N(b_i, SE(b_i))##.

3) I looked at the multivariate normal distribution. When you have a list of random variables X1, X2, ..., here is the expression for the mean and sample variance (basically what I'm looking for, I think):
##\overline{X}_n = \dfrac{1}{n}\sum_{i=1}^{n} X_i, \qquad S_n^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \overline{X}_n\right)^2##

But I can't find how to calculate this if the X1, X2, ... are not iid (independent and identically distributed), because in my problem they are independent but NOT identically distributed, since ##N(b_i, SE(b_i))## is different for each 'i'.

4) For the mean, it seems to be pretty straightforward: just the mean of the ##b_i##. The standard deviation is the real problem for me. Please help :(
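For reference, one commonly used pooling rule (an assumption here, not something established in this thread) treats the 1000 fitted ##N(b_i, SE(b_i)^2)## distributions as an equal-weight mixture: its mean is the average of the ##b_i##, and, by the law of total variance, its variance is the between-sample variance of the ##b_i## plus the average within-sample variance ##SE(b_i)^2##. A minimal sketch:

```python
# Sketch of the equal-weight mixture (law of total variance) pooling rule
# described above; whether this is the right rule for the problem is an
# assumption, not something settled in the thread.
import numpy as np

def pool_estimates(b_i, se_i):
    b_i = np.asarray(b_i, dtype=float)
    se_i = np.asarray(se_i, dtype=float)
    b_mean = b_i.mean()                  # mean of the mixture = mean of the b_i
    between = b_i.var(ddof=1)            # spread of the b_i across MC samples
    within = np.mean(se_i ** 2)          # average per-sample model variance
    total_sd = np.sqrt(between + within)
    return b_mean, total_sd
```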
 
  • #2
We need to use the vocabulary of mathematical statistics precisely.

In that vocabulary, a "statistic" is defined by specifying a function of the data in a sample.

imsolost said:
3) I looked at the multivariate normal distribution. When you have a list of random variables X1, X2, ..., here is the expression for the mean and sample variance (basically what I'm looking for, I think):
##\overline{X}_n = \dfrac{1}{n}\sum_{i=1}^{n} X_i, \qquad S_n^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \overline{X}_n\right)^2##
But I can't find how to calculate this if the X1, X2, ... are not iid (independent and identically distributed),

The expression for ##\overline{X}_n## is the definition of the statistic called the "sample mean" of a sample with values ##X_1,X_2,...X_n##. The expression you give for ##S^2_n## is one possible definition for the "sample variance" of such data. These definitions do not specify any conditions on how the values ##X_1,X_2,...X_n## are generated. In particular, they do not say that each ##X_i## must be an independent realization of a random variable, or that each ##X_i## must be an independent realization drawn from the distribution of some single random variable ##X##.

Speaking literally, there is no problem calculating those statistics if you have a specific set of data. They don't depend on how the data was generated.

A function of random variables is itself a random variable. So if we are in a situation where the values ##x_1,x_2,..,x_n## are regarded as realizations of ##n## random variables, then a statistic ##s(x_1,x_2,...,x_n)## is a random variable. As a random variable, ##s## has a distribution. That distribution has parameters. For example, we can speak of "the mean of ##s##" in the sense of the mean of that distribution. If we have ##m## vectors of data of the form ##(x_1, x_2,...x_n)## realized from ##n## random variables, we can compute ##m## values ##s_1, s_2,...s_m## of the random variable ##s##. We can define statistics on such samples. For example, on a sample of ##m## values of ##s##, we can define the statistic called the sample mean.
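As a small illustration (with hypothetical simulated data, not data from this problem), here is what it looks like to compute a statistic on ##m## realizations of a statistic:

```python
# Illustration of a "statistic of a statistic" with made-up data: each row is
# one sample of n values, the sample mean is computed per row, and then the
# sample mean and sample variance of those m values are computed.
import numpy as np

rng = np.random.default_rng(0)
m, n = 1000, 50
data = rng.normal(loc=2.0, scale=1.0, size=(m, n))  # m samples of size n

s = data.mean(axis=1)     # m realizations of the statistic "sample mean"
print(s.mean())           # sample mean of the statistic s
print(s.var(ddof=1))      # sample variance of s (about scale**2 / n = 0.02 here)
```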

This shows why discussing even the simplest problems in statistics becomes conceptually complicated. A statistic has its own statistics - in the sense of statistics defined on samples of the statistic. A statistic can be regarded as a random variable and a random variable has a distribution, so a statistic has parameters - in the sense of the parameters of its distribution. Parameters of a distribution are constants, not statistics. But statistics are often used to estimate parameters.

When that is the goal, we call the statistic an "estimator" of the parameter. This is the context where we begin to care about how sets of data ##x_1, x_2,...x_n## are generated - whether they are independent realizations of the same or different random variables, whether the random variables have particular types of distributions - normal distributions or uniform distributions etc.
If we know particular things about the distributions that generate the data, we can say particular things about the distribution of a statistic that is a function of the data.

imsolost said:
Now the question: what is the mean and the standard deviation?

From the above discussion, it should be clear that this question is not precise.

I think you want to know: "What statistics are good estimators for the population mean and population standard deviation (of some random variable)?"

What random variable are we asking about? That's a subject for another post. Saying the random variable is "##b##" doesn't define it precisely. We need a statement of the probability model for the entire situation.

imsolost said:
I have multiple sets of observations, coming from Monte Carlo sampling on X and Y to account for analytical uncertainties on these observations.

Glancing at https://people.earth.yale.edu/sites/default/files/yang-li-etal-2018-montecarlo.pdf , I see the term "analytical" has a definition in terms of the physics of the situation. A reader unfamiliar with physics vocabulary could mistake "analytical" for a mathematical term, meaning something based on a particular mathematical formula. What does the term "analytical" indicate about a probability model in your case?
 
  • #3
imsolost said:
it all starts with some very basic linear least-square to find a single parameter 'b' :
Y = b X + e
where X,Y are observations and 'e' is a mean-zero error term.

I have multiple sets of observations, coming from Monte Carlo sampling on X and Y to account for analytical uncertainties on these observations.

Are you generating all the observations from a simulation? Or does "Monte Carlo sampling" refer to randomly selecting the values of X to measure and then measuring the Y data with lab instruments? Is the Y data also from a simulation?
 

1. What is the difference between model uncertainties and analytical uncertainties?

Model uncertainties refer to the uncertainties associated with the mathematical or computational models used to describe a system or phenomenon. Analytical uncertainties, on the other hand, refer to the uncertainties associated with the measurement or analysis of data.

2. Why is it important to combine model uncertainties with analytical uncertainties?

Combining model uncertainties with analytical uncertainties allows for a more accurate and comprehensive understanding of the overall uncertainty in a scientific study. It also helps to identify areas where improvements can be made in both the models and the measurement techniques.

3. How can model uncertainties and analytical uncertainties be quantified?

Model uncertainties can be quantified through techniques such as sensitivity analysis and Monte Carlo simulations. Analytical uncertainties can be quantified through statistical methods such as the standard deviation or confidence intervals (a short sketch of this appears at the end of this FAQ).

4. What are some challenges in combining model uncertainties with analytical uncertainties?

One challenge is that model uncertainties and analytical uncertainties may have different sources and may be difficult to separate. Another challenge is that the combination of uncertainties may lead to a larger overall uncertainty than either one individually, which can be difficult to interpret.

5. How can the combined uncertainties be used in decision making?

The combined uncertainties can be used to assess the reliability and robustness of the scientific results and to inform decision making. They can also be used to identify areas where further research or improvements in measurement techniques are needed.
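As mentioned in question 3 above, Monte Carlo samples are often summarized with a standard deviation or a confidence interval. A minimal sketch with hypothetical samples:

```python
# Minimal sketch: summarize hypothetical Monte Carlo samples with a sample
# standard deviation and a 95% percentile interval.
import numpy as np

samples = np.random.default_rng(1).normal(3.2, 0.4, size=1000)  # made-up values
std = samples.std(ddof=1)
lo, hi = np.percentile(samples, [2.5, 97.5])
print(f"std = {std:.3f}, 95% interval = [{lo:.3f}, {hi:.3f}]")
```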
