Combining model uncertainties with analytical uncertainties

The discussion centers on calculating uncertainties for a parameter 'b' derived from a linear least-squares model, specifically addressing model and analytical uncertainties. The user has multiple Monte Carlo samples of observations for X and Y, resulting in 1000 estimates of 'b' and their variances. The challenge lies in determining the mean and standard deviation of these estimates, given that they are not identically distributed due to varying standard errors. The conversation highlights the complexity of defining statistics when dealing with random variables and emphasizes the need for a clear probability model to accurately estimate population parameters. Ultimately, the discussion underscores the intricacies of statistical analysis in the context of model uncertainties.
imsolost
This is my problem: it all starts with some very basic linear least squares to find a single parameter 'b':
Y = b X + e
where X, Y are observations and 'e' is a mean-zero error term.
I use it, I find 'b', done.

But I also need to calculate uncertainties on 'b'. The so-called "model uncertainty" is needed, and I've looked in the reference "https://link.springer.com/content/pdf/10.1007/978-1-4614-7138-7.pdf" where they define the "standard error" SE(b) as:
$$\mathrm{SE}(b)^2 = \frac{\sigma^2}{\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2}$$

where sigma² = var(e).
So I have 'b' and var(b). Done.
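To make this concrete, here is a minimal sketch of how one set of observations can be processed (numpy-based; the with-intercept fit and the n − 2 residual degrees of freedom are assumptions matching the reference's formula, and the function name is just illustrative):

```python
import numpy as np

def fit_slope_with_se(x, y):
    """Least-squares slope and its standard error.

    Follows the reference's formula SE(b)^2 = sigma^2 / sum((x_i - xbar)^2),
    which is stated for the simple-linear-regression model with an intercept;
    for a strictly no-intercept model Y = bX + e, replace (x - mean(x)) by x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)

    # Least-squares slope b and intercept a
    xc = x - x.mean()
    b = np.sum(xc * (y - y.mean())) / np.sum(xc**2)
    a = y.mean() - b * x.mean()

    # Residual variance estimate sigma^2 = var(e)
    resid = y - (a + b * x)
    sigma2 = np.sum(resid**2) / (n - 2)

    # Standard error of the slope
    se_b = np.sqrt(sigma2 / np.sum(xc**2))
    return b, se_b
```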

Now here is the funny part:
I have multiple sets of observations, coming from Monte Carlo samplings on X and Y to account for analytical uncertainties on these observations. So basically, I can do the above for each sample, and I have, say, 1000 samples:
In the end, I have bi and [var(b)]i for i = 1, ..., 1000.
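Schematically, the Monte Carlo part looks like this (a minimal sketch only; the way the analytical uncertainties enter, here as Gaussian noise with known standard deviations sig_x and sig_y around the measured values, and all of the numbers, are illustrative assumptions; fit_slope_with_se is the helper sketched above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Measured values and their analytical uncertainties (illustrative numbers)
x_meas = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_meas = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
sig_x = 0.05 * np.ones_like(x_meas)
sig_y = 0.10 * np.ones_like(y_meas)

n_mc = 1000
b_samples = np.empty(n_mc)    # b_i
se_samples = np.empty(n_mc)   # SE(b_i), i.e. sqrt([var(b)]_i)

for i in range(n_mc):
    # Perturb each observation within its analytical uncertainty
    x_i = rng.normal(x_meas, sig_x)
    y_i = rng.normal(y_meas, sig_y)
    # Least-squares fit on this perturbed data set
    b_samples[i], se_samples[i] = fit_slope_with_se(x_i, y_i)
```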

Now the question: what is the mean and the standard deviation?

What I've done:

1) I have looked at articles that do the same. I found this: "https://people.earth.yale.edu/sites/default/files/yang-li-etal-2018-montecarlo.pdf"
where (§2.3) they face a similar problem; they phrase it very well, but they don't explain how they've done it...

2) Imagine you only account for analytical uncertainties. You would get 1000 values of 'b'. It would be easy to calculate a mean and a standard deviation with the common formulas for the sample mean and sample variance. But here I don't have 1000 values! I have 1000 random variables, each normally distributed as N(bi, SE(bi)).

3) I looked at the multivariate normal distribution. When you have a list of random variables X1, X2, ..., here are the expressions for the sample mean and sample variance (basically what I'm looking for, I think):
$$\overline{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad S_n^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \overline{X}_n\right)^2$$

But I can't find how to calculate this if the X1, X2, ... are not iid (independent and identically distributed), because in my problem they are independent but NOT identically distributed, since N(bi, SE(bi)) is different for each 'i'.

4) For the mean it seems pretty straightforward: it should just be the mean of the bi. The standard deviation is the real problem for me. Please help :(
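Just to make the difficulty explicit in code (this is not a solution, only an illustration of the problem; b_samples and se_samples are the arrays from the sketch above):

```python
import numpy as np

b_mean = np.mean(b_samples)                  # the mean: this part seems easy

scatter_of_bi = np.std(b_samples, ddof=1)    # spread of the 1000 b_i only
typical_model_se = np.mean(se_samples)       # typical SE(b_i), ignored above

# The open question is how these two sources of uncertainty should be
# combined into a single standard deviation for b.
```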
 
We need to use the vocabulary of mathematical statistics precisely.

In that vocabulary, a "statistic" is defined by specifying a function of the data in a sample.

imsolost said:
3) i looked at multivariate normal distribution. When you have a list of random variables X1, X2,..., here is the expression for the mean and sample variance (basically what I'm looking for, i think) :
$$\overline{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad S_n^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \overline{X}_n\right)^2$$
But i can't find how to calculate this if the X1, X2, ... are not iid (independent and identically distributed),

The expression for ##\overline{X}_n## is the definition of the statistic called the "sample mean" of a sample with values ##X_1,X_2,...X_n##. The expression you give for ##S^2_n## is one possible definition for the "sample variance" of such data. These definitions do not specify any conditions on how the values ##X_1,X_2,...X_n## are generated. In particular, they do not say that each ##X_i## must be an independent realization of a random variable, or that each ##X_i## must be an independent realization drawn from the distribution of some single random variable ##X##.

Speaking literally, there is no problem calculating those statistics if you have a specific set of data. They don't depend on how the data were generated.
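For example (a minimal sketch with arbitrary numbers), nothing stops you from computing those statistics for any concrete list of values, however they were obtained:

```python
import numpy as np

data = np.array([2.3, 1.9, 2.7, 2.1, 2.5])   # any specific set of data

sample_mean = data.mean()
sample_variance = data.var(ddof=1)           # the (n - 1)-denominator definition above
```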

A function of random variables is itself a random variable. So if we are in a situation where the values ##x_1,x_2,..,x_n## are regarded as realizations of ##n## random variables, then a statistic ##s(x_1,x_2,...,x_n)## is a random variable. As a random variable, ##s## has a distribution. That distribution has parameters. For example, we can speak of "the mean of ##s##" in the sense of the mean of that distribution. If we have ##m## vectors of data of the form ##(x_1, x_2,...x_n)## realized from ##n## random variables, we can compute ##m## values ##s_1, s_2,...s_m## of the random variable ##s##. We can define statistics on such samples. For example, on a sample of ##m## values of ##s##, we can define the statistic called the sample mean.
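A small simulation makes this concrete (a sketch with arbitrary choices of ##n##, ##m##, and a normal population, purely for illustration): the sample mean, recomputed on repeated data sets, is itself a random quantity with its own distribution and spread.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 20, 5000   # sample size and number of repeated samples

# m realizations of the statistic s = sample mean of n draws from N(10, 2^2)
s_values = rng.normal(10.0, 2.0, size=(m, n)).mean(axis=1)

# Statistics of the statistic: its own sample mean and sample standard deviation
print(s_values.mean())          # close to 10, the population mean
print(s_values.std(ddof=1))     # close to 2 / sqrt(20)
```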

This shows why discussing even the simplest problems in statistics becomes conceptually complicated. A statistic has its own statistics - in the sense of statistics defined on samples of the statistic. A statistic can be regarded as a random variable and a random variable has a distribution, so a statistic has parameters - in the sense of the parameters of its distribution. Parameters of a distribution are constants, not statistics. But statistics are often used to estimate parameters.

When that is the goal, we call the statistic an "estimator" of the parameter. This is the context where we begin to care about how sets of data ##x_1, x_2,...x_n## are generated - whether they are independent realizations of the same or different random variables, whether the random variables have particular types of distributions - normal distributions or uniform distributions etc.
If we know particular things about the distributions that generate the data, we can say particular things about the distribution of a statistic that is a function of the data.
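For instance, if the ##X_i## are independent realizations of a single ##N(\mu,\sigma^2)## random variable, then
$$\overline{X}_n \sim N\!\left(\mu, \tfrac{\sigma^2}{n}\right),$$
so the standard deviation of the statistic ##\overline{X}_n## is ##\sigma/\sqrt{n}##.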

imsolost said:
Now the question: what is the mean and the standard deviation?

From the above discussion, it should be clear that this question is not precise.

I think you want to know: "What statistics are good estimators for the population mean and population standard deviation (of some random variable)?"

What random variable are we asking about? That's a subject for another post. Saying the random variable is "##b##" doesn't define it precisely. We need a statement of the probability model for the entire situation.

imsolost said:
I have multiple sets of observations, coming from Monte Carlo samplings on X and Y to account for analytical uncertainties on these observations.

Glancing at https://people.earth.yale.edu/sites/default/files/yang-li-etal-2018-montecarlo.pdf , I see the term "analytical" has a definition in terms of the physics of a situation. A reader unfamiliar with physics vocabulary could mistake "analytical" for a mathematical term, meaning something based on a particular mathematical formula. What does the term "analytical" indicate about a probability model in your case?
 
imsolost said:
it all starts with some very basic linear least-square to find a single parameter 'b' :
Y = b X + e
where X,Y are observations and 'e' is a mean-zero error term.

I have multiple sets of observations, coming from Monte Carlo samplings on X and Y to account for analytical uncertainties on these observations.

Are you generating all the observations from a simulation? Or does "Monte Carlo sampling" refer to randomly selecting the values of X to measure and then measuring the Y data with lab instruments? Is the Y data also from a simulation?
 