Combining model uncertainties with analytical uncertainties

imsolost · Jun 7, 2020

Here is my problem: it all starts with a very basic linear least-squares fit to find a single parameter 'b':
Y = b X + e
where X and Y are observations and 'e' is a mean-zero error term.
I fit it, I find 'b', done.

But I also need to calculate the uncertainty on 'b'. The so-called "model uncertainty" is needed, and I've looked in the reference "https://link.springer.com/content/pdf/10.1007/978-1-4614-7138-7.pdf", where they define the "standard error" SE(b) as:

where sigma² = var(e).
So I have 'b' and var(b). Done.
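For the no-intercept model ##Y = bX + e## as written above, the least-squares slope and its standard error have closed forms: ##\hat b = \sum x_i y_i / \sum x_i^2## and ##SE(\hat b)^2 = \hat\sigma^2 / \sum x_i^2##, with ##\hat\sigma^2 = \sum (y_i - \hat b x_i)^2/(n-1)## since one parameter is fitted. The usual textbook formula for simple regression with an intercept uses ##\sum (x_i - \bar x)^2## in the denominator instead. The sketch below assumes the through-origin model; the function name and the synthetic data are hypothetical.

```python
import numpy as np

def fit_slope_through_origin(x, y):
    """Least-squares fit of y = b*x + e with no intercept.

    Returns the slope estimate b and its standard error SE(b).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sxx = np.sum(x * x)
    b = np.sum(x * y) / sxx
    residuals = y - b * x
    # One fitted parameter, so n - 1 degrees of freedom for sigma^2 = var(e).
    sigma2_hat = np.sum(residuals**2) / (x.size - 1)
    se_b = np.sqrt(sigma2_hat / sxx)
    return b, se_b

# Synthetic data (true slope 2.0, made up for illustration).
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 20)
y = 2.0 * x + rng.normal(0.0, 0.5, size=x.size)
b, se_b = fit_slope_through_origin(x, y)
print(f"b = {b:.3f}, SE(b) = {se_b:.3f}")
```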

Now here is the funny part:
I have multiple sets of observations, coming from Monte Carlo sampling of X and Y to account for the analytical uncertainties on these observations. So basically, I can do the above for each sample, and I have about 1000 samples.
In the end, I have b_i and [var(b)]_i for i = 1, ..., 1000.
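The Monte Carlo step can be sketched as follows, assuming the analytical uncertainties are independent Gaussian perturbations with known standard deviations ##\sigma_X## and ##\sigma_Y## on each observation. Those values, the synthetic data, and the function names are hypothetical placeholders; the real perturbation model depends on the instruments and the data.

```python
import numpy as np

def fit_slope_through_origin(x, y):
    """Least-squares slope b and SE(b) for y = b*x + e (no intercept)."""
    sxx = np.sum(x * x)
    b = np.sum(x * y) / sxx
    sigma2_hat = np.sum((y - b * x) ** 2) / (x.size - 1)
    return b, np.sqrt(sigma2_hat / sxx)

def monte_carlo_slopes(x_obs, y_obs, sigma_x, sigma_y, n_samples=1000, seed=0):
    """Refit the slope on n_samples perturbed copies of the observations.

    Returns arrays b_i and SE(b)_i for i = 1..n_samples.
    """
    rng = np.random.default_rng(seed)
    b = np.empty(n_samples)
    se = np.empty(n_samples)
    for i in range(n_samples):
        # Assumed model: independent Gaussian analytical errors on X and Y.
        x_i = x_obs + rng.normal(0.0, sigma_x, size=x_obs.size)
        y_i = y_obs + rng.normal(0.0, sigma_y, size=y_obs.size)
        b[i], se[i] = fit_slope_through_origin(x_i, y_i)
    return b, se

# Hypothetical observations and analytical uncertainties.
x_obs = np.linspace(1.0, 10.0, 20)
y_obs = 2.0 * x_obs + np.random.default_rng(1).normal(0.0, 0.5, size=x_obs.size)
b_i, se_i = monte_carlo_slopes(x_obs, y_obs, sigma_x=0.1, sigma_y=0.3)
print(b_i.shape, se_i.shape)  # (1000,) (1000,)
```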

Now the question: what are the mean and the standard deviation?

What I've done:

1) I looked for articles that do the same thing and found this one: "https://people.earth.yale.edu/sites/default/files/yang-li-etal-2018-montecarlo.pdf"
In §2.3 they face a similar problem and phrase it very well, but they don't explain how they solved it...

2) Imagine you only account for analytical uncertainties. You would get 1000 values of 'b', and it would be easy to calculate a mean and a standard deviation with the usual formulas for the sample mean and sample variance. But here I don't have 1000 values! I have 1000 normally distributed random variables, following N(b_i, SE(b_i)).
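In the analytical-only case of point 2), the 1000 values of 'b' form an ordinary sample. A minimal sketch with made-up numbers; note that NumPy's `std` divides by ##n## by default, and passing `ddof=1` gives the ##n-1## sample-variance convention.

```python
import numpy as np

def mean_and_sample_std(values):
    """Sample mean and sample standard deviation (n - 1 denominator)."""
    values = np.asarray(values, dtype=float)
    return values.mean(), values.std(ddof=1)

# Made-up slope values standing in for the analytical-only Monte Carlo results.
m, s = mean_and_sample_std([1.9, 2.1, 2.0, 2.2, 1.8])
print(m, s)
```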

3) I looked at the multivariate normal distribution. When you have a list of random variables X1, X2, ..., here are the expressions for the mean and sample variance (basically what I'm looking for, I think):

But I can't find how to calculate this if X1, X2, ... are not iid (independent and identically distributed). In my problem they are independent, but they are NOT identically distributed, since N(b_i, SE(b_i)) is different for each 'i'.

4) The mean seems pretty straightforward: it should just be the mean of the b_i. The standard deviation is the real problem for me. Please help :(
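One common way to make the question precise, offered here only as an assumption (it is not taken from the cited paper): treat the combined result as an equal-weight mixture of the 1000 distributions ##N(b_i, SE(b_i)^2)##, i.e. pick a Monte Carlo sample at random, then draw 'b' from its model distribution. By the law of total variance, this mixture has mean ##\bar b = \frac{1}{m}\sum_i b_i## and variance ##\frac{1}{m}\sum_i SE(b_i)^2 + \frac{1}{m}\sum_i (b_i - \bar b)^2##, the average model variance plus the spread of the b_i. The sketch computes this and checks it by sampling the mixture directly; the input arrays are hypothetical.

```python
import numpy as np

def pooled_mean_and_std(b, se):
    """Mean and standard deviation of an equal-weight mixture of N(b_i, se_i^2).

    Assumption: the pooled result is modelled as that mixture. Then, by the
    law of total variance, total variance = mean of the model variances
    + variance of the b_i (1/m convention for the mixture).
    """
    b = np.asarray(b, dtype=float)
    se = np.asarray(se, dtype=float)
    mean = b.mean()
    within = np.mean(se**2)            # average model variance
    between = np.mean((b - mean) ** 2)  # spread of the b_i
    return mean, np.sqrt(within + between)

# Hypothetical inputs standing in for the 1000 Monte Carlo results.
rng = np.random.default_rng(2)
b_i = rng.normal(2.0, 0.05, size=1000)
se_i = rng.uniform(0.02, 0.04, size=1000)
mean, std = pooled_mean_and_std(b_i, se_i)

# Sanity check: pick i at random, then draw from N(b_i, se_i^2).
idx = rng.integers(0, b_i.size, size=200_000)
draws = rng.normal(b_i[idx], se_i[idx])
print(mean, std, draws.mean(), draws.std())
```

With 1000 samples, the choice between ##1/m## and ##1/(m-1)## in the spread term makes almost no difference.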

Stephen Tashi · Jun 9, 2020

We need to use the vocabulary of mathematical statistics precisely.

In that vocabulary, a "statistic" is defined by specifying a function of the data in a sample.

imsolost said:

3) I looked at the multivariate normal distribution. When you have a list of random variables X1, X2, ..., here are the expressions for the mean and sample variance (basically what I'm looking for, I think):
But I can't find how to calculate this if X1, X2, ... are not iid (independent and identically distributed),

The expression for ##\overline{X}_n## is the definition of the statistic called the "sample mean" of a sample with values ##X_1,X_2,...X_n##. The expression you give for ##S^2_n## is one possible definition for the "sample variance" of such data. These definitions do not specify any conditions on how the values ##X_1,X_2,...X_n## are generated. In particular, they do not say that each ##X_i## must be an independent realization of a random variable or that each ##X_i## must be an independent realization drawn from the distribution of some single random variable ##X##.

Speaking literally, there is no problem calculating those statistics if you have a specific set of data. They don't depend on how the data was generated.

A function of random variables is itself a random variable. So if we are in a situation where the values ##x_1,x_2,...,x_n## are regarded as realizations of ##n## random variables, then a statistic ##s(x_1,x_2,...,x_n)## is a random variable. As a random variable, ##s## has a distribution. That distribution has parameters. For example, we can speak of "the mean of ##s##" in the sense of the mean of that distribution. If we have ##m## vectors of data of the form ##(x_1, x_2,...x_n)## realized from ##n## random variables, we can compute ##m## values ##s_1, s_2,...s_m## of the random variable ##s##. We can define statistics on such samples. For example, on a sample of ##m## values of ##s##, we can define the statistic called the sample mean.

This shows why discussing even the simplest problems in statistics becomes conceptually complicated. A statistic has its own statistics - in the sense of statistics defined on samples of the statistic. A statistic can be regarded as a random variable and a random variable has a distribution, so a statistic has parameters - in the sense of the parameters of its distribution. Parameters of a distribution are constants, not statistics. But statistics are often used to estimate parameters.

When that is the goal, we call the statistic an "estimator" of the parameter. This is the context where we begin to care about how sets of data ##x_1, x_2,...x_n## are generated - whether they are independent realizations of the same or different random variables, whether the random variables have particular types of distributions - normal distributions or uniform distributions etc.
If we know particular things about the distributions that generate the data, we can say particular things about the distribution of a statistic that is a function of the data.
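A hypothetical example of the last few points: let ##X_1,...,X_n## be independent with ##X_i \sim N(\mu_i, \sigma_i^2)##, not identically distributed, and let the statistic be ##s = \overline{X}_n##. Knowing those distributions tells us that ##s## is normal with mean ##\frac{1}{n}\sum_i \mu_i## and variance ##\frac{1}{n^2}\sum_i \sigma_i^2##. The sketch realizes ##m## data vectors, computes the ##m## values of ##s##, and compares the sample mean and sample standard deviation of those values (statistics of the statistic) with the parameters of the distribution of ##s##. The specific ##\mu_i## and ##\sigma_i## are made up.

```python
import numpy as np

rng = np.random.default_rng(3)

# n independent, non-identically distributed normal random variables (made-up parameters).
mu = np.array([1.0, 2.0, 3.0, 4.0])
sigma = np.array([0.5, 1.0, 1.5, 2.0])
n = mu.size
m = 100_000  # number of realized data vectors (x_1, ..., x_n)

# Each row is one realization of (X_1, ..., X_n); mu and sigma broadcast across rows.
data = rng.normal(mu, sigma, size=(m, n))

# The statistic s = sample mean, computed once per data vector -> m values of s.
s = data.mean(axis=1)

# Parameters of the distribution of s, known because the X_i are independent normals.
mean_s = mu.mean()                     # (1/n) * sum(mu_i)
std_s = np.sqrt(np.sum(sigma**2)) / n  # sqrt(sum(sigma_i^2)) / n

# Statistics on the sample of m values of s, used as estimators of those parameters.
print(mean_s, s.mean())
print(std_s, s.std(ddof=1))
```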

imsolost said:

Now the question: what are the mean and the standard deviation?

From the above discussion, it should be clear that this question is not precise.

I think you want to know "What statistics are good estimators for the population mean and population standard deviation (of some random variable)?".

What random variable are we asking about? That's a subject for another post. Saying the random variable is "##b##" doesn't define it precisely. We need a statement of the probability model for the entire situation.

imsolost said:

I have multiple sets of observations, coming from Monte Carlo sampling of X and Y to account for the analytical uncertainties on these observations.

Glancing at https://people.earth.yale.edu/sites/default/files/yang-li-etal-2018-montecarlo.pdf, I see the term "analytical" has a definition in terms of the physics of a situation. A reader unfamiliar with physics vocabulary could mistake "analytical" for a mathematical term, meaning something based on a particular mathematical formula. What does the term "analytical" indicate about a probability model in your case?

Stephen Tashi · Jun 11, 2020

imsolost said:

it all starts with a very basic linear least-squares fit to find a single parameter 'b':
Y = b X + e
where X and Y are observations and 'e' is a mean-zero error term.

I have multiple sets of observations, coming from Monte Carlo sampling of X and Y to account for the analytical uncertainties on these observations.

Are you generating all the observations from a simulation? Or does "Monte Carlo sampling" refer to randomly selecting the values of X to measure and then measuring the Y data with lab instruments? Is the Y data also from a simulation?
