How to propagate errors through a regression and non-linear model?

Master1022
TL;DR Summary
How to calculate uncertainty bounds for the output of a linear regression model
Hi,

I was working on a predictive linear regression model and was hoping to obtain some bounds to represent the uncertainty present in the model.

Question:
I suppose this boils down into two separate components:
1. What is a good measure of uncertainty from a linear regression model? MSE, or perhaps another metric?
2. How can I propagate that metric through a non-linear function?

Context: I have used a certain dataset in 3 different linear regression models to predict variables ## x_1 ##, ## x_2 ##, and ## x_3 ##. I know the mean squared errors for those predictions - each of the predictions uses the same input variables, but the regression weights are different. Then I am calculating ## y = f(x_1, x_2, x_3) ##, where ## f ## is a non-linear function (not complicated, but there are products ## x_1 \cdot x_2 ## and ## x_1 \cdot x_3 ##). How can I calculate a metric that measures 'uncertainty' such that I can give the output of the model as ## y \pm \Delta y ##?

- ## y ## is being forecast into the future, so I do not have access to data to compare it against and calculate an MSE/metric

Thanks in advance for any help.
 
Master1022 said:
How can I calculate a metric that measures 'uncertainty' such that I can give the output of the model as ## y \pm \Delta y ##?

Suppose we take the simple interpretation that the "uncertainty" of ##y## will be ##\sigma_y## , the standard deviation of ##y## when ##y## is considered as a random variable.

If you have a specific probability model, where each random variable involved has (or is assumed to have) a distribution with known parameters then we can discuss calculating the standard deviation of ##y##.

However, if we only assume the random variables involved are from a general family of distributions (for example, if some have unknown means and variances), then saying we will calculate the standard deviation of ##y## is misleading. A better choice of words is to say that we will estimate the standard deviation of ##y##. The thing we can calculate is an estimator of the standard deviation of ##y##.

What "uncertainty" means in the latter situation is somewhat compliated because we can make an estimate of ##\sigma_y## as a function ##\hat{\sigma}_y## of the data, but in common language terms, there is some uncertainty in our estimate.

Which of the two situations applies to your problem?
 
Master1022 said:
I have used a certain dataset in 3 different linear regression models to predict variables x1, x2, and x3. I know the mean squared errors for those predictions - each of the predictions uses the same input variables, but the regression weights are different.
It would seem unlikely to me that these errors are uncorrelated. Can you explain what you did ?
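As a sketch of what checking that would look like (the residual series below are made-up, generated so they share a common driver; in practice they would be the residuals saved from the three fitted regressions), the sample correlation matrix of the residuals reveals whether the errors are in fact correlated:

```python
import numpy as np

# Hypothetical residual series standing in for the residuals of the
# three regressions; a shared driver makes them correlated by design.
rng = np.random.default_rng(1)
common = rng.normal(size=10_000)
res_a = common + 0.5 * rng.normal(size=10_000)
res_b = common + 0.5 * rng.normal(size=10_000)
res_c = 0.3 * common + 0.5 * rng.normal(size=10_000)

# Rows are variables, columns are observations -> 3x3 correlation matrix.
corr = np.corrcoef([res_a, res_b, res_c])
print(np.round(corr, 2))
```

If the off-diagonal entries are far from zero, treating the three errors as independent will misstate the uncertainty of anything computed from ##a##, ##b##, and ##c## together.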
 
Stephen Tashi said:
Suppose we take the simple interpretation that the "uncertainty" of ##y## will be ##\sigma_y## , the standard deviation of ##y## when ##y## is considered as a random variable.

If you have a specific probability model, where each random variable involved has (or is assumed to have) a distribution with known parameters then we can discuss calculating the standard deviation of ##y##.

However, if we only assume the random variables involved are from a general family of distributions (for example, if some have unknown means and variances), then saying we will calculate the standard deviation of ##y## is misleading. A better choice of words is to say that we will estimate the standard deviation of ##y##. The thing we can calculate is an estimator of the standard deviation of ##y##.

What "uncertainty" means in the latter situation is somewhat compliated because we can make an estimate of ##\sigma_y## as a function ##\hat{\sigma}_y## of the data, but in common language terms, there is some uncertainty in our estimate.

Which of the two situations applies to your problem?

Thanks for your response. So my situation is the latter, so it looks like an estimate is what we are aiming for to measure the uncertainty. How would I go about propagating that through a non-linear function?
 
BvU said:
It would seem unlikely to me that these errors are uncorrelated. Can you explain what you did ?
Thanks for your response @BvU ! So I basically used all three variables ## x_1 ##, ## x_2 ##, ## x_3 ## to calculate three predicted variables ## a ##, ## b ##, ## c ##. Then the output ## y ## had a form that can be condensed to:
## y = c \cdot (b - a) ##
You are right that the errors for ## a ##, ##b##, and ##c## are likely not independent as they all were predictions using the same three variables (## x_1 ##, ## x_2 ##, ## x_3 ##). What is the best way to deal with such a situation in order to get an error estimate for ##y##?
 
Master1022 said:
I basically used all three variables ## x_1 ##, ## x_2 ##, ## x_3 ## to calculate three predicted variables ## a ##, ## b ##, ## c ##. Then the output ## y ## had a form that can be condensed to:
## y = c \cdot (b - a) ##
You are right that the errors for ## a ##, ##b##, and ##c## are likely not independent as they all were predictions using the same three variables (## x_1 ##, ## x_2 ##, ## x_3 ##). What is the best way to deal with such a situation in order to get an error estimate for ##y##?
From what you 'did basically' I can't follow what you did, so all I can give is general advice: find out the correlation matrix and use it to propagate the errors.
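A minimal sketch of that propagation for ##y = c \cdot (b - a)##, assuming point estimates and a 3x3 residual covariance matrix for ##(a, b, c)## are available (the numbers below are hypothetical), is the first-order formula ##\sigma_y^2 \approx \nabla f^T \, \Sigma \, \nabla f##:

```python
import numpy as np

# Hypothetical point estimates and covariance matrix for (a, b, c).
a, b, c = 2.0, 5.0, 1.5
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.01]])

# y = c * (b - a); gradient with respect to (a, b, c).
grad = np.array([-c, c, b - a])

y = c * (b - a)
var_y = grad @ cov @ grad      # first-order (delta-method) variance
sigma_y = np.sqrt(var_y)
print(f"y = {y:.3f} +/- {sigma_y:.3f}")
```

The off-diagonal covariance entries are exactly where the correlation between the errors of ##a##, ##b##, and ##c## enters; zeroing them out would silently assume independent errors.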
 
BvU said:
From what you 'did basically' I can't follow what you did, so all I can give is general advice: find out the correlation matrix and use it to propagate the errors.
Apologies, which part wasn't clear? I can try to explain further.

The same three time series ## x_1 ##, ## x_2 ##, and ## x_3 ## were used as inputs to three different linear regression models. The outputs of these models were ## a ##, ## b ##, and ## c ##. Then these constants were combined in a formula which was of the form ## c \cdot (b - a) ##. Which part was unclear?
 
Master1022 said:
How would I go about propagating that through a non-linear function?

"Propagating error through a function ##f(x_1,x_2,x_3)##" is usually defined to mean estimating the standard deviation of the random variable ##y = f(x_1,x_2,x_3)##. Although estimating a standard deviation is a different concept that computing the standard deviation from known parameters of distributions, what is usually done is to make a lot of assumptions that justify using the sample values of parameters (such as mean, variance, covariance, etc.) as if they were they were the true values of the parameters. So, we end up in case 1 of post #2, even if are really in case 2 !

Proceeding as if we know the true values of all parameters involved, write a Taylor series (multinomial) approximation of ##f()## expanded about the point ##(\overline{x_1}, \overline{x_2}, \overline{x_3})##, i.e. in powers of the deviations ##(x_k - \overline{x_k})##, where ##\overline{x_k}## is the mean of ##x_k##. (Use the values of the sample means as the values of the actual means.)

Truncate the expansion. Then compute the standard deviation of ##y## by doing the appropriate integration of the multinomial approximation.

Of course, doing the integration can be complicated, but it's possible in principle since the calculations only involve computing "moments" of multinomial functions. For example, to compute ##\overline{y}## we might have to find the expected value of a term like ##\frac{\partial^2 f}{ \partial {x_1}^2} \frac{\partial f}{\partial x_2} (x_1 - \overline{x_1})^2 ( x_2 - \overline{x_2}) ##. The values of the partial derivatives are known because we are evaluating them at ##(\overline{x_1}, \overline{x_2},\overline{x_3})##. For the expected value of ##( x_1- \overline{x_1})^2 (x_2 - \overline{x_2} )##, we use the sample mean of the quantity ##(x_1- \overline{x_1})^2 (x_2 - \overline{x_2})##.

That's an outline of common practice. The mathematics of how well this way of estimating ##\sigma_y## works is a different matter.

For typical distributions, using sample values to estimate "higher moments" like ##\overline{ {x_1}^2 x_2 {x_3}^2 }## performs worse (in an appropriate technical sense) than using sample values to estimate lower moments like ##\overline{x_1}## or ##\overline{x_1 x_2}##. So including a lot of terms in the Taylor series does not necessarily make the estimate of ##\sigma_y## more reliable. The more terms you include, the more higher moments are involved, so the assumption that we can use sample moments as the actual higher moments becomes questionable.
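An alternative that sidesteps the truncation question entirely is a Monte Carlo sanity check: draw ##(a, b, c)## from a multivariate normal built from the sample means and covariance (hypothetical numbers below), push every draw through the model, and read off the sample spread of ##y##:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample means and covariance matrix for (a, b, c).
mean = np.array([2.0, 5.0, 1.5])
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.01]])

# Draw many correlated (a, b, c) triples and evaluate y = c * (b - a).
draws = rng.multivariate_normal(mean, cov, size=200_000)
a, b, c = draws[:, 0], draws[:, 1], draws[:, 2]
y = c * (b - a)

sigma_y_mc = y.std(ddof=1)
print(f"sigma_y (Monte Carlo) ~ {sigma_y_mc:.3f}")
```

Because the samples go through the full non-linear ##f## rather than a truncated expansion, comparing this spread against the Taylor-based estimate shows how much the neglected higher-order terms matter; the normality of the inputs is, of course, itself an assumption.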
 
The "goodness of fit" of a regression is usually measured by the "coefficient of correlation". This coefficient is defined as r^{2}=\frac{\sum (Y_{est}-\bar{Y})}{\sum(Y-\bar{Y})} where Y denotes the observed valued, Yest are the values you get from your regression model and \bar{Y} is the mean value of the Ys. r2 varies between 0 and 1, where 1 denotes a perfect correlation and 0 denotes no correlation.
 