Multiple variable prediction interval

Discussion Overview

The discussion revolves around the calculation of prediction intervals in multiple variable regression analysis. Participants explore the transition from single variable to multiple variable regression, focusing on the mathematical formulation and necessary adjustments in the equations used for prediction intervals.

Discussion Character

  • Technical explanation
  • Mathematical reasoning
  • Exploratory

Main Points Raised

  • One participant presents the formula for a single variable prediction interval and seeks clarification on how to adapt it for multiple variable regression.
  • Another participant suggests that the standard deviation in the equation would need to be replaced by a cross-covariance matrix of the predictors.
  • A different participant provides a formula for the interval estimate for the mean value of the response and the interval for a particular value, emphasizing the use of matrix notation.
  • One participant expresses uncertainty about which sign of the square root to choose in the context of the prediction interval calculation.

Areas of Agreement / Disagreement

There is no consensus on the exact formulation for the multiple variable prediction interval, as participants propose different approaches and express uncertainty regarding specific components of the equations.

Contextual Notes

Participants reference the need for matrix equivalents in the equations and highlight that existing literature may not provide sufficient detail on the topic.

Who May Find This Useful

This discussion may be useful for individuals studying regression analysis, particularly those interested in the mathematical aspects of prediction intervals in multiple variable contexts.

Uniquebum
Hey!

I'm working on some regression-related material at the moment and need some help with the prediction interval for multiple variable regression. The prediction interval for a single variable can be calculated using

PI = \hat{\beta_0}+\hat{\beta_1}x_i \pm t^* s_e \sqrt{1+\frac{1}{n} + \frac{(x_i-mean(x))^2}{S_{xx}}}

where x can be thought of as a one-dimensional vector (or matrix/set) that holds the values x_1, x_2, ..., x_n. Also, \hat{\beta_0}+\hat{\beta_1}x_i is the fitted linear regression line \hat{y}. Finally, t^* is the t-percentile, s_e is the standard deviation of the residuals, n is the number of points in the sample, and S_{xx} = \sum_{i=1}^{n}{(x_i-mean(x))^2}.
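A minimal Python sketch of the single-variable formula above, using only the standard library. The function name `simple_prediction_interval`, the sample data, and the choice to pass the critical value `t_star` in as an argument (looked up for n - 2 degrees of freedom) are assumptions made for illustration, not part of the original post.

```python
import math


def simple_prediction_interval(x, y, x_new, t_star):
    """Prediction interval for a new response at x_new (simple linear regression).

    t_star is the t critical value for n - 2 degrees of freedom, supplied by
    the caller (for example from a t table).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Least-squares estimates of slope and intercept.
    beta1 = s_xy / s_xx
    beta0 = y_bar - beta1 * x_bar

    # Residual standard deviation s_e, with n - 2 degrees of freedom.
    sse = sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))

    y_hat = beta0 + beta1 * x_new
    half_width = t_star * s_e * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / s_xx)
    return y_hat - half_width, y_hat + half_width


# Made-up data; 3.182 is the two-sided 95% t value for 3 degrees of freedom.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
lower, upper = simple_prediction_interval(x, y, x_new=3.0, t_star=3.182)
print(lower, upper)
```

The interval is narrowest at x_new = mean(x) and widens as x_new moves away from it, because of the (x_new - mean(x))^2 / S_{xx} term.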

Now what does the equation look like for multiple variable regression?

I'd suppose \hat{\beta_0}+\hat{\beta_1}x_i is easily changed to
\hat{\beta_0}+\hat{\beta_1}x_{0i}+\hat{\beta_2}x_{1i}+\hat{\beta_3}x_{2i}+...
but what do I do with
\frac{(x_i-mean(x))^2}{S_{xx}}
?
 
Uniquebum said:
Now what does the equation look like for multiple variable regression?

I'd suppose \hat{\beta_0}+\hat{\beta_1}x_i is easily changed to
\hat{\beta_0}+\hat{\beta_1}x_{0i}+\hat{\beta_2}x_{1i}+\hat{\beta_3}x_{2i}+...
but what do I do with
\frac{(x_i-mean(x))^2}{S_{xx}}
?
Off the top of my head, I would say that s_e would be replaced by a cross-covariance matrix of the x_j's, and that the square root would be replaced by a vector in which each element is calculated with the square-root expression.

PS. Your equations should drop the i subscript wherever x is an arbitrary input rather than sample data point i.

PPS. I don't know which sign of the square root to pick. I think that an authoritative answer to your OP will take more expertise than I have.
 
You'll find formulae if you look in a book on multiple regression, linear models, or basic multivariate analysis. Essentially you replace the quantity you ask about with the matrix equivalent. If \widehat y is the fitted value from the equation, and \mathbf{x}_0 is the specified value of the predictor, the interval estimate for the mean value of the response is

\widehat y \pm t \sqrt{\, \hat{\sigma}^2 \mathbf{x}'_0 \left(X' X\right)^{-1} \mathbf{x}_0 }
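
A quick consistency check, assuming a single predictor with an intercept, so that each row of X is (1, x_i) and \mathbf{x}_0 = (1, x)'. Writing \bar{x} for mean(x) and using \sum x_i^2 = S_{xx} + n\bar{x}^2:

X'X = \begin{pmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix}, \qquad \left(X'X\right)^{-1} = \frac{1}{n S_{xx}} \begin{pmatrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{pmatrix}

\mathbf{x}'_0 \left(X'X\right)^{-1} \mathbf{x}_0 = \frac{\sum x_i^2 - 2x\sum x_i + n x^2}{n S_{xx}} = \frac{S_{xx} + n(x-\bar{x})^2}{n S_{xx}} = \frac{1}{n} + \frac{(x-\bar{x})^2}{S_{xx}}

So in this special case the quadratic form reproduces the \frac{1}{n} + \frac{(x-mean(x))^2}{S_{xx}} term of the single-variable formula.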

If you want the interval for the particular value it is

\widehat y \pm t \sqrt{\, \hat{\sigma}^2 \left(1 + \mathbf{x}'_0 \left(X' X\right)^{-1} \mathbf{x}_0 \right) }
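
The two intervals above can be sketched in Python with NumPy. This is an illustration under stated assumptions rather than a definitive implementation: the function name `regression_intervals` and the data are made up, X is assumed to already contain a column of ones for the intercept (with \mathbf{x}_0 laid out the same way), \hat{\sigma}^2 is taken as the residual sum of squares divided by n - p, and the critical value `t_star` (for n - p degrees of freedom) is passed in by the caller.

```python
import numpy as np


def regression_intervals(X, y, x0, t_star):
    """Return (y_hat, half_mean, half_pred) at predictor vector x0.

    X is the n-by-p design matrix and must already contain a column of ones
    if the model has an intercept; x0 must be laid out the same way.
    t_star is the t critical value for n - p degrees of freedom.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    n, p = X.shape

    # Least-squares coefficients.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Estimate of sigma^2 from the residuals, with n - p degrees of freedom.
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)

    # Quadratic form x0' (X'X)^{-1} x0, computed without forming the inverse.
    q = x0 @ np.linalg.solve(X.T @ X, x0)

    y_hat = x0 @ beta
    half_mean = t_star * np.sqrt(sigma2 * q)        # interval for the mean response
    half_pred = t_star * np.sqrt(sigma2 * (1 + q))  # interval for a particular value
    return float(y_hat), float(half_mean), float(half_pred)


# Made-up data with two predictors plus an intercept column.
X = np.column_stack([np.ones(6), [1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5]])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])
# 3.182 is the two-sided 95% t value for n - p = 3 degrees of freedom.
y_hat, half_mean, half_pred = regression_intervals(X, y, [1.0, 3.5, 3.5], t_star=3.182)
print(y_hat - half_pred, y_hat + half_pred)
```

The \pm gives both endpoints of each interval, y_hat minus and plus the half-width; the prediction half-width is always the larger of the two because of the extra 1 inside the square root.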
 
Thanks a lot for the replies. I looked through a couple of books, but they only covered multiple variable regression too vaguely. This will help me move forward. Thanks again.
 
