Linear regression and bivariate normal, is there a relationship?

  • #1
CantorSet
Hi everyone,

This is not a homework question. I just want to understand an aspect of linear regression better. The book "Applied Linear Models" by Kutner et al. states that a linear regression model is of the form

[tex] Y_i = B_0 + B_1 X_i + \epsilon_i [/tex]

where
[itex] Y_i [/itex] is the value of the response variable in the ith trial
[itex] B_0, B_1 [/itex] are parameters
[itex] X_i [/itex] is a known constant
[itex] \epsilon_i [/itex] is a random variable, normally distributed.
Therefore, [itex]Y_i [/itex] is also a random variable, normally distributed, but [itex]X_i [/itex] is a constant.
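(To make that setup concrete, here is a minimal numpy sketch of the fixed-X formulation; the parameter values B_0 = 1, B_1 = 2.5 and sigma = 0.5 are made up purely for illustration.)

[code]
import numpy as np

rng = np.random.default_rng(1)

# Made-up parameter values, just for illustration
beta0, beta1, sigma = 1.0, 2.5, 0.5

# The X_i are known constants (a fixed design), not random draws
x = np.linspace(0.0, 10.0, 50)

# Y_i = B_0 + B_1 X_i + eps_i, with eps_i ~ N(0, sigma^2)
y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=x.size)

# Ordinary least-squares estimates of the slope and intercept
b1_hat, b0_hat = np.polyfit(x, y, deg=1)
print(b0_hat, b1_hat)   # should be close to 1.0 and 2.5
[/code]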

This confused me a bit because I always associated linear regression with the bivariate normal distribution. That is, the underlying assumption of linear regression is that the data [itex]\{(x_1,y_1), (x_2,y_2),...,(x_n,y_n) \} [/itex] is sampled from a bivariate normal distribution, in which case both X and Y are random variables. But in the formulation above, X is a known constant, while [itex]\epsilon[/itex] and therefore [itex]Y[/itex] are the random variables.

So in summary, what is the connection (if any) between linear regression as formulated by Kutner and the bivariate normal distribution?
 
  • #2
Stephen Tashi
CantorSet said:
the underlying assumption of linear regression is that the data [itex]\{(x_1,y_1), (x_2,y_2),...,(x_n,y_n) \} [/itex] is sampled from a bivariate normal distribution, in which case both X and Y are random variables.

I've never seen a treatment of regression that made that assumption. Are you confusing linear regression with some sort of "total least squares" regression?
http://en.wikipedia.org/wiki/Total_least_squares
 
  • #3
CantorSet
Stephen Tashi said:
I've never seen a treatment of regression that made that assumption. Are you confusing linear regression with some sort of "total least squares" regression?
http://en.wikipedia.org/wiki/Total_least_squares

Thanks for responding, Stephen.

Yeah, that was my own confusion for making that assumption. Thanks for clearing that up.

By the way, total least squares is just a generalization of linear regression in that the curve you're fitting the data points to can be a polynomial of degree higher than 1, right? Or is there more to total least squares?
 
  • #4
Not quite. Fitting a polynomial of degree higher than 1 is still ordinary least squares, just with extra terms, because only the vertical distances from the points to the curve are minimized. Total least squares is different: it treats both X and Y as random variables measured with error, and it minimizes the perpendicular (orthogonal) distances from the points to the fitted line.
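For what it's worth, here is a rough numpy sketch of total least squares for a straight line, computed via an SVD of the centered data (this is just one standard way to do it, and the data below are made up):

[code]
import numpy as np

def tls_line(x, y):
    """Fit y = a + b*x by total least squares (orthogonal distances)."""
    xm, ym = x.mean(), y.mean()
    M = np.column_stack([x - xm, y - ym])
    # The right singular vector with the largest singular value points
    # along the best-fit line; orthogonal residuals are minimized.
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    dx, dy = Vt[0]                 # direction of the fitted line
    b = dy / dx                    # slope
    a = ym - b * xm                # line passes through the centroid
    return a, b

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 200)
x = t + rng.normal(0, 0.5, t.size)      # noise in X as well as in Y
y = 1.0 + 2.0 * t + rng.normal(0, 0.5, t.size)

print(tls_line(x, y))                   # TLS intercept, slope
print(np.polyfit(x, y, deg=1)[::-1])    # OLS for comparison (intercept, slope)
[/code]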
 
  • #5


Maybe I can clarify the connection between linear regression and the bivariate normal distribution.

In Kutner's formulation, the [itex]X_i[/itex] are fixed, known constants and only the error terms [itex]\epsilon_i[/itex] (and therefore the [itex]Y_i[/itex]) are random. All of the inference about [itex]B_0[/itex] and [itex]B_1[/itex] is done conditionally on the observed x values, so the model makes no assumption at all about how, or whether, X is distributed.

The bivariate normal enters as a special case. If the pair [itex](X, Y)[/itex] does follow a bivariate normal distribution, then the conditional distribution of Y given X = x is itself normal, with a mean that is linear in x and a variance that does not depend on x:

[tex] E[Y \mid X = x] = \mu_Y + \rho \frac{\sigma_Y}{\sigma_X}(x - \mu_X), \qquad \operatorname{Var}[Y \mid X = x] = \sigma_Y^2 (1 - \rho^2) [/tex]

So, conditional on the x values, bivariate normal data satisfy exactly Kutner's model [itex]Y_i = B_0 + B_1 X_i + \epsilon_i[/itex] with [itex]B_1 = \rho \sigma_Y / \sigma_X[/itex], [itex]B_0 = \mu_Y - B_1 \mu_X[/itex], and normal errors of constant variance.

In summary, bivariate normality is a sufficient condition for the simple linear regression model, not a required assumption of it. The regression model applies equally well when the x values are chosen by the experimenter and are not random at all, which is why Kutner can treat [itex]X_i[/itex] as a known constant.
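If it helps, here is a quick numerical check of that conditional-mean formula; the means, standard deviations, and correlation below are arbitrary values picked only for the demonstration:

[code]
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary bivariate-normal parameters, chosen only for illustration
mu_x, mu_y = 2.0, 5.0
sx, sy, rho = 1.5, 2.0, 0.6
cov = np.array([[sx**2,      rho*sx*sy],
                [rho*sx*sy,  sy**2    ]])

xy = rng.multivariate_normal([mu_x, mu_y], cov, size=100_000)
x, y = xy[:, 0], xy[:, 1]

# Slope/intercept estimated by ordinary least squares
b1_hat, b0_hat = np.polyfit(x, y, deg=1)

# Slope/intercept implied by E[Y | X = x] for the bivariate normal
b1 = rho * sy / sx                # = 0.8
b0 = mu_y - b1 * mu_x             # = 3.4

print(b1_hat, b1)                 # both close to 0.8
print(b0_hat, b0)                 # both close to 3.4
[/code]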
 

1. What is linear regression and bivariate normal distribution?

Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. The bivariate normal distribution is a probability distribution describing the joint behavior of two normally distributed random variables; it is specified by the two means, the two variances, and the correlation between the variables.

2. How are linear regression and bivariate normal distribution related?

Linear regression assumes that the conditional mean of the dependent variable is a linear function of the independent variable, with normally distributed errors of constant variance. If the two variables are jointly bivariate normal, these assumptions hold automatically: the conditional mean of Y given X = x is linear in x, and the conditional variance does not depend on x.

3. Can linear regression be used to analyze bivariate normal data?

Yes. If the data are bivariate normal, the standard assumptions of simple linear regression (a linear conditional mean, normal errors, constant variance) are satisfied automatically, so ordinary least squares can be applied directly. For real data, the usual checks of linearity, normality, and independence of the errors still apply.

4. How do you determine if there is a relationship between two variables using linear regression and bivariate normal distribution?

In linear regression, the strength and direction of the linear relationship are summarized by the sample correlation coefficient r; the sign of r gives the direction, and the fitted slope equals r multiplied by the ratio of the sample standard deviations, s_Y / s_X. For a bivariate normal distribution, the same information is carried by the covariance and the correlation ρ: the sign gives the direction, and the magnitude of ρ gives the strength.
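As a concrete illustration with a small made-up sample, the following sketch shows how the sample correlation, covariance, and OLS slope relate:

[code]
import numpy as np

# Small made-up sample of paired observations
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

r = np.corrcoef(x, y)[0, 1]                  # sample correlation
cov_xy = np.cov(x, y)[0, 1]                  # sample covariance
slope = r * y.std(ddof=1) / x.std(ddof=1)    # OLS slope = r * s_y / s_x

print(r, cov_xy, slope)
[/code]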

5. Can the results of linear regression and bivariate normal distribution be used to make predictions?

Yes, both can be used to make predictions: the fitted regression line predicts Y at a given x, and for a bivariate normal the conditional distribution of Y given X = x provides both a point prediction and a prediction interval, as long as the assumptions are met and the data are appropriate for the analysis. However, it is important to note that correlation does not necessarily imply causation, so predictions should be interpreted with caution.
