Linear regression and the bivariate normal: is there a relationship?

AI Thread Summary
Linear regression is defined by the equation ##Y_i = \beta_0 + \beta_1 X_i + \epsilon_i##, where ##Y_i## is the response variable and ##\epsilon_i## is a normally distributed random variable. This formulation treats ##X_i## as a known constant, while ##Y_i## is a random variable through ##\epsilon_i##. The confusion arises from the assumption that linear regression data is sampled from a bivariate normal distribution, in which both X and Y are random variables. Total least squares, on the other hand, does treat both X and Y as random variables; it is not merely a polynomial generalization of linear regression. Understanding these distinctions clarifies the relationship between linear regression and the bivariate normal distribution.
CantorSet
Hi everyone,

This is not a homework question. I just want to understand an aspect of linear regression better. The book "Applied Linear Models" by Kutner et al. states that a linear regression model is of the form

$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$$

where
##Y_i## is the value of the response variable in the ith trial,
##\beta_0, \beta_1## are parameters,
##X_i## is a known constant,
##\epsilon_i## is a normally distributed random variable.
Therefore, ##Y_i## is also a normally distributed random variable, but ##X_i## is a constant.
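
To see what that means in practice, here is a minimal sketch in Python with NumPy (the parameter values are made up for illustration): the ##X_i## are fixed constants, and only ##\epsilon_i##, and hence ##Y_i##, is random.

```python
import numpy as np

rng = np.random.default_rng(0)

# The X_i are fixed, known constants (a design grid), not random draws
x = np.linspace(0, 10, 50)

# Illustrative true parameters: beta_0, beta_1, and the noise scale
beta0, beta1, sigma = 2.0, 0.5, 1.0

# Only the errors are random: eps_i ~ N(0, sigma^2), so Y_i is random too
eps = rng.normal(0.0, sigma, size=x.size)
y = beta0 + beta1 * x + eps

# Ordinary least-squares estimates of beta_0 and beta_1
b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)  # should be close to 2.0 and 0.5
```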

This confused me a bit because I always associated linear regression with the bivariate normal distribution. That is, I thought the underlying assumption of linear regression is that the data ##\{(x_1,y_1), (x_2,y_2), \ldots, (x_n,y_n)\}## is sampled from a bivariate normal distribution, in which case both X and Y are random variables. But in the formulation above, X is a known constant, while ##\epsilon##, and therefore Y, are the random variables.
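
By contrast, here is the picture I had in mind, again as a rough sketch with made-up parameters: both coordinates of each data point are drawn together from a bivariate normal. (One reason the two pictures can look alike: for a bivariate normal, the conditional mean ##E[Y \mid X = x]## is linear in ##x##.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative mean vector and covariance matrix for the pair (X, Y)
mean = np.array([1.0, 3.0])
cov = np.array([[2.0, 1.2],
                [1.2, 1.5]])

# Here BOTH coordinates are random: each row is one draw of (X_i, Y_i)
xy = rng.multivariate_normal(mean, cov, size=500)
x, y = xy[:, 0], xy[:, 1]

# For a bivariate normal, E[Y | X = x] is linear in x:
#   E[Y | X = x] = mu_Y + (Cov(X, Y) / Var(X)) * (x - mu_X)
slope = cov[0, 1] / cov[0, 0]
intercept = mean[1] - slope * mean[0]
print(intercept, slope)  # 2.4 and 0.6 for these parameters
```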

So, in summary, what is the connection (if any) between linear regression as formulated by Kutner and the bivariate normal?
 
CantorSet said:
the underlying assumption of linear regression is that the data ##\{(x_1,y_1), (x_2,y_2), \ldots, (x_n,y_n)\}## is sampled from a bivariate normal distribution, in which case both X and Y are random variables.

I've never seen a treatment of regression that made that assumption. Are you confusing linear regression with some sort of "total least squares" regression?
http://en.wikipedia.org/wiki/Total_least_squares
 
Stephen Tashi said:
I've never seen a treatment of regression that made that assumption. Are you confusing linear regression with some sort of "total least squares" regression?
http://en.wikipedia.org/wiki/Total_least_squares

Thanks for responding, Stephen.

Yeah, that was my own confusion in making that assumption. Thanks for clearing that up.

By the way, is total least squares just a generalization of linear regression in which the curve you fit to the data points can be a polynomial of degree higher than 1? Or is there more to total least squares?
 
Not quite: total least squares treats both X and Y as random variables, so it allows for errors in X as well as in Y. Fitting higher-degree polynomials is a separate generalization.
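
For what it's worth, here is a minimal sketch of total least squares (orthogonal regression) for a straight line, assuming noise in both coordinates. The SVD construction is the standard one; the data and parameter values here are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data with noise in BOTH coordinates (illustrative values)
t = np.linspace(0, 10, 100)
x = t + rng.normal(0.0, 0.5, t.size)
y = 2.0 + 0.5 * t + rng.normal(0.0, 0.5, t.size)

# Total least squares minimizes PERPENDICULAR distances to the line.
# Center the data; the right singular vector with the smallest singular
# value is the normal vector of the best-fit line, which passes through
# the centroid of the data.
pts = np.column_stack([x, y])
centered = pts - pts.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
a, b = vt[-1]                      # line normal (a, b)
slope = -a / b
intercept = pts[:, 1].mean() - slope * pts[:, 0].mean()
print(slope, intercept)  # roughly 0.5 and 2.0
```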
 