Linear Regression with Many y-Values for Each x

For linear regression with multiple y-values for each x, averaging the y-values to create a single dataset is a common approach, but it discards information about the variation in the data. If the y-values are repeated measurements of the same variable, it is better to run the regression on the raw data, with the x value repeated for each measurement, since this preserves the variation. QR factorization handles this directly by writing a separate equation for each observation, and yields the same regression coefficients as the normal equations. If the y-values represent different variables, they should be analyzed separately.
WWGD
Hi,
Say we collect data points ##(x_i, y_{ij})## to do a linear regression, but so that for each ##x_i## we collect
values ##y_{i1}, y_{i2}, \dots, y_{ij}##. Is there a standard way of doing linear regression with this type of dataset?
Would we, e.g., average the ##y_{ij}## and define it to be ##y_i## to have a single data set ##(x_i, y_i)## to do linear regression on?
Thanks.
 
WWGD said:
Hi,
Say we collect data points ##(x_i, y_{ij})## to do a linear regression, but so that for each ##x_i## we collect
values ##y_{i1}, y_{i2}, \dots, y_{ij}##. Is there a standard way of doing linear regression with this type of dataset?
Would we, e.g., average the ##y_{ij}## and define it to be ##y_i## to have a single data set ##(x_i, y_i)## to do linear regression on?
Thanks.
You can average the multiple readings if you wish; that is often done when using the normal equations to find the regression coefficients.

If you are using QR factorization to solve the rectangular system, you can write a separate equation for each observation. The resulting regression coefficients should come out the same as with the normal equations, unless something goes horribly wrong numerically.
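A minimal NumPy sketch of this equivalence (the data here is made up for illustration): the design matrix has one row per observation, with x values repeated, and the coefficients from the normal equations match those from QR factorization.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.repeat(np.arange(1.0, 6.0), 3)              # each x measured 3 times
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, x.size)   # noisy repeated readings

# Design matrix: one row per observation (intercept column + x column)
A = np.column_stack([np.ones_like(x), x])

# Normal equations: solve (A^T A) beta = A^T y
beta_ne = np.linalg.solve(A.T @ A, A.T @ y)

# QR factorization: A = QR, then solve R beta = Q^T y
Q, R = np.linalg.qr(A)
beta_qr = np.linalg.solve(R, Q.T @ y)

print(np.allclose(beta_ne, beta_qr))  # the two methods agree
```

In well-conditioned problems like this one the two routes give the same answer to machine precision; QR is preferred numerically because it avoids squaring the condition number via ##A^T A##.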
 
Are you saying that the Ys represent different things, or are they repeated measurements of the same variable at the same input x? If the former, just analyse the Y variables separately. If the latter, do not combine the data: averaging the Y values loses all the information about the variation of y for the same x. Run a regression on the raw data, with the same x value repeated for each y value obtained.
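To illustrate the point above with a made-up balanced dataset: with the same number of readings per x, the fitted line from the raw data coincides with the fit to the averaged data, but only the raw fit keeps the within-x scatter available for assessing the error variance.

```python
import numpy as np

rng = np.random.default_rng(1)
x_vals = np.array([1.0, 2.0, 3.0, 4.0])
x_raw = np.repeat(x_vals, 5)                            # each x repeated 5 times
y_raw = 3.0 * x_raw - 1.0 + rng.normal(0.0, 0.5, x_raw.size)

# Fit on the raw data: the x value is repeated for each y measured there
A_raw = np.column_stack([np.ones_like(x_raw), x_raw])
beta_raw, ss_raw, *_ = np.linalg.lstsq(A_raw, y_raw, rcond=None)

# Fit on the averaged data: one mean y per x
y_avg = y_raw.reshape(len(x_vals), 5).mean(axis=1)
A_avg = np.column_stack([np.ones_like(x_vals), x_vals])
beta_avg, ss_avg, *_ = np.linalg.lstsq(A_avg, y_avg, rcond=None)

# Balanced design: identical coefficients either way...
print(np.allclose(beta_raw, beta_avg))
# ...but the raw fit's residual sum of squares includes the within-x
# variation that averaging throws away
print(ss_raw[0] > ss_avg[0])
```

With unequal numbers of readings per x the two fits would generally differ, which is another reason to regress on the raw data rather than on the means.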
 