Linear Regression with Many y for each x

SUMMARY

This discussion addresses how to perform linear regression when several y-values are recorded for each x-value. One reply notes that the readings can be averaged when solving the normal equations, or kept as separate equations when solving the rectangular system by QR factorization, and that both routes should give the same coefficients. Another reply advises that if the y-values are repeated measurements of the same variable, they should not be averaged, because averaging discards the information about the variation of y at a fixed x; the regression should instead be run on the raw dataset, with the x-value repeated for each y-value. If the y-values represent different variables, they should be analyzed separately.

PREREQUISITES
  • Understanding of linear regression principles
  • Familiarity with normal equations in regression analysis
  • Knowledge of QR factorization for solving linear systems
  • Ability to interpret statistical data and variability
NEXT STEPS
  • Research the application of normal equations in linear regression
  • Learn about QR factorization techniques for regression analysis
  • Explore methods for analyzing multiple dependent variables in regression
  • Study the implications of averaging data in statistical analysis
USEFUL FOR

Data scientists, statisticians, and researchers involved in regression analysis, particularly those dealing with datasets that include multiple dependent variables for each independent variable.

WWGD
Hi,
Say we collect data points ##(x_i, y_{ij})## to do a linear regression, but so that for each ##x_i## we collect
values ##y_{i1}, y_{i2}, \dots, y_{in_i}##. Is there a standard way of doing linear regression with this type of dataset?
Would we, e.g., average the ##y_{ij}## and define the average to be ##y_i##, to have a single data set ##(x_i, y_i)## to do linear regression on?
Thanks.
 
WWGD said:
Hi,
Say we collect data points ##(x_i, y_{ij})## to do a linear regression, but so that for each ##x_i## we collect
values ##y_{i1}, y_{i2}, \dots, y_{in_i}##. Is there a standard way of doing linear regression with this type of dataset?
Would we, e.g., average the ##y_{ij}## and define the average to be ##y_i##, to have a single data set ##(x_i, y_i)## to do linear regression on?
Thanks.
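You can average the multiple readings if you wish. That's what people do when using normal equations to find the regression coefficients. (If the number of readings differs from one x to another, each average has to be weighted by its number of readings for the coefficients to match a fit to the raw data.)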

If you are using QR factorization to solve a rectangular system, you can write separate equations for each observation. The resulting regression coefficients should come out the same as with using the normal equations, unless there is something horribly wrong, numerically.
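As a rough check of the claim above, here is a minimal NumPy sketch that fits a straight line three ways: normal equations on the raw observations, QR factorization on the raw observations, and normal equations on the per-x averages. The data values and the variable names (`x_levels`, `y_reps`, and so on) are made up for illustration and are not from the thread; the example assumes the same number of readings at every x.

```python
import numpy as np

# Hypothetical data: three replicate y readings at each of four x values.
x_levels = np.array([1.0, 2.0, 3.0, 4.0])
y_reps = np.array([
    [2.1, 1.9, 2.3],
    [3.8, 4.1, 4.0],
    [6.2, 5.9, 6.1],
    [7.9, 8.3, 8.0],
])

# Raw data: one equation per observation, with x repeated for every reading.
x_raw = np.repeat(x_levels, y_reps.shape[1])
y_raw = y_reps.ravel()
A_raw = np.column_stack([np.ones_like(x_raw), x_raw])

# (a) Normal equations on the raw data: (A^T A) beta = A^T y.
beta_normal = np.linalg.solve(A_raw.T @ A_raw, A_raw.T @ y_raw)

# (b) QR factorization on the raw data: A = QR, then solve R beta = Q^T y.
Q, R = np.linalg.qr(A_raw)
beta_qr = np.linalg.solve(R, Q.T @ y_raw)

# (c) Normal equations on the per-x averages (equal replicate counts).
A_avg = np.column_stack([np.ones_like(x_levels), x_levels])
y_avg = y_reps.mean(axis=1)
beta_avg = np.linalg.solve(A_avg.T @ A_avg, A_avg.T @ y_avg)

print("normal equations, raw data:", beta_normal)
print("QR, raw data:              ", beta_qr)
print("normal equations, averages:", beta_avg)
```

All three coefficient vectors agree to rounding here because every x has the same number of readings, so the raw-data normal equations are just the averaged ones multiplied by that count.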
 
Are you saying that the Ys represent different things, or are they results of the same variable measured from repeats of the same input x? If the former, just analyse the Y variables separately. If they are the latter, then do not combine the data. Averaging the Y values loses all the information about the variation of y for the same x. Run a regression on the raw data with the same x value repeated for each y value obtained.
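To make the point about lost information concrete, here is a small sketch (again with made-up data, hypothetical names such as `fit_line`, and equal replicate counts assumed) that fits the same line to the raw data and to the per-x averages, then compares what each fit can say about the scatter of y.

```python
import numpy as np

# Hypothetical data: three replicate y readings at each of four x values.
x_levels = np.array([1.0, 2.0, 3.0, 4.0])
y_reps = np.array([
    [2.1, 1.9, 2.3],
    [3.8, 4.1, 4.0],
    [6.2, 5.9, 6.1],
    [7.9, 8.3, 8.0],
])
n_rep = y_reps.shape[1]

def fit_line(x, y):
    """Least-squares line; returns coefficients, residual SS, residual dof."""
    A = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return beta, float(resid @ resid), len(y) - A.shape[1]

# Fit on the raw data, with each x repeated for every reading.
beta_raw, rss_raw, dof_raw = fit_line(np.repeat(x_levels, n_rep), y_reps.ravel())

# Fit on the averages, one point per x.
beta_avg, rss_avg, dof_avg = fit_line(x_levels, y_reps.mean(axis=1))

# Scatter of the readings around their own per-x mean. This quantity
# cannot be computed once the readings have been averaged.
pure_error_ss = float(((y_reps - y_reps.mean(axis=1, keepdims=True)) ** 2).sum())

print("coefficients (raw, avg):", beta_raw, beta_avg)
print("residual dof (raw, avg):", dof_raw, dof_avg)
print("residual SS  (raw, avg):", rss_raw, rss_avg)
print("within-x sum of squares:", pure_error_ss)
```

The coefficients match, but the raw fit keeps 10 residual degrees of freedom against 2 for the averaged fit, and its residual sum of squares splits into the within-x scatter plus `n_rep` times the averaged fit's residual sum of squares. Only the raw data expose the within-x part.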
 
