Multiple linear regression

SUMMARY

This discussion focuses on performing multiple linear regression on a dataset of test scores influenced by three correlated variables: income, reading score, and math score. Due to multi-collinearity, it is essential to exclude at least two of these variables to avoid redundancy. The reading score shows the highest correlation with the test score, while income has a lower correlation of 0.85. Techniques such as Principal Component Analysis (PCA) and step-wise regression using tools like R and MATLAB are recommended for optimizing the regression model.

PREREQUISITES
  • Understanding of multiple linear regression concepts
  • Familiarity with multi-collinearity and its implications
  • Knowledge of Principal Component Analysis (PCA) techniques
  • Experience with statistical software such as R or MATLAB
NEXT STEPS
  • Research Principal Component Analysis (PCA) for multi-variate regression
  • Learn how to implement step-wise regression in R using stepAIC
  • Explore MATLAB's stepwisefit function for regression analysis
  • Study residual plots and their significance in regression diagnostics
USEFUL FOR

Data analysts, statisticians, and researchers involved in regression modeling and those seeking to optimize predictive models using correlated variables.

cutesteph
I am doing a multiple linear regression on a dataset of test scores. It has three highly correlated variables: income, reading score, and math score. Since the test score is the sum of the math score and reading score, would it be appropriate to exclude them simply on that basis? Obviously two of the three must be removed due to multi-collinearity. Reading score has the highest correlation with the test score, and math is close behind. Income is only 0.85.
 
Or would it be appropriate to use the reading score, since it has the best correlation and least spread, even though the test score is the average of the reading score and math score?
 
Hey cutesteph.

Removing variables with multi-collinearity (and hence correlation) can be done in a number of ways.

I suggest you look at Principal Component Analysis (PCA) techniques for dealing with that in multi-variate regression.

The PCA techniques should be available in most statistical software packages - including R which is open source.

http://www.r-project.org/
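
As a rough illustration of that suggestion, here is a minimal PCA sketch in R. It assumes a data frame called scores with predictor columns income, reading, and math and a response column test; those names are hypothetical, not from the original thread.

Code:
# Minimal PCA sketch; `scores` and its column names are hypothetical.
predictors <- scores[, c("income", "reading", "math")]

# prcomp() centres the data and, with scale. = TRUE, standardises it
# before extracting the (uncorrelated) principal components.
pca <- prcomp(predictors, scale. = TRUE)
summary(pca)      # proportion of variance explained by each component
pca$rotation      # loadings: how each original variable contributes

# Regress the test score on the leading component(s), which are
# uncorrelated with one another by construction.
pc_scores <- as.data.frame(pca$x)
fit <- lm(scores$test ~ pc_scores$PC1)
summary(fit)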
 
cutesteph said:
I am doing a multiple linear regression on a dataset of test scores. It has three highly correlated variables: income, reading score, and math score. Since the test score is the sum of the math score and reading score, would it be appropriate to exclude them simply on that basis? Obviously two of the three must be removed due to multi-collinearity. Reading score has the highest correlation with the test score, and math is close behind. Income is only 0.85.

If all you need is a regression model describing the test score using some subset of the three variables (income, reading score, and math score), you don't need principal component analysis. Run through the different candidate models (one predictor, or two predictors, excluding the combination of reading and math scores together) and judge which fits best. Look carefully at the residual plots in each case.
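
A sketch of that "run through the candidate models" approach in R might look like the following. The data frame scores and its column names are hypothetical, carried over from the example above.

Code:
# Fit each candidate model; reading + math together is excluded because
# their sum is the response itself.
m_income  <- lm(test ~ income,           data = scores)
m_reading <- lm(test ~ reading,          data = scores)
m_math    <- lm(test ~ math,             data = scores)
m_ir      <- lm(test ~ income + reading, data = scores)
m_im      <- lm(test ~ income + math,    data = scores)

# Compare fit statistics (adjusted R^2 and AIC) across the candidates.
sapply(list(m_income, m_reading, m_math, m_ir, m_im),
       function(m) c(adj_r2 = summary(m)$adj.r.squared, aic = AIC(m)))

# Residual diagnostics for whichever model looks best:
# residuals vs. fitted, Q-Q plot, scale-location, leverage.
plot(m_reading)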

With that said, I'm still a little unsure of exactly what the goal of your project could be. If it is more sophisticated than simply coming away with a regression model, some extra information is needed.
 
A standard step-wise multiple linear regression would first do a regression using the independent variable that has the most statistical significance. Then it would remove the influence of that variable and determine whether a second independent variable has enough significance in the modified data to be added to the model. It would add the second variable that shows the most statistical significance, and so on. It proceeds in a logical manner, only adding variables that make the most statistical sense. See MATLAB's stepwisefit or R's stepAIC.
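
For the R route, a short sketch using stepAIC from the MASS package could look like this. Note that stepAIC selects terms by AIC rather than by individual significance tests, and the data frame scores and its column names are hypothetical, as in the earlier examples.

Code:
library(MASS)

# Full model with all three candidate predictors.
full <- lm(test ~ income + reading + math, data = scores)

# Starting from the full model, stepAIC drops or re-adds terms,
# keeping at each step the change that most improves AIC.
step_fit <- stepAIC(full, direction = "both", trace = TRUE)
summary(step_fit)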
 