Multiple linear regression


Discussion Overview

The discussion revolves around the application of multiple linear regression on a dataset of test scores, specifically addressing the issue of multi-collinearity among the independent variables: income, reading score, and math score. Participants explore whether certain variables should be excluded based on their correlations and the implications of these correlations for model selection.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant suggests that due to multi-collinearity, two of the three variables must be removed, noting that reading score has the highest correlation with test scores, while income has a lower correlation of 0.85.
  • Another participant questions whether it would be appropriate to retain the reading score despite it being part of the sum that defines the test score.
  • A different participant proposes using Principal Component Analysis (PCA) as a method to address multi-collinearity in the regression analysis.
  • One participant argues that if the goal is simply to create a regression model, it may not be necessary to use PCA and suggests testing different models with varying predictors while examining residual plots.
  • Another participant describes a standard step-wise multiple linear regression approach, emphasizing the importance of statistical significance in adding variables to the model.

Areas of Agreement / Disagreement

Participants express differing views on how to handle multi-collinearity and the appropriateness of excluding certain variables. There is no consensus on the best approach to take regarding the inclusion or exclusion of the correlated variables.

Contextual Notes

Participants have not fully defined the specific goals of the regression analysis, which may influence their recommendations. There are also unresolved considerations regarding the assumptions underlying the use of PCA and step-wise regression methods.

cutesteph
I am doing a multiple linear regression on a dataset of test scores. It has three highly correlated independent variables: income, reading score, and math score. Since the test score is the sum of the math score and reading score, would it be appropriate to exclude them simply on that basis? Obviously two of the three must be removed due to multi-collinearity. Reading score has the highest correlation with test score, and math is close. Income is only 0.85.
 
Or would it be appropriate to use the reading score, since it has the best correlation and the least spread, even though the test score is the average of the reading score and math score?
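The kind of correlation structure described here can be checked directly. Below is a rough sketch using plain NumPy and synthetic data that only loosely mimics the situation (all names and numbers are illustrative, not the actual dataset): it prints the correlation matrix and a variance inflation factor (VIF) for each predictor, a standard diagnostic for multi-collinearity.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data mimicking the setup: test score is the average of
# reading and math, and income tracks both (all values are made up).
n = 100
reading = rng.normal(70, 10, n)
math = rng.normal(65, 12, n)
income = 0.5 * (reading + math) + rng.normal(0, 6, n)
test = (reading + math) / 2

# Correlation matrix of the candidate predictors with the response.
data = np.column_stack([income, reading, math, test])
corr = np.corrcoef(data, rowvar=False)
print(np.round(corr, 2))

# Variance inflation factor: VIF_j = 1 / (1 - R^2_j), where R^2_j comes
# from regressing predictor j on the remaining predictors.
def vif(X, j):
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ beta
    r2 = 1 - (resid @ resid) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return 1.0 / (1.0 - r2)

X = data[:, :3]
for j, name in enumerate(["income", "reading", "math"]):
    print(f"VIF({name}) = {vif(X, j):.1f}")
```

A VIF well above 1 signals that a predictor is largely explained by the others, which is the symptom being discussed in this thread.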
 
Hey cutesteph.

Multi-collinearity (and hence correlation) among predictors can be handled in a number of ways.

I suggest you look at Principal Component Analysis (PCA) techniques for dealing with it in multivariate regression.

PCA should be available in most statistical software packages, including R, which is open source.

http://www.r-project.org/
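As a minimal sketch of the PCA suggestion above (synthetic data, plain NumPy rather than a statistics package; the variable names stand in for the thread's income/reading/math and are not real data): compute the principal components of the centered predictor matrix via SVD, then regress the response on the leading component instead of the raw, collinear predictors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic example: three highly correlated predictors, standing in for
# income, reading score, and math score (hypothetical values).
n = 200
base = rng.normal(size=n)
X = np.column_stack([
    base + 0.1 * rng.normal(size=n),   # "reading score"
    base + 0.1 * rng.normal(size=n),   # "math score"
    base + 0.3 * rng.normal(size=n),   # "income"
])
y = X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=n)  # "test score"

# PCA via SVD on the centered predictor matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)
print("variance explained per component:", np.round(explained, 3))

# Principal-component regression: regress y on the first component
# instead of the three collinear columns.
pc1 = Xc @ Vt[0]
beta, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), pc1]), y, rcond=None)
print("intercept and slope on PC1:", np.round(beta, 3))
```

When the predictors are as collinear as described, the first component typically absorbs most of the variance, so a single-component regression sidesteps the unstable coefficients that collinearity causes.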
 
cutesteph said:
I am doing a multiple linear regression on a dataset of test scores. It has three highly correlated independent variables: income, reading score, and math score. Since the test score is the sum of the math score and reading score, would it be appropriate to exclude them simply on that basis? Obviously two of the three must be removed due to multi-collinearity. Reading score has the highest correlation with test score, and math is close. Income is only 0.85.

If all you need is a regression model describing test score in terms of some subset of income, reading score, and math score, you don't need component analysis. Run through the different models (one predictor; two predictors, excluding reading and math scores together) and judge which is best, looking carefully at the residual plots in each case.

With that said, I'm still a little unsure of exactly what the goal of your project is. If it is more sophisticated than simply coming away with a regression model, some extra information is needed.
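The "run through the different models and compare" advice above can be sketched as follows (synthetic data again; the candidate list and names are illustrative, not the thread's actual dataset). Each candidate subset is fit by ordinary least squares and its residual sum of squares is reported; in practice you would also plot the residuals for each fit.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins for the thread's variables.
n = 150
reading = rng.normal(70, 10, n)
math = rng.normal(65, 12, n)
income = 0.5 * (reading + math) + rng.normal(0, 8, n)
test_score = (reading + math) / 2  # test score built from its parts

def fit_ols(X, y):
    """Least-squares fit with intercept; returns coefficients and residuals."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta, y - A @ beta

# Candidate models: each single predictor, plus income+reading
# (reading+math together is excluded, as suggested above).
candidates = {
    "reading": reading.reshape(-1, 1),
    "math": math.reshape(-1, 1),
    "income": income.reshape(-1, 1),
    "income+reading": np.column_stack([income, reading]),
}
rss = {}
for name, Xc in candidates.items():
    _, resid = fit_ols(Xc, test_score)
    rss[name] = float(resid @ resid)
    print(f"{name:15s} RSS = {rss[name]:.2f}")
```

Raw RSS always favors bigger models, so in a real comparison you would penalize model size (adjusted R², AIC) and inspect the residual plots rather than rank on RSS alone.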
 
A standard step-wise multiple linear regression would first do a regression using the independent variable with the most statistical significance. It would then remove the influence of that variable and determine whether a second independent variable has enough significance, in the modified data, to be added to the model; if so, it adds the second variable showing the most statistical significance. It proceeds in this logical manner, only adding variables that make the most statistical sense. See MATLAB's stepwisefit or R's stepAIC.
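The forward-selection procedure described above can be sketched in a few lines (a toy illustration, not a substitute for stepwisefit or stepAIC; the data, the F-to-enter cutoff, and the variable names are all assumptions made for the example). At each step it adds the variable that most reduces the residual sum of squares, stopping when the partial F statistic for the best remaining variable falls below the cutoff.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: y depends strongly on x0, weakly on x1, not on x2.
n = 120
X = rng.normal(size=(n, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

def rss_of(cols):
    """Residual sum of squares of an OLS fit (with intercept) on the columns."""
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)

# Forward selection: greedily add the variable that reduces RSS the most,
# stopping when the partial F statistic drops below the threshold.
selected, remaining = [], [0, 1, 2]
F_IN = 4.0  # rough F-to-enter cutoff (assumed; real packages use p-values)
while remaining:
    current = rss_of(selected)
    best = min(remaining, key=lambda j: rss_of(selected + [j]))
    new = rss_of(selected + [best])
    df = n - len(selected) - 2  # residual degrees of freedom after adding
    f_stat = (current - new) / (new / df)
    if f_stat < F_IN:
        break
    selected.append(best)
    remaining.remove(best)

print("selected variables:", selected)
```

Note that with highly collinear predictors (the situation in this thread), step-wise results are notoriously order-sensitive: once one of the correlated variables enters, the others may show little remaining significance.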
 
