Discussion Overview
The discussion concerns applying multiple linear regression to a dataset of test scores, and specifically how to handle multi-collinearity among the independent variables: income, reading score, and math score. Participants explore whether some variables should be excluded because of their correlations, and what those correlations imply for model selection.
Discussion Character
- Technical explanation
- Debate/contested
- Mathematical reasoning
Main Points Raised
- One participant suggests that due to multi-collinearity, two of the three variables must be removed, noting that reading score has the highest correlation with test scores, while income has a lower correlation of 0.85.
- Another participant questions whether it is appropriate to retain the reading score at all, given that it is one of the components summed to produce the test score.
- A different participant proposes using Principal Component Analysis (PCA) as a method to address multi-collinearity in the regression analysis.
- One participant argues that PCA may be unnecessary if the goal is simply to build a regression model, and suggests fitting several models with different sets of predictors and examining the residual plots of each.
- Another participant describes a standard step-wise multiple linear regression approach, emphasizing the importance of statistical significance in adding variables to the model.
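The collinearity check and the "try several models" suggestion raised above can be sketched in a few lines. This is only an illustration under stated assumptions: the data are synthetic (the thread's dataset is not available), the test score is assumed to be exactly reading plus math, and the helper names fit_ols and vif are hypothetical, not taken from the discussion.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (coefficients, R^2)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return beta, 1.0 - ss_res / ss_tot

def vif(X):
    """Variance inflation factor per column: 1 / (1 - R^2_j), where R^2_j
    comes from regressing column j on all the other columns."""
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        _, r2 = fit_ols(others, X[:, j])
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Synthetic data (assumption): both scores depend on income plus noise.
rng = np.random.default_rng(0)
n = 200
income = rng.normal(50.0, 10.0, n)
reading_score = 300.0 + 3.0 * income + rng.normal(0.0, 15.0, n)
math_score = 300.0 + 3.0 * income + rng.normal(0.0, 15.0, n)
test_score = reading_score + math_score  # assumption: score is the plain sum

X = np.column_stack([income, reading_score, math_score])

# Pairwise correlations among the predictors and the response.
corr = np.corrcoef(np.column_stack([X, test_score]), rowvar=False)

# Collinearity diagnostic for the three predictors.
vifs = vif(X)

# Candidate models with different predictor sets.
_, r2_full = fit_ols(X, test_score)
_, r2_reading = fit_ols(reading_score[:, None], test_score)
_, r2_income = fit_ols(income[:, None], test_score)

print("VIF (income, reading, math):", np.round(vifs, 2))
print("R^2 full / reading only / income only:",
      round(r2_full, 4), round(r2_reading, 4), round(r2_income, 4))
```

Under the sum assumption, the model that includes both reading and math scores fits the test score essentially perfectly, which is the concern raised about keeping a component of the response as a predictor. Which reduced model is preferable still depends on the goal of the analysis.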
Areas of Agreement / Disagreement
Participants express differing views on how to handle multi-collinearity and the appropriateness of excluding certain variables. There is no consensus on the best approach to take regarding the inclusion or exclusion of the correlated variables.
Contextual Notes
Participants have not fully defined the specific goals of the regression analysis, which may influence their recommendations. There are also unresolved considerations regarding the assumptions underlying the use of PCA and step-wise regression methods.
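For the PCA option mentioned in the thread, a minimal principal component regression sketch may help make the assumptions concrete. It assumes the predictors are standardized before the decomposition and that only the leading components are kept. The data are again synthetic, and the function name pcr_fit is hypothetical.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal component regression sketch.

    Standardizes X, projects it onto the leading principal components,
    and regresses y on those component scores (with an intercept).
    Returns (coefficients, explained variance ratios, component scores).
    """
    mu = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    Z = (X - mu) / sd
    # Rows of Vt are the principal directions of the standardized data.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = Z @ Vt[:n_components].T
    A = np.column_stack([np.ones(len(y)), scores])
    gamma, *_ = np.linalg.lstsq(A, y, rcond=None)
    return gamma, explained, scores

# Synthetic, strongly correlated predictors (assumption, not the thread's data).
rng = np.random.default_rng(1)
n = 200
income = rng.normal(50.0, 10.0, n)
reading_score = 300.0 + 3.0 * income + rng.normal(0.0, 15.0, n)
math_score = 300.0 + 3.0 * income + rng.normal(0.0, 15.0, n)
y = reading_score + math_score + rng.normal(0.0, 5.0, n)

X = np.column_stack([income, reading_score, math_score])
gamma, explained, scores = pcr_fit(X, y, n_components=2)

print("explained variance ratio:", np.round(explained, 3))
print("correlation between the two kept components:",
      np.corrcoef(scores, rowvar=False)[0, 1])
```

The component scores are uncorrelated by construction, which removes the collinearity, but the coefficients then describe components rather than income or individual scores. That trade-off is one reason the choice between PCA and simply dropping variables depends on what the regression is meant to achieve.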