Testing a Linear Stepwise Regression Model - Need Advice!


Discussion Overview

The discussion revolves around the testing of a linear stepwise regression model using a dataset with one dependent and six independent variables. Participants are seeking advice on appropriate methods for validating the model's predictions against actual values, particularly focusing on the use of statistical tests and residual analysis.

Discussion Character

  • Technical explanation
  • Exploratory
  • Debate/contested

Main Points Raised

  • One participant describes their process of performing a linear stepwise regression on a 90% sample of their dataset and expresses confidence in their model.
  • The same participant seeks advice on how to test the model using the remaining 10% of the dataset, initially considering Chi Square but questioning its appropriateness.
  • Another participant suggests that testing the residuals for zero mean and homoscedasticity is important, recommending visual inspection and specific tests for these properties.
  • This participant also mentions the potential need to test residuals for normality and provides a link for further reading on homoscedasticity tests.
  • The original poster acknowledges the help received and indicates they have already plotted residuals and histograms to check normality, expressing intent to explore the suggested tests.
  • The original poster later raises the possibility of enlarging the test sample to 70 observations and asks which specific tests could be used, mentioning Wilcoxon's test and Spearman's test as candidates.

Areas of Agreement / Disagreement

Participants generally agree on the importance of residual analysis and the need for appropriate statistical tests, but there is no consensus on the best specific test to use for validating the model's predictions.

Contextual Notes

Some participants note limitations in their ability to perform certain tests due to software availability (e.g., SPSS) and the small sample size affecting the robustness of statistical conclusions.

davemk
Hi folks.

Just looking for some input please.

I have a dataset containing interval data (one dependent and 6 independent variables) and have taken a random 90% sample (approx 300 observations). I've performed a linear stepwise regression on the 90% in order to obtain a model that predicts the dependent variable from a number of input variables. I'm confident that I've done this ok.

The issue comes with testing the model. I'm sure that this is probably a simple step but, for some reason, I'm really struggling with it and would be grateful for some advice.

In order to test the model, I'm using the 10% of the dataset that was not used in the linear regression. I've input the predictor variables into the model, which has given me an expected value for each observation. I now want to compare this to the actual value. I was originally going to use Chi Square but that seems to be probability based and I'm not sure it's appropriate.

I've been told Spearman's rho would probably be most appropriate, although I'm still not 100% sure that's right. Essentially, I would only be testing whether my predicted values = actual values.

All help appreciated. Thanks in advance.
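One simple way to put numbers on "predicted vs actual" for the held-out 10% is to summarise the prediction errors directly. The sketch below is only an illustration: the function name `holdout_metrics` and the five data points are made up, not taken from the thread, and it reports descriptive measures (bias, mean absolute error, root mean squared error, hold-out R²) rather than a formal hypothesis test.

```python
import math


def holdout_metrics(actual, predicted):
    """Summarise how closely predictions track actual values on a hold-out set."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal length")
    n = len(actual)
    # Residual = actual minus predicted, one per hold-out observation.
    residuals = [a - p for a, p in zip(actual, predicted)]
    bias = sum(residuals) / n                      # mean residual
    mae = sum(abs(r) for r in residuals) / n       # mean absolute error
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)                   # root mean squared error
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    # Hold-out R^2: share of variance in the actual values explained by the predictions.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"bias": bias, "mae": mae, "rmse": rmse, "r2": r2}


# Made-up illustrative numbers, not data from the thread.
actual = [10.0, 12.0, 9.0, 15.0, 11.0]
predicted = [11.0, 11.0, 10.0, 14.0, 11.0]
metrics = holdout_metrics(actual, predicted)
print(metrics)
```

A bias near zero together with an RMSE close to the residual standard error from the fitting sample would suggest the model carries over to unseen data; a much larger hold-out RMSE would point to overfitting.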
 

To some extent this depends on how clever you want to be. What you want to do is test that the residuals for the hold-back sample have zero mean and that they are homoscedastic. With about 30 points you may have difficulty doing much more.

For the first of these I would just test for zero mean using the usual methods.

For the latter I would plot the residuals against the input variables and eyeball the data (at least to start with), but there are formal tests; see http://en.wikipedia.org/wiki/Homoscedasticity for a pointer.

You might also want to test the residuals for normality.

CB
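The three checks suggested above can be sketched in Python with NumPy and SciPy. This is a minimal sketch under stated assumptions, not a method given in the thread: the function name `residual_checks` and the synthetic data are invented for illustration, and the Spearman correlation between absolute residuals and one input variable is used as a rough stand-in for a formal heteroscedasticity test.

```python
import numpy as np
from scipy import stats


def residual_checks(actual, predicted, predictor):
    """Run three quick checks on hold-out residuals.

    `predictor` is a single input variable, used only for the rough
    constant-spread check.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    predictor = np.asarray(predictor, dtype=float)
    resid = actual - predicted

    # 1. Zero mean: one-sample t-test of the residuals against 0.
    _, p_mean = stats.ttest_1samp(resid, popmean=0.0)

    # 2. Rough homoscedasticity check (an assumption of this sketch):
    #    does the size of the residual trend with the predictor?
    _, p_spread = stats.spearmanr(np.abs(resid), predictor)

    # 3. Normality: Shapiro-Wilk test on the residuals.
    _, p_norm = stats.shapiro(resid)

    return {
        "p_zero_mean": float(p_mean),
        "p_constant_spread": float(p_spread),
        "p_normal": float(p_norm),
    }


# Synthetic hold-out sample of 30 points, for illustration only.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=30)
predicted = 2.0 + 3.0 * x
actual = predicted + rng.normal(0, 1, size=30)

results = residual_checks(actual, predicted, x)
for name, p in results.items():
    print(f"{name}: p = {p:.3f}")
```

In each case a small p-value is evidence against the property being checked. With only about 30 points these tests have limited power, so a large p-value should not be read as proof that the assumption holds.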
 
That's a great help, thank you very much.

I've already plotted the residuals for observed vs expected and histograms for normality, so I'll have a look into the tests in the link you posted (I must admit, I've never heard of those tests, so I'll read up on them).

Thanks again. I'll update the thread with my progress asap.
 
CaptainBlack said:
To some extent this depends on how clever you want to be.

With about 30 points you may have difficulty doing much more.

Hello again. If I were to get more data (say 70 observations) in order to test the model, is there a specific test that I could use? At the moment, I've performed a residual analysis and I'm now looking at performing a Wilcoxon test or a Spearman test.

Any thoughts on this process, or alternatives? The procedures in the link above don't appear to be available in SPSS.
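As a rough illustration of what the two tests mentioned above would actually check, the sketch below applies both to made-up numbers using SciPy (the data and variable names are assumptions for illustration only). The Wilcoxon signed-rank test on the paired differences looks for a systematic shift between actual and predicted values, whereas Spearman's rho measures only monotonic association, so it can be perfect even when the predictions are badly off.

```python
import numpy as np
from scipy import stats

# Made-up example: predictions are exactly double the actual values,
# so they are perfectly ordered but clearly wrong.
actual = np.arange(1.0, 11.0)      # 1, 2, ..., 10
predicted = 2.0 * actual

# Spearman's rho: rank correlation between actual and predicted.
rho, p_rho = stats.spearmanr(actual, predicted)

# Wilcoxon signed-rank test on the paired differences (actual - predicted).
w_stat, p_wilcoxon = stats.wilcoxon(actual, predicted)

print(f"Spearman rho = {rho:.3f}")
print(f"Wilcoxon signed-rank p = {p_wilcoxon:.4f}")
```

Here rho equals 1 despite the predictions being twice the actual values, while the Wilcoxon test flags the systematic difference. A high Spearman correlation therefore does not by itself show that predicted values equal actual values.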
 
