MHB Testing a Linear Stepwise Regression Model - Need Advice!

SUMMARY

The discussion focuses on testing a linear stepwise regression model built from a dataset with one dependent and six independent variables. The user has fitted the model on a random 90% sample (approximately 300 observations) and is struggling with how to test it on the remaining 10%. They initially considered Chi Square for comparing predicted and actual values and had been told that Spearman's rho would probably be more appropriate. The reply instead recommends checking that the hold-back residuals have zero mean and are homoscedastic, and suggests also testing the residuals for normality.

PREREQUISITES
  • Understanding of linear stepwise regression modeling
  • Familiarity with statistical tests such as Spearman's rho and Chi Square
  • Knowledge of residual analysis and homoscedasticity
  • Experience with data visualization techniques for residuals
NEXT STEPS
  • Research the application of Spearman's rho in regression analysis
  • Learn about residual analysis techniques and tests for homoscedasticity
  • Explore normality tests for residuals in regression models
  • Investigate Wilcoxon's test and its application in model validation
USEFUL FOR

Data analysts, statisticians, and researchers involved in regression modeling and validation processes will benefit from this discussion.

davemk
Hi folks.

Just looking for some input please.

I have a dataset containing interval data (one dependent and 6 independent variables) and have taken a random 90% sample (approx 300 observations). I've performed a linear stepwise regression on the 90% in order to obtain a model to predict the dependent variable from a number of input variables. I'm confident that I've done this ok.

The issue comes with testing the model. I'm sure that this is probably a simple step but, for some reason, I'm really struggling with it and would be grateful for some advice.

In order to test the model, I'm using the 10% of the dataset that were not used in the linear regression. I've input the predictor variables into the model, which has given me an expected value. I now want to compare this to the actual value. I was originally going to use Chi Square but that seems to be probability based and I'm not sure it's appropriate.

I've been told Spearman's rho would probably be most appropriate although I'm still not 100% sure that's right. Essentially, I would only be testing whether my predicted values = actual values.

All help appreciated. Thanks in advance.
 

To some extent this depends on how clever you want to be. What you want to do is test that the residuals for the hold-back sample have zero mean and that they are homoscedastic. With only about 30 points you may have difficulty doing much more.

For the first of these I would just test for zero mean using the usual methods (for example, a one-sample t-test on the residuals).
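As a rough illustration of the zero-mean check, here is a minimal sketch of a one-sample t statistic on the hold-back residuals, using only the Python standard library. The function name and the eight data points are hypothetical and invented for the example; the t-test is just one common choice for "the usual methods", not necessarily the one intended above.

```python
import math
import statistics


def zero_mean_t_statistic(actual, predicted):
    """Return (mean residual, t statistic, degrees of freedom) for H0: mean residual = 0."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    n = len(residuals)
    mean_res = statistics.fmean(residuals)
    sd_res = statistics.stdev(residuals)  # sample standard deviation (n - 1 denominator)
    t_stat = mean_res / (sd_res / math.sqrt(n))
    return mean_res, t_stat, n - 1


# Hypothetical hold-back values, invented purely for illustration.
actual = [10.2, 11.8, 9.5, 12.1, 10.9, 11.3, 9.9, 12.4]
predicted = [10.0, 12.0, 9.8, 11.7, 11.0, 11.1, 10.3, 12.0]

mean_res, t_stat, df = zero_mean_t_statistic(actual, predicted)
print(f"mean residual = {mean_res:.3f}, t = {t_stat:.3f}, df = {df}")
```

The absolute value of t is then compared with the two-sided critical value of Student's t for the given degrees of freedom (roughly 2.05 at the 5% level for 29 degrees of freedom, i.e. a hold-back sample of 30). A value below that gives no evidence that the model is biased on the hold-back data.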

For the latter I would plot the residuals against the input variables and eyeball the data (at least to start with), but there are tests, see http://en.wikipedia.org/wiki/Homoscedasticity for a pointer.
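As a numerical companion to the eyeball check, the sketch below computes a Breusch-Pagan style statistic for a single input variable: it regresses the squared residuals on that input and uses n times R squared (the studentized variant due to Koenker). The helper names and both data sets are hypothetical, and the chi-square comparison is only an asymptotic approximation, so with about 30 points it should be read as a rough guide alongside the plot.

```python
import statistics


def pearson_r(x, y):
    """Plain Pearson correlation; returns 0.0 if either series is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def bp_statistic(residuals, x):
    """n * R^2 from regressing squared residuals on one input x.

    With a single regressor, R^2 equals the squared Pearson correlation.
    """
    squared = [r * r for r in residuals]
    r = pearson_r(x, squared)
    return len(residuals) * r * r


CHI2_1DF_5PCT = 3.841  # 5% critical value, chi-square with 1 degree of freedom

# Hypothetical input variable and two invented sets of residuals.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
steady = [0.3, -0.5, 0.4, -0.2, 0.5, -0.3, 0.2, -0.4, 0.3, -0.5]
fanning = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8, 0.9, -1.0]

for name, res in (("steady", steady), ("fanning", fanning)):
    stat = bp_statistic(res, x)
    verdict = "suspect heteroscedasticity" if stat > CHI2_1DF_5PCT else "no evidence"
    print(f"{name}: statistic = {stat:.3f} ({verdict})")
```

The "fanning" residuals grow in size with x, which is the pattern to look for in the plot. With several input variables the same idea applies, but the auxiliary regression then uses all of them and the degrees of freedom equal the number of inputs.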

You might also want to test the residuals for normality.
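One simple way to do this by hand is the Jarque-Bera statistic, which combines the sample skewness and kurtosis of the residuals. The sketch below is an assumption about which test to use, not something specified above; the function name and both samples are hypothetical. Like the chi-square check for homoscedasticity, it relies on a large-sample approximation, so for around 30 residuals a small-sample test such as Shapiro-Wilk is often preferred.

```python
import math
import statistics


def jarque_bera(values):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # population central moments
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)


CHI2_2DF_5PCT = 5.991  # 5% critical value, chi-square with 2 degrees of freedom

# Hypothetical residuals: 30 evenly spaced normal quantiles (well behaved)
# and 30 evenly spaced exponential quantiles (strongly right-skewed).
n = 30
probs = [(i + 0.5) / n for i in range(n)]
normal_like = [statistics.NormalDist().inv_cdf(p) for p in probs]
skewed = [-math.log(1 - p) for p in probs]

for name, sample in (("normal_like", normal_like), ("skewed", skewed)):
    stat = jarque_bera(sample)
    verdict = "normality doubtful" if stat > CHI2_2DF_5PCT else "no evidence against normality"
    print(f"{name}: JB = {stat:.3f} ({verdict})")
```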

CB
 
That's a great help, thank you very much.

I've already plotted the residuals for obs vs expected and histograms for normality so I'll have a look into the tests within the link you posted (I must admit, I've never heard of those tests so I'll have a read up on those).

Thanks again. I'll update the thread with my progress asap.
 
CaptainBlack said:
To some extent this depends on how clever you want to be.

With about 30 points you may have difficulty doing much more.

Hello again. If I were to get more data (say 70 observations) in order to test the model, is there a specific test that I could use? At the moment, I've performed a residual analysis and I'm now looking at performing a Wilcoxon's test or Spearman's test.

Any thoughts on this process, or alternatives? The procedures in the link above don't appear to be available in SPSS.
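For reference, here is a minimal sketch of Spearman's rho computed by hand on predicted versus actual values, with invented data and hypothetical helper names. It also illustrates a caveat: rho only measures whether the two series rise and fall together, so a model that is consistently too high by a fixed amount still scores a perfect 1. For that reason it is usually paired with a check on the differences themselves (mean error, or a paired test such as Wilcoxon's signed-rank test).

```python
import statistics


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the ranks.

    Assumes neither series is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical hold-back data: predictions that are always 5 units too high.
actual = [12.0, 15.5, 9.8, 20.1, 17.3, 11.2]
biased = [a + 5.0 for a in actual]

rho = spearman_rho(actual, biased)
mean_error = statistics.fmean(p - a for p, a in zip(biased, actual))
print(f"Spearman rho = {rho:.3f}, mean error = {mean_error:.3f}")
```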
 
