Undergrad Discriminant function analysis - stepwise or otherwise?

SUMMARY

This discussion centers on the use of Discriminant Function Analysis (DFA) for binary classification in a BSc Biomed research project, specifically the limitations of stepwise DFA compared with entering all predictors together and checking accuracy with leave-one-out cross-validation. The participants suggest that stepwise DFA may be unreliable with a small dataset of 110 data points, and emphasize that DFA assumes normally distributed predictors and can give inaccurate results when the sample does not meet that assumption. The bottom-line advice is to follow the supervisor's suggestion and ask for the reasoning behind it.

PREREQUISITES
  • Understanding of Discriminant Function Analysis (DFA)
  • Familiarity with SPSS software for statistical analysis
  • Knowledge of normal distribution and its implications in statistical modeling
  • Basic concepts of cross-validation techniques in statistical analysis
NEXT STEPS
  • Research the implications of normal distribution on Discriminant Function Analysis
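  • Learn how leave-one-out cross-validation estimates classification accuracy and how it relates to stepwise variable selection
  • Explore ways of checking univariate and multivariate normality in datasets (ANOVA compares group means; it does not test normality)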
  • Investigate alternative classification methods suitable for small sample sizes
USEFUL FOR

Students and researchers in biomedical fields, statisticians analyzing small datasets, and anyone interested in the application of Discriminant Function Analysis for classification tasks.

Cesca Roma
TL;DR
Covid-19 isolation has rotted my last remaining brain cells and I can't seem to justify not using stepwise DFA in my dissertation even though I know I (probably) shouldn't.
I’m using discriminant function analysis to determine the potential accuracy of several biometric measurements being used in conjunction for binary classification purposes for my BSc Biomed research project. Overall I've only got 110 data points so it's a stretch but hey, that's anatomy!

What I’m struggling with, lacking very fundamental statistical knowledge and using SPSS to do all the hard stuff for me, is where the possible limitations lie in stepwise DFA in this context, and justifying use of the alternative leave-one-out, cross-validation method instead. I’ve been told it would be better to not use stepwise but don’t understand how to place the explanations I’m coming across in the context of my research.

Hope everyone's staying healthy, safe and sane, and thank you for the help in advance!

TL;DR: how do I justify not using stepwise DFA?
 
Here is why/when you should consider stepwise DFA:
1. To maximally separate groups.
2. To determine the most parsimonious way to discriminate, e.g. with the fewest comparisons.
3. To ignore (not use) variables with little relationship to discrimination.
James A. Holdnack, ... Grant L. Iverson, in WAIS-IV, WMS-IV, and ACS, 2013
Discriminant Function Analysis
Discriminant Function Analysis (DFA) has been used extensively in the past to derive optimal combinations of variables to differentiate groups because of its computational simplicity. However, DFA assumes that the predictors (i.e., tests included in the model) are each normally distributed and the set of predictors has a multivariate normal distribution along with homogeneous variance-covariance matrices
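
For readers who want the assumptions above tied to a formula: with two groups, the usual textbook form of the linear discriminant can be written as below. The notation is illustrative and not taken from the thread or the quoted book. The pooled matrix S_p is where the "homogeneous variance-covariance matrices" assumption enters, since it only makes sense to pool if the groups share a covariance structure.

```latex
% Illustrative notation (an assumption, not from the source):
% \bar{x}_1, \bar{x}_2 : group mean vectors
% S_1, S_2             : within-group sample covariance matrices
% n_1, n_2             : group sizes
S_p = \frac{(n_1 - 1)\, S_1 + (n_2 - 1)\, S_2}{n_1 + n_2 - 2},
\qquad
w = S_p^{-1} \left( \bar{x}_1 - \bar{x}_2 \right)

% Discriminant score for a new case x; with equal prior probabilities,
% assign x to group 1 when D(x) > 0 and to group 2 otherwise.
D(x) = w^{\top} \left( x - \tfrac{1}{2} \left( \bar{x}_1 + \bar{x}_2 \right) \right)
```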

opinion only:
You want a binary split - two data "piles", but your dataset is small. The analysis can be thrown off if the dataset actually contains several distinct subpopulations. In other words, you create two piles when you really need more. So how would you test for (justify) this with your data? Did you see a multivariate normal distribution for the set of predictors? As a starter, ANOVA can be used to see what you get.
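
As a rough first look at the normality question, one option is to compute skewness and excess kurtosis for each predictor within each group. The Python sketch below does that with the standard library only. It is a crude univariate screen, not a test of multivariate normality (each predictor can look normal on its own while the joint distribution is not), and the function names are invented for illustration.

```python
from statistics import fmean


def skew_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) using population moments.

    Both are close to 0 for data drawn from a normal distribution.
    """
    n = len(values)
    m = fmean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def screen_by_group(rows, labels):
    """rows: one list of predictor values per case; labels: group per case.

    Returns {group: [(skewness, excess kurtosis) for each predictor]}.
    """
    n_predictors = len(rows[0])
    report = {}
    for group in sorted(set(labels)):
        columns = [
            [row[j] for row, lab in zip(rows, labels) if lab == group]
            for j in range(n_predictors)
        ]
        report[group] = [skew_and_excess_kurtosis(col) for col in columns]
    return report
```

Values far from zero in either group would be a reason to look harder at that predictor (a histogram or Q-Q plot) before trusting the DFA output.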

FWIW I have never used DFA on really small sample sizes from larger populations - but I worked in population biology. We always assumed, justifiably from previous work, our data was from multiple populations to start with.

Maybe @Stephen Tashi can help.
 
Hi Jim, thank you for taking a look at this

From what I understand of your feedback, over-complication and population specificity are both primary issues with stepwise DFA? I definitely agree, it's a lot for so little data, but I feel for my project it's best to go through the motions of what I could do if I had more impressive, population-wide data, as you say!

So in differentiating stepwise from alternative DFA methods, would you say assumption of normal distribution is the biggest issue?

thanks again, and I hope Stephen might help shed some more light too! :)
 
Well, it fails to work correctly on non-normal samples. I do not know if SPSS barfs on data it cannot handle or not. You have to discriminate on integer values; floating point values also return an error, I think.

It does return a 39 for type I error on IBM -- too many non-rejects. In other words failure to discriminate. I'd be happy to be corrected on this assertion though.

Boy, I have not thought about this for eons... I hope I did not have RAM failure.

@Dale does any of this sound terrible?
 
Haha no I'm sure you didn't - I wish I understood it better to ask the right questions!

SPSS isn't necessarily returning an error for my data, but it's less than excellent as you'd imagine. Essentially, my supervisor is suggesting the stepwise method would output better results than the 'enter independents together' DFA I used, but I think it would just make it more unreliable. Is that fair?
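
For a concrete picture of what entering all predictors together and then judging accuracy by leave-one-out classification involves, here is a minimal Python sketch. It is not what SPSS runs internally. It assumes two groups coded 0 and 1, equal prior probabilities, and a pooled covariance matrix, and every function name in it is made up for illustration.

```python
from statistics import mean


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if the matrix is singular.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_lda(X, y):
    """Two-group linear discriminant; returns (weights, cutoff)."""
    p = len(X[0])
    groups = {g: [x for x, lab in zip(X, y) if lab == g] for g in (0, 1)}
    means = {g: [mean(r[j] for r in rows) for j in range(p)]
             for g, rows in groups.items()}
    # Pooled within-group covariance matrix (all predictors entered together).
    S = [[0.0] * p for _ in range(p)]
    for g, rows in groups.items():
        for r in rows:
            d = [r[j] - means[g][j] for j in range(p)]
            for i in range(p):
                for j in range(p):
                    S[i][j] += d[i] * d[j]
    dof = len(X) - 2
    S = [[v / dof for v in row] for row in S]
    diff = [means[1][j] - means[0][j] for j in range(p)]
    w = solve(S, diff)
    midpoint = [(means[0][j] + means[1][j]) / 2 for j in range(p)]
    cutoff = sum(wj * mj for wj, mj in zip(w, midpoint))  # equal priors
    return w, cutoff


def predict(model, x):
    w, cutoff = model
    return int(sum(wj * xj for wj, xj in zip(w, x)) > cutoff)


def resubstitution_accuracy(X, y):
    """Accuracy when the same cases are used to fit and to test (optimistic)."""
    model = fit_lda(X, y)
    return sum(predict(model, x) == lab for x, lab in zip(X, y)) / len(X)


def loo_accuracy(X, y):
    """Leave-one-out: refit without case i, then classify case i."""
    hits = 0
    for i in range(len(X)):
        model = fit_lda(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += predict(model, X[i]) == y[i]
    return hits / len(X)
```

The gap between `resubstitution_accuracy` and `loo_accuracy` gives a feel for how optimistic the in-sample figure is. If a stepwise procedure were added, the variable selection would need to be repeated inside each leave-one-out fit; selecting variables once on all 110 cases and cross-validating afterwards lets the held-out case influence the model and tends to flatter the result.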
 
jim mcnamara said:
@Dale does any of this sound terrible?
Unfortunately I don’t know anything about DFA. I cannot give an informed opinion
 
Well, bottom line: go with what your advisor suggests. S/he has more knowledge of the data. Perhaps you can get a reason like 'the distribution has X feature (or deficit)...' so you can include that comment in your thesis.

You and I are doing this second-hand.
 
Sure, I agree, certainly the safest option at this point! Thank you again for taking the time to look over this!
 
