Discriminant function analysis - stepwise or otherwise?


Discussion Overview

The discussion revolves around the use of discriminant function analysis (DFA) for binary classification in a research project involving biometric measurements. Participants explore the limitations of stepwise DFA compared to alternative methods like leave-one-out cross-validation, particularly in the context of a small dataset of 110 data points.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant expresses concern about the limitations of stepwise DFA due to the small sample size and seeks justification for using alternative methods.
  • Another participant outlines reasons to consider stepwise DFA, including its ability to maximally separate groups and determine a parsimonious model, while noting the assumptions of normal distribution and homogeneous variance-covariance matrices.
  • A participant raises the issue of overcomplication and population specificity as potential problems with stepwise DFA, suggesting that it may not be suitable for small datasets.
  • There is a discussion about the assumption of normal distribution being a significant concern for the effectiveness of DFA.
  • One participant mentions that SPSS may not handle non-normal samples correctly, leading to potential errors in analysis.
  • Another participant suggests that following the advisor's recommendations could be the safest approach, despite uncertainties about the data's distribution characteristics.

Areas of Agreement / Disagreement

Participants express differing views on the appropriateness of stepwise DFA versus alternative methods, with no clear consensus on which approach is superior given the dataset's limitations.

Contextual Notes

Limitations include the small sample size, potential non-normal distribution of predictors, and the unresolved nature of how SPSS handles such data. Participants also note the need for further clarification on the implications of these factors for their analysis.

Cesca Roma
TL;DR
Covid-19 isolation has rotted my last remaining brain cells and I can't seem to justify not using stepwise DFA in my dissertation even though I know I (probably) shouldn't.
I’m using discriminant function analysis to determine the potential accuracy of several biometric measurements being used in conjunction for binary classification purposes for my BSc Biomed research project. Overall I've only got 110 data points so it's a stretch but hey, that's anatomy!

What I’m struggling with, lacking very fundamental statistical knowledge and using SPSS to do all the hard stuff for me, is where the possible limitations lie in stepwise DFA in this context, and justifying use of the alternative leave-one-out, cross-validation method instead. I’ve been told it would be better to not use stepwise but don’t understand how to place the explanations I’m coming across in the context of my research.

Hope everyone's staying healthy, safe and sane, and thank you for the help in advance!

TL;DR: how do I justify not using stepwise DFA?
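
For readers who want to see what the leave-one-out approach amounts to outside SPSS, here is a minimal pure-Python sketch. It is not the poster's analysis: the data are synthetic (two made-up predictors, 55 cases per group to mirror the 110 data points), every function name is hypothetical, and the classifier is assumed to be a plain two-group linear discriminant with equal priors and a pooled covariance matrix. Each case is classified by a model fitted to the remaining cases, so the accuracy estimate is not flattered by testing on the training data.

```python
import random

def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def pooled_covariance(groups):
    """Pooled within-group covariance (the equal-covariance assumption of DFA)."""
    p = len(groups[0][0])
    total = [[0.0] * p for _ in range(p)]
    dof = 0
    for rows in groups:
        m = mean_vector(rows)
        for r in rows:
            d = [x - mu for x, mu in zip(r, m)]
            for i in range(p):
                for j in range(p):
                    total[i][j] += d[i] * d[j]
        dof += len(rows) - 1
    return [[v / dof for v in row] for row in total]

def solve(a, b):
    """Solve a x = b by Gaussian elimination; assumes a is non-singular."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def fit_lda(X, y):
    """Return (weights, threshold) for the two-group rule with equal priors."""
    g0 = [r for r, lab in zip(X, y) if lab == 0]
    g1 = [r for r, lab in zip(X, y) if lab == 1]
    m0, m1 = mean_vector(g0), mean_vector(g1)
    w = solve(pooled_covariance([g0, g1]), [a - b for a, b in zip(m1, m0)])
    midpoint = [(a + b) / 2 for a, b in zip(m0, m1)]
    return w, sum(wi * ci for wi, ci in zip(w, midpoint))

def predict(model, row):
    w, threshold = model
    return 1 if sum(wi * xi for wi, xi in zip(w, row)) > threshold else 0

def loo_accuracy(X, y):
    """Refit the model n times, each time classifying the one held-out case."""
    hits = 0
    for i in range(len(X)):
        model = fit_lda(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += predict(model, X[i]) == y[i]
    return hits / len(X)

# Synthetic stand-in data (an assumption, not the thread's measurements).
random.seed(0)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(55)]
X += [[random.gauss(2, 1), random.gauss(1, 1)] for _ in range(55)]
y = [0] * 55 + [1] * 55

model = fit_lda(X, y)
resub = sum(predict(model, r) == lab for r, lab in zip(X, y)) / len(X)
loo = loo_accuracy(X, y)
print(f"resubstitution accuracy: {resub:.3f}")
print(f"leave-one-out accuracy:  {loo:.3f}")
```

Comparing the two printed figures is the point: the resubstitution figure reuses the training cases, while the leave-one-out figure is the one usually reported as the cross-validated classification rate.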
 
Here is why/when you should consider stepwise DFA:
1. To maximally separate the groups.
2. To determine the most parsimonious way to discriminate, e.g. with the fewest comparisons.
3. To ignore (not use) variables with little relationship to discrimination.
James A. Holdnack, ... Grant L. Iverson, in WAIS-IV, WMS-IV, and ACS, 2013
Discriminant Function Analysis
Discriminant Function Analysis (DFA) has been used extensively in the past to derive optimal combinations of variables to differentiate groups because of its computational simplicity. However, DFA assumes that the predictors (i.e., tests included in the model) are each normally distributed and the set of predictors has a multivariate normal distribution along with homogeneous variance-covariance matrices.

opinion only:
You want a binary split - two data "piles" - but your dataset is small. The analysis can be overcompensated as the result of having a dataset with multiple good subpopulations. In other words, you create two piles when you really need more. So how would you test for (justify) this with your data? Did you see a multivariate normal distribution for the set of predictors? ANOVA can be used as a starter to see what you get.
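
As a rough first look at the assumptions quoted above, one could screen each predictor within each group before running the DFA. The sketch below is only that: the data are synthetic, the function names are hypothetical, and it checks marginal skewness, excess kurtosis and the ratio of group variances. It does not test multivariate normality or equality of the full covariance matrices, so a clean result here would not establish that the assumptions hold.

```python
import random

def moments(values):
    """Return (variance, skewness, excess kurtosis) from central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m2, m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def screen(group_a, group_b):
    """Per-predictor summary for two groups given as lists of rows."""
    report = []
    for j, (col_a, col_b) in enumerate(zip(zip(*group_a), zip(*group_b))):
        var_a, skew_a, kurt_a = moments(col_a)
        var_b, skew_b, kurt_b = moments(col_b)
        report.append({
            "predictor": j,
            "skew": (skew_a, skew_b),
            "excess_kurtosis": (kurt_a, kurt_b),
            # Values far above 1 hint at unequal spread between the groups.
            "variance_ratio": max(var_a, var_b) / min(var_a, var_b),
        })
    return report

# Synthetic stand-in data: predictor 0 is normal, predictor 1 is skewed.
random.seed(0)
group_a = [[random.gauss(0, 1), random.expovariate(1.0)] for _ in range(55)]
group_b = [[random.gauss(1, 2), random.expovariate(1.0)] for _ in range(55)]

for row in screen(group_a, group_b):
    print(row)
```

For a normal sample, both skewness and excess kurtosis should sit near zero; with only 55 cases per group they will wander, so these numbers are a prompt for plots and judgement rather than a pass/fail test.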

FWIW I have never used DFA on really small sample sizes from larger populations - but I worked in population biology. We always assumed, justifiably from previous work, our data was from multiple populations to start with.

Maybe @Stephen Tashi can help.
 
Hi Jim, thank you for taking a look at this.

From what I understand of your feedback, over-complication and population specificity are both primary issues with stepwise DFA? I definitely agree, it's a lot for so little data, but I feel that for my project it's best to go through the motions of what I could do if I had more impressive, population-wide data, as you say!

So in differentiating stepwise from alternative DFA methods, would you say the assumption of a normal distribution is the biggest issue?

Thanks again, and I hope Stephen might help shed some more light too! :)
 
Well, it fails to work correctly on non-normal samples. I do not know whether SPSS barfs on data it cannot handle. You have to discriminate on integer values; floating-point values also return an error, I think.

It does return a 39 for type I error on IBM -- too many non-rejects. In other words failure to discriminate. I'd be happy to be corrected on this assertion though.

Boy, I have not thought about this for eons... I hope I did not have RAM failure.

@Dale does any of this sound terrible?
 
Haha no I'm sure you didn't - I wish I understood it better to ask the right questions!

SPSS isn't necessarily returning an error for my data, but it's less than excellent, as you'd imagine. Essentially, my supervisor is suggesting the stepwise method would output better results than the 'enter independents together' DFA I used, but I think it would just make it more unreliable. Is that fair?
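
One way to probe the "more unreliable" worry is a small simulation on pure noise, where no predictor is truly related to group membership and the true accuracy is 50%. The sketch below is an illustration under stated assumptions, not SPSS's procedure: the data are random, the names are hypothetical, a single "pick the best-separating predictor" step stands in for a full stepwise search, and the classifier is a one-predictor nearest-group-mean rule. It compares choosing the predictor once on all cases with redoing the choice inside every leave-one-out fold.

```python
import random

def group_means(values, labels):
    a = [v for v, lab in zip(values, labels) if lab == 0]
    b = [v for v, lab in zip(values, labels) if lab == 1]
    return sum(a) / len(a), sum(b) / len(b)

def separation(values, labels):
    """Squared mean difference over pooled within-group variance."""
    m0, m1 = group_means(values, labels)
    ss = sum((v - (m1 if lab else m0)) ** 2 for v, lab in zip(values, labels))
    return (m1 - m0) ** 2 / (ss / (len(values) - 2))

def best_predictor(X, y):
    """Index of the predictor that separates the two groups best."""
    return max(range(len(X[0])), key=lambda j: separation([r[j] for r in X], y))

def classify(value, m0, m1):
    return 1 if abs(value - m1) < abs(value - m0) else 0

def loo_accuracy(X, y, select_inside):
    """Leave-one-out accuracy with selection inside or outside the folds."""
    fixed = None if select_inside else best_predictor(X, y)
    hits = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        j = best_predictor(X_tr, y_tr) if select_inside else fixed
        m0, m1 = group_means([r[j] for r in X_tr], y_tr)
        hits += classify(X[i][j], m0, m1) == y[i]
    return hits / len(X)

# 110 cases, 20 noise predictors, balanced groups: all values are made up.
random.seed(1)
X = [[random.gauss(0, 1) for _ in range(20)] for _ in range(110)]
y = [0] * 55 + [1] * 55

outside = loo_accuracy(X, y, select_inside=False)
inside = loo_accuracy(X, y, select_inside=True)
print(f"selection on all data, then leave-one-out: {outside:.3f}")
print(f"selection repeated inside each fold:       {inside:.3f}")
```

When the first figure comes out noticeably above the second, the gap is optimism introduced by letting the selection step see the held-out cases. That is one concrete argument for caution about stepwise selection on a small dataset, though a single simulated run is only suggestive.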
 
jim mcnamara said:
@Dale does any of this sound terrible?
Unfortunately, I don’t know anything about DFA, so I cannot give an informed opinion.
 
Well, bottom line: go with what your advisor suggests. S/he has more knowledge of the data. Perhaps you can get a reason like 'the distribution has X feature (or deficit)...' so you can include that comment in your thesis.

You and I are doing this second-hand.
 
Sure, I agree, certainly the safest option at this point! Thank you again for taking the time to look over this!
 
