Discriminant function analysis - stepwise or otherwise?

In summary, discriminant function analysis (DFA) is being used to estimate how accurately several biometric measurements, used in conjunction, can classify cases into two groups. The open questions are what the limitations of stepwise DFA are in this context, and how to justify using the alternative leave-one-out cross-validation method instead.
  • #1
Cesca Roma
TL;DR Summary
Covid-19 isolation has rotted my last remaining brain cells and I can't seem to justify not using stepwise DFA in my dissertation even though I know I (probably) shouldn't.
I’m using discriminant function analysis to determine the potential accuracy of several biometric measurements being used in conjunction for binary classification purposes for my BSc Biomed research project. Overall I've only got 110 data points so it's a stretch but hey, that's anatomy!

What I’m struggling with, lacking very fundamental statistical knowledge and using SPSS to do all the hard stuff for me, is where the possible limitations lie in stepwise DFA in this context, and justifying use of the alternative leave-one-out, cross-validation method instead. I’ve been told it would be better to not use stepwise but don’t understand how to place the explanations I’m coming across in the context of my research.

Hope everyone's staying healthy, safe and sane, and thank you for the help in advance!

TL;DR: how do I justify not using stepwise DFA?
  • #2
Here is why/when you should consider stepwise DFA:
1. to maximally separate groups
2. to determine the most parsimonious way to discriminate (e.g. the fewest comparisons)
3. to ignore (not use) variables with little relationship to discrimination.
James A. Holdnack, ... Grant L. Iverson, in WAIS-IV, WMS-IV, and ACS, 2013
Discriminant Function Analysis
Discriminant Function Analysis (DFA) has been used extensively in the past to derive optimal combinations of variables to differentiate groups because of its computational simplicity. However, DFA assumes that the predictors (i.e., tests included in the model) are each normally distributed and the set of predictors has a multivariate normal distribution along with homogeneous variance-covariance matrices

opinion only:
You want a binary split - two data "piles", but your dataset is small. The analysis can be overcompensated as the result of having a dataset with multiple good subpopulations. In other words, you create two piles when you really need more. So how would you test for (justify) this with your data? Did you see a multivariate normal distribution for the set of predictors? ANOVA can be used to see what you get as a starter.

FWIW I have never used DFA on really small sample sizes from larger populations - but I worked in population biology. We always assumed, justifiably from previous work, our data was from multiple populations to start with.

Maybe @Stephen Tashi can help.
  • #3
Hi Jim, thank you for taking a look at this

From what I understand of your feedback, over-complication and population specificity are both primary issues with stepwise DFA? I definitely agree, it's a lot for so little data, but I feel for my project it's best to go through the motions of what I could do if I had more impressive, population-wide data as you say!

So in differentiating stepwise from alternative DFA methods, would you say assumption of normal distribution is the biggest issue?

thanks again, and I hope Stephen might help shed some more light too! :)
  • #4
Well, it fails to work correctly on non-normal samples. I do not know whether SPSS barfs on data it cannot handle or not. You have to discriminate on integer values; floating point values also return an error, I think.

It does return a 39 for type I error on IBM -- too many non-rejects. In other words failure to discriminate. I'd be happy to be corrected on this assertion though.

Boy, I have not thought about this for eons... I hope I did not have RAM failure.

@Dale does any of this sound terrible?
  • #5
Haha no I'm sure you didn't - I wish I understood it better to ask the right questions!

SPSS isn't necessarily returning an error for my data, but it's less than excellent as you'd imagine. Essentially, my supervisor is suggesting the stepwise method would output better results than the 'enter independents together' DFA I used, but I think it would just make it more unreliable. Is that fair?
  • #6
jim mcnamara said:
@Dale does any of this sound terrible?
Unfortunately I don’t know anything about DFA. I cannot give an informed opinion
  • #7
Well, bottom line: go with what your advisor suggests. S/he has more knowledge of the data. Perhaps you can get a reason like 'the distribution has X feature (or deficit)...' so you can include that comment in your thesis.

You and I are doing this second-hand.
  • #8
Sure, I agree, certainly the safest option at this point! Thank you again for taking the time to look over this!

Related to Discriminant function analysis - stepwise or otherwise?

1. What is discriminant function analysis?

Discriminant function analysis (DFA) is a statistical technique used to classify observations into predefined groups based on a set of predictor variables. It aims to find a linear combination of the predictors that best discriminates between the groups.
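To make the "linear combination of the predictors" concrete, here is a minimal sketch of a two-group discriminant function on two predictors, written in plain Python. The data values and the function names (`fit_two_group_lda`, `classify`) are made up for illustration, and the sketch assumes equal prior probabilities and a shared (pooled) covariance matrix. It is not what SPSS runs internally, only the textbook calculation in its simplest form.

```python
from statistics import mean

def fit_two_group_lda(group_a, group_b):
    """Fit a two-group linear discriminant on two predictors.

    Each group is a list of (x1, x2) tuples. Returns (weights, cutoff):
    a case scores w1*x1 + w2*x2 and is assigned to group A when its
    score exceeds the cutoff (equal priors assumed).
    """
    def centroid(rows):
        return [mean(r[0] for r in rows), mean(r[1] for r in rows)]

    mean_a, mean_b = centroid(group_a), centroid(group_b)

    # Pooled within-group covariance matrix (2 x 2).
    pooled = [[0.0, 0.0], [0.0, 0.0]]
    for rows, centre in ((group_a, mean_a), (group_b, mean_b)):
        for row in rows:
            dev = [row[0] - centre[0], row[1] - centre[1]]
            for i in range(2):
                for j in range(2):
                    pooled[i][j] += dev[i] * dev[j]
    dof = len(group_a) + len(group_b) - 2
    pooled = [[value / dof for value in row] for row in pooled]

    # weights = pooled^-1 (mean_a - mean_b), via the closed-form 2x2 inverse.
    det = pooled[0][0] * pooled[1][1] - pooled[0][1] * pooled[1][0]
    diff = [mean_a[0] - mean_b[0], mean_a[1] - mean_b[1]]
    weights = [
        (pooled[1][1] * diff[0] - pooled[0][1] * diff[1]) / det,
        (-pooled[1][0] * diff[0] + pooled[0][0] * diff[1]) / det,
    ]
    # The cutoff is the score of the midpoint between the two centroids.
    cutoff = sum(w * (a + b) / 2 for w, a, b in zip(weights, mean_a, mean_b))
    return weights, cutoff

def classify(case, weights, cutoff):
    """Assign a case to group 'A' or 'B' from its discriminant score."""
    score = sum(w * x for w, x in zip(weights, case))
    return "A" if score > cutoff else "B"

# Hypothetical measurements, invented for this example.
group_a = [(5.1, 3.4), (4.9, 3.0), (5.4, 3.7), (5.0, 3.3)]
group_b = [(6.3, 2.8), (6.0, 2.7), (6.6, 3.0), (6.1, 2.9)]
weights, cutoff = fit_two_group_lda(group_a, group_b)
predictions_a = [classify(case, weights, cutoff) for case in group_a]
predictions_b = [classify(case, weights, cutoff) for case in group_b]
```

With more than two predictors the same calculation needs a general matrix solve in place of the 2x2 inverse, which is the part a statistics package handles for you.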

2. What is the difference between stepwise and non-stepwise DFA?

In stepwise DFA, the predictor variables are selected based on their ability to improve the classification of the groups. This is done in a step-by-step manner, with variables being added or removed from the model based on their significance. Non-stepwise DFA, on the other hand, involves including all predictor variables in the model without any selection process.
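The selection loop described above can be sketched roughly as follows. This is a forward-only illustration that uses Wilks' lambda with a partial F-to-enter test; the threshold of 3.84, the function names and the toy data are all assumptions made for the example, and a full stepwise procedure would also re-test entered variables for removal at each step.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def scatter(rows, cols):
    """Sums of squares and cross-products about the mean, for the chosen columns."""
    means = [sum(r[c] for r in rows) / len(rows) for c in cols]
    return [[sum((r[a] - ma) * (r[b] - mb) for r in rows)
             for b, mb in zip(cols, means)]
            for a, ma in zip(cols, means)]

def wilks_lambda(rows, labels, cols):
    """det(within-group scatter) / det(total scatter); smaller means better separation."""
    total = scatter(rows, cols)
    within = [[0.0] * len(cols) for _ in cols]
    for label in sorted(set(labels)):
        members = [r for r, l in zip(rows, labels) if l == label]
        group_scatter = scatter(members, cols)
        for i in range(len(cols)):
            for j in range(len(cols)):
                within[i][j] += group_scatter[i][j]
    return determinant(within) / determinant(total)

def forward_stepwise(rows, labels, f_to_enter=3.84):
    """Add predictors one at a time while the partial F-to-enter passes the threshold."""
    n, g = len(rows), len(set(labels))
    selected, remaining = [], list(range(len(rows[0])))
    current = 1.0  # Wilks' lambda with no predictors in the model
    while remaining and n - g - len(selected) > 0:
        lam, best = min((wilks_lambda(rows, labels, selected + [c]), c)
                        for c in remaining)
        partial = lam / current
        if partial <= 0:
            f_value = float("inf")  # perfect separation on this step
        else:
            f_value = (n - g - len(selected)) / (g - 1) * (1 - partial) / partial
        if f_value < f_to_enter:
            break
        selected.append(best)
        remaining.remove(best)
        current = lam
    return selected

# Toy data: column 0 separates the groups, column 1 is unrelated to group.
rows = [[1, 5], [2, 3], [3, 4], [2, 6], [6, 4], [7, 6], [8, 5], [7, 3]]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
chosen = forward_stepwise(rows, labels)
```

Because the choice at each step depends on sample estimates, a small dataset can yield a different subset from one sample to the next, which is the usual objection to relying on the selected variables without validation.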

3. When should I use stepwise DFA?

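Stepwise DFA is useful when you have a large number of predictor variables and want to identify the most important ones for classification. It can also drop predictors that are strongly correlated with ones already in the model, which reduces redundancy. With a small sample, however, the selected subset can depend heavily on chance features of the data, so the resulting classification accuracy should be checked with cross-validation.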

4. What are the assumptions of DFA?

The main assumptions of DFA include: (1) multivariate normality of the predictor variables within each group, (2) homogeneity of the variance-covariance matrices across groups, and (3) independence of observations. Additionally, the number of observations in each group should be relatively balanced.

5. How do I interpret the results of DFA?

The results of DFA typically include a discriminant function equation and a table showing the classification of observations into groups. The discriminant function equation can be used to predict the group membership of new observations. The classification table shows the percentage of correctly classified cases, as well as the overall accuracy of the model.
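The percentage of correctly classified cases can be computed in two ways, and the difference matters for small datasets. The sketch below contrasts resubstitution accuracy (fit and classify on the same cases) with leave-one-out cross-validation (each case is classified by a model fitted without it). The classifier is a simple nearest-centroid rule used as a stand-in for the discriminant function, and the data and function names are invented for illustration.

```python
def nearest_centroid_fit(rows, labels):
    """Stand-in classifier: store the mean of each predictor per group."""
    centroids = {}
    for label in sorted(set(labels)):
        members = [r for r, l in zip(rows, labels) if l == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def nearest_centroid_predict(centroids, row):
    """Assign the case to the group whose centroid is closest."""
    def squared_distance(centre):
        return sum((x - m) ** 2 for x, m in zip(row, centre))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

def resubstitution_accuracy(rows, labels, fit, predict):
    """Fit on all cases, then classify those same cases."""
    model = fit(rows, labels)
    hits = sum(predict(model, r) == l for r, l in zip(rows, labels))
    return hits / len(rows)

def leave_one_out_accuracy(rows, labels, fit, predict):
    """Classify each case with a model fitted on all the other cases."""
    hits = 0
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        model = fit(train_rows, train_labels)
        hits += predict(model, rows[i]) == labels[i]
    return hits / len(rows)

# Hypothetical single-predictor data with two borderline cases (5.0 and 5.2).
rows = [[1.0], [2.0], [3.0], [5.0], [5.2], [7.0], [8.0], [9.0]]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
resub = resubstitution_accuracy(rows, labels,
                                nearest_centroid_fit, nearest_centroid_predict)
loo = leave_one_out_accuracy(rows, labels,
                             nearest_centroid_fit, nearest_centroid_predict)
```

On this made-up data the resubstitution accuracy is 1.0 while the leave-one-out accuracy is 0.75, because the two borderline cases are only classified correctly when they help define their own group's centroid. If variables are chosen stepwise, the selection would need to be repeated inside each leave-one-out fold for the estimate to stay honest.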
