Discriminant function analysis - stepwise or otherwise?

Cesca Roma · Apr 19, 2020

I’m using discriminant function analysis to determine the potential accuracy of several biometric measurements being used in conjunction for binary classification purposes for my BSc Biomed research project. Overall I've only got 110 data points so it's a stretch but hey, that's anatomy!

What I’m struggling with, lacking very fundamental statistical knowledge and using SPSS to do all the hard stuff for me, is where the possible limitations lie in stepwise DFA in this context, and justifying use of the alternative leave-one-out, cross-validation method instead. I’ve been told it would be better to not use stepwise but don’t understand how to place the explanations I’m coming across in the context of my research.

Hope everyone's staying healthy, safe and sane, and thank you for the help in advance!

TL;DR: how do I justify not using stepwise DFA?

jim mcnamara · Apr 19, 2020

Here is why/when you should consider stepwise DFA:

1. to maximally separate groups
2. to determine the most parsimonious way to discriminate (the fewest comparisons, for example)
3. to ignore (not use) variables with little relationship to discrimination.

James A. Holdnack, ... Grant L. Iverson, in WAIS-IV, WMS-IV, and ACS, 2013
Discriminant Function Analysis

Discriminant Function Analysis (DFA) has been used extensively in the past to derive optimal combinations of variables to differentiate groups because of its computational simplicity. However, DFA assumes that the predictors (i.e., tests included in the model) are each normally distributed and the set of predictors has a multivariate normal distribution along with homogeneous variance-covariance matrices

opinion only:
You want a binary split, two data "piles", but your dataset is small. The analysis can be distorted when the dataset actually contains several distinct subpopulations. In other words, you create two piles when you really need more. So how would you test for (justify) this with your data? Did you see a multivariate normal distribution for the set of predictors? ANOVA can be used as a starting point to see what you get.
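
As a rough illustration of the multivariate normality question above, here is a minimal sketch of Mardia's kurtosis statistic computed for one group at a time. The function name `mardia_kurtosis_z` and the synthetic data are made up for illustration, and the z-score relies on a large-sample approximation, so with roughly 55 cases per group it is only a crude screen, not a formal verdict.

```python
import numpy as np

def mardia_kurtosis_z(X):
    """Approximate z-score for Mardia's multivariate kurtosis of one group."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    centered = X - X.mean(axis=0)
    S = centered.T @ centered / n  # divide-by-n covariance, as in Mardia's statistic
    # Squared Mahalanobis distance of every case from the group mean
    d2 = np.einsum("ij,ij->i", centered @ np.linalg.inv(S), centered)
    b2 = np.mean(d2 ** 2)
    # Under multivariate normality b2 is roughly N(p(p+2), 8p(p+2)/n)
    return (b2 - p * (p + 2)) / np.sqrt(8 * p * (p + 2) / n)

# Synthetic stand-in for real measurements: 110 cases, 4 predictors, 2 groups
rng = np.random.default_rng(0)
X = rng.normal(size=(110, 4))
group = np.repeat([0, 1], 55)
for g in (0, 1):
    print("group", g, "kurtosis z:", round(float(mardia_kurtosis_z(X[group == g])), 2))
```

A z-score far from zero (say beyond about ±2) hints that the group departs from multivariate normality; values near zero do not prove normality.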

FWIW I have never used DFA on really small sample sizes from larger populations - but I worked in population biology. We always assumed, justifiably from previous work, our data was from multiple populations to start with.

Maybe @Stephen Tashi can help.

Cesca Roma · Apr 19, 2020

Hi Jim, thank you for taking a look at this

From what I understand of your feedback, over-complication and population specificity are both primary issues with stepwise DFA? I definitely agree, it's a lot for so little data, but I feel that for my project it's best to go through the motions of what I could do if I had more impressive, population-wide data, as you say!

So in differentiating stepwise from alternative DFA methods, would you say assumption of normal distribution is the biggest issue?

thanks again, and I hope Stephen might help shed some more light too! :)

jim mcnamara · Apr 19, 2020

Well, it fails to work correctly on non-normal samples. I do not know whether SPSS barfs on data it cannot handle. You have to discriminate on integer values; floating point values also return an error, I think.

It does return a 39 for type I error on IBM -- too many non-rejects. In other words failure to discriminate. I'd be happy to be corrected on this assertion though.

Boy, I have not thought about this for eons... I hope I did not have RAM failure.

@Dale does any of this sound terrible?

Cesca Roma · Apr 19, 2020

Haha no I'm sure you didn't - I wish I understood it better to ask the right questions!

SPSS isn't necessarily returning an error for my data, but it's less than excellent as you'd imagine. Essentially, my supervisor is suggesting the stepwise method would output better results than the 'enter independents together' DFA I used, but I think it would just make it more unreliable. Is that fair?
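
One way to probe that worry is a small simulation on pure noise, where no predictor carries any real information. The sketch below is not SPSS's procedure. The helper names (`fit_lda`, `forward_select`, `loo_accuracy`) are hypothetical, the data are synthetic, equal prior probabilities are assumed, and the forward selection simply adds a fixed number `k` of predictors by largest Mahalanobis distance between the group means (which, for two groups, should rank candidates the same way as Wilks' lambda) instead of using F-to-enter thresholds.

```python
import numpy as np

def fit_lda(X, y):
    """Two-group linear discriminant with pooled covariance and equal priors."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X) - 2)
    w = np.linalg.solve(S, m1 - m0)   # discriminant weights
    cut = w @ (m0 + m1) / 2           # midpoint between the group centroids
    return w, cut

def mahalanobis_d2(X, y):
    """Squared Mahalanobis distance between the two group means."""
    w, _ = fit_lda(X, y)
    return float((X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)) @ w)

def forward_select(X, y, k=3):
    """Greedy forward selection: add the predictor that most increases D^2."""
    chosen = []
    while len(chosen) < k:
        rest = [j for j in range(X.shape[1]) if j not in chosen]
        chosen.append(max(rest, key=lambda j: mahalanobis_d2(X[:, chosen + [j]], y)))
    return chosen

def loo_accuracy(X, y, select=None):
    """Leave-one-out accuracy; `select`, if given, is re-run inside every fold."""
    n, hits = len(y), 0
    for i in range(n):
        train = np.arange(n) != i
        cols = select(X[train], y[train]) if select else list(range(X.shape[1]))
        w, cut = fit_lda(X[train][:, cols], y[train])
        hits += int((X[i, cols] @ w > cut) == bool(y[i]))
    return hits / n

# Pure-noise data: 110 cases, 20 predictors, labels unrelated to the predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(110, 20))
y = rng.integers(0, 2, size=110)

cols = forward_select(X, y, k=3)
print("selected once on all cases, then LOO:", loo_accuracy(X[:, cols], y))
print("selected inside each LOO fold:       ",
      loo_accuracy(X, y, select=lambda A, b: forward_select(A, b, k=3)))
print("all predictors entered together:     ", loo_accuracy(X, y))
```

If the selection is run once on all cases and only the final model is cross-validated, the left-out case has already influenced which predictors were chosen, so the reported accuracy tends to be optimistic. Re-running the selection inside each fold avoids that, and it is the fairer comparison against entering all predictors together.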

Dale · Apr 19, 2020

jim mcnamara said:

@Dale does any of this sound terrible?

Unfortunately I don’t know anything about DFA. I cannot give an informed opinion.

jim mcnamara · Apr 19, 2020

Well, bottom line: go with what your advisor suggests. S/he has more knowledge of the data. Perhaps you can get a reason like 'the distribution has X feature (or deficit)...' so you can include that comment in your thesis.

You and I are doing this second-hand.

Cesca Roma · Apr 20, 2020

Sure, I agree, certainly the safest option at this point! Thank you again for taking the time to look over this!
