What Are the Limitations of KNN with High Dimensionality in LDA/QDA Performance?

SUMMARY

The discussion focuses on the limitations of K-Nearest Neighbors (KNN) in high-dimensional spaces, particularly in relation to Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA). With 100 features, the fraction of training observations available for prediction diminishes significantly, leading to sparse data points near any test observation. LDA performs better on linear decision boundaries, while QDA excels in non-linear scenarios due to its flexibility and distinct covariance assumptions for each class.

PREREQUISITES
  • Understanding of K-Nearest Neighbors (KNN) algorithm
  • Familiarity with Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA)
  • Knowledge of high-dimensional data challenges
  • Basic concepts of Bayes decision boundaries
NEXT STEPS
  • Explore the mathematical foundations of K-Nearest Neighbors (KNN) in high-dimensional spaces
  • Study the differences between Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA)
  • Investigate dimensionality reduction techniques to improve KNN performance
  • Learn about the implications of covariance matrices in QDA
USEFUL FOR

Data scientists, machine learning practitioners, and statisticians interested in classification techniques and the effects of high dimensionality on model performance.

brojesus111

Homework Statement



1. We have a set of observations on p = 100 features. The observations are uniformly distributed on each feature, and each feature ranges in value from 0 to 1. We wish to predict a test observation’s response using observations within the 10 % of each feature’s range that is closest to that test observation. What fraction of the available observations will we use to make the prediction?

2. Now argue based on the above that a drawback of KNN when p is large is that there are very few training observations near any given test observation.
LDA/QDA

3. If Bayes decision boundary is linear, do we expect LDA or QDA to perform better on the training set? On the test set?

4. If the Bayes decision boundary is nonlinear, do we expect LDA or QDA to perform better on the training set? On the test set?

The Attempt at a Solution



1. Is it just 0.1^100?

2. For example, if we instead used observations within 99% of each feature's range, the fraction used would be 0.99^100 ≈ 0.366, which is roughly 37% of the available observations. So even excluding only 1% of each feature's range leaves out most of the observations, which tells us that with many features, training observations are generally far from any given test observation.
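The arithmetic in parts 1 and 2 is easy to check with a few lines of Python. The helper name `local_fraction` is just for illustration, not from the thread:

```python
def local_fraction(per_feature_fraction, p):
    """Expected fraction of training observations falling in the local
    neighborhood around a test point, when each of the p features is
    independently uniform and we keep a given fraction of each range."""
    return per_feature_fraction ** p

print(local_fraction(0.1, 1))     # 0.1: with one feature, 10% of the data
print(local_fraction(0.1, 2))     # ~0.01: with two features, only 1%
print(local_fraction(0.1, 100))   # ~1e-100: essentially no neighbors
print(local_fraction(0.99, 100))  # ~0.366: even 99% per feature loses most points
```

This is the curse of dimensionality in one line: the fractions multiply across features, so the usable fraction shrinks exponentially in p.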

3. For the test set, since the true boundary is linear, LDA should do better. On the training set, I think QDA will do better, since it is more complex and flexible and can fit the training data more closely.

4. Since the boundary is nonlinear, QDA should do better on the test set. On the training set, QDA is more flexible than LDA, so it should do better there too.

Also, out of curiosity, if we have K=1 for the KNN, is our training error rate 0 or close to 0?
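On the K=1 question, a tiny from-scratch 1-NN sketch shows why: each training point is its own nearest neighbor at distance 0, so the training error is exactly 0 as long as no duplicated points carry conflicting labels. The helper below is my own illustration, not from the thread:

```python
import math

def one_nn_predict(X_train, y_train, x):
    """Return the label of the single closest training point (Euclidean)."""
    dists = [math.dist(x, xi) for xi in X_train]
    return y_train[dists.index(min(dists))]

X = [[0.1, 0.2], [0.8, 0.9], [0.4, 0.4]]
y = [0, 1, 0]

# Predicting on the training set itself: each point matches itself
# at distance 0, so every prediction equals the true label.
train_preds = [one_nn_predict(X, y, x) for x in X]
print(train_preds == y)  # True, i.e. training error rate 0
```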
 
Actually, the only difference between LDA and QDA is that QDA assumes that each class has its own covariance matrix. So does it actually matter whether it's the test or training data? The LDA should do better when it's linear and the QDA should do better when it's non-linear.

Is this correct?
 
Another reason I think my revised post above is right: LDA's boundary is linear and has lower variance, while QDA's boundary is non-linear and has higher variance.
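The covariance point can be made concrete with a minimal from-scratch sketch of the plug-in Gaussian classifiers behind LDA and QDA (shared vs. per-class covariance). The simulated data and helper names are my own illustration, assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_gaussian_classifier(X, y, shared_cov):
    """Plug-in Gaussian classifier: LDA if shared_cov=True, else QDA."""
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    if shared_cov:
        # Pooled within-class covariance, shared by all classes (LDA).
        pooled = sum(np.cov(X[y == c].T) * (np.sum(y == c) - 1)
                     for c in classes) / (len(y) - len(classes))
        covs = {c: pooled for c in classes}
    else:
        # Each class gets its own covariance matrix (QDA).
        covs = {c: np.cov(X[y == c].T) for c in classes}
    priors = {c: np.mean(y == c) for c in classes}

    def predict(Xnew):
        scores = []
        for c in classes:
            diff = Xnew - means[c]
            inv = np.linalg.inv(covs[c])
            _, logdet = np.linalg.slogdet(covs[c])
            # Log Gaussian density up to an additive constant.
            s = -0.5 * np.einsum('ij,jk,ik->i', diff, inv, diff)
            s += -0.5 * logdet + np.log(priors[c])
            scores.append(s)
        return classes[np.argmax(scores, axis=0)]
    return predict

# Nonlinear (quadratic) Bayes boundary: equal means, different covariances.
n = 500
X0 = rng.multivariate_normal([0, 0], [[1, 0], [0, 1]], n)
X1 = rng.multivariate_normal([0, 0], [[3, 0], [0, 0.3]], n)
X = np.vstack([X0, X1])
y = np.repeat([0, 1], n)

lda = fit_gaussian_classifier(X, y, shared_cov=True)
qda = fit_gaussian_classifier(X, y, shared_cov=False)
print("LDA train accuracy:", np.mean(lda(X) == y))
print("QDA train accuracy:", np.mean(qda(X) == y))
```

With equal class means, a shared-covariance (linear) boundary has almost nothing to work with, while the per-class covariances give QDA a quadratic boundary that separates the classes much better, consistent with the argument above.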
 