What Are the Limitations of KNN with High Dimensionality in LDA/QDA Performance?

SUMMARY

The discussion focuses on the limitations of K-Nearest Neighbors (KNN) in high-dimensional spaces, particularly in relation to Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA). With 100 features, the fraction of training observations available for prediction diminishes significantly, leading to sparse data points near any test observation. LDA performs better on linear decision boundaries, while QDA excels in non-linear scenarios due to its flexibility and distinct covariance assumptions for each class.

PREREQUISITES
  • Understanding of K-Nearest Neighbors (KNN) algorithm
  • Familiarity with Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA)
  • Knowledge of high-dimensional data challenges
  • Basic concepts of Bayes decision boundaries
NEXT STEPS
  • Explore the mathematical foundations of K-Nearest Neighbors (KNN) in high-dimensional spaces
  • Study the differences between Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA)
  • Investigate dimensionality reduction techniques to improve KNN performance
  • Learn about the implications of covariance matrices in QDA
USEFUL FOR

Data scientists, machine learning practitioners, and statisticians interested in classification techniques and the effects of high dimensionality on model performance.

brojesus111

Homework Statement



1. We have a set of observations on p = 100 features. The observations are uniformly distributed on each feature, and each feature ranges in value from 0 to 1. We wish to predict a test observation’s response using observations within the 10 % of each feature’s range that is closest to that test observation. What fraction of the available observations will we use to make the prediction?

2. Now argue based on the above that a drawback of KNN when p is large is that there are very few training observations near any given test observation.
LDA/QDA

3. If Bayes decision boundary is linear, do we expect LDA or QDA to perform better on the training set? On the test set?

4. If the Bayes decision boundary is nonlinear, do we expect LDA or QDA to perform better on the training set? On the test set?

The Attempt at a Solution



1. Is it just 0.1^100?

2. For example, if we instead used observations within 99% of each feature's range, the fraction used would be 0.99^100 ≈ 0.366, which is roughly 37% of the available observations. So even excluding only 1% of each feature's range leaves out most of the observations, which tells us that with many features, training observations are generally far from any given test observation.
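The arithmetic in parts 1 and 2 is easy to check with a few lines of Python. The helper name `local_fraction` is just for illustration, not from the thread:

```python
def local_fraction(per_feature_fraction, p):
    """Expected fraction of training observations falling in the local
    neighborhood around a test point, when each of the p features is
    independently uniform and we keep a given fraction of each range."""
    return per_feature_fraction ** p

print(local_fraction(0.1, 1))     # 0.1: with one feature, 10% of the data
print(local_fraction(0.1, 2))     # ~0.01: with two features, only 1%
print(local_fraction(0.1, 100))   # ~1e-100: essentially no neighbors
print(local_fraction(0.99, 100))  # ~0.366: even 99% per feature loses most points
```

This is the curse of dimensionality in one line: the fractions multiply across features, so the usable fraction shrinks exponentially in p.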

3. For the test set, since the true boundary is linear, LDA should do better. On the training set, I think QDA will do better, since it is more complex and flexible and can fit the training data more closely.

4. Since the boundary is nonlinear, QDA should do better on the test set. On the training set, QDA is more flexible than LDA, so it should do better there too.

Also, out of curiosity, if we have K=1 for the KNN, is our training error rate 0 or close to 0?
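On the K=1 question, a tiny from-scratch 1-NN sketch shows why: each training point is its own nearest neighbor at distance 0, so the training error is exactly 0 as long as no duplicated points carry conflicting labels. The helper below is my own illustration, not from the thread:

```python
import math

def one_nn_predict(X_train, y_train, x):
    """Return the label of the single closest training point (Euclidean)."""
    dists = [math.dist(x, xi) for xi in X_train]
    return y_train[dists.index(min(dists))]

X = [[0.1, 0.2], [0.8, 0.9], [0.4, 0.4]]
y = [0, 1, 0]

# Predicting on the training set itself: each point matches itself
# at distance 0, so every prediction equals the true label.
train_preds = [one_nn_predict(X, y, x) for x in X]
print(train_preds == y)  # True, i.e. training error rate 0
```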
 
Actually, the only difference between LDA and QDA is that QDA assumes that each class has its own covariance matrix. So does it actually matter whether it's the test or training data? The LDA should do better when it's linear and the QDA should do better when it's non-linear.

Is this correct?
 
Another reason I think my revised post above is right: LDA's boundary is linear and has lower variance, while QDA's boundary is non-linear and has higher variance.
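The covariance point can be made concrete with a minimal from-scratch sketch of the plug-in Gaussian classifiers behind LDA and QDA (shared vs. per-class covariance). The simulated data and helper names are my own illustration, assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_gaussian_classifier(X, y, shared_cov):
    """Plug-in Gaussian classifier: LDA if shared_cov=True, else QDA."""
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    if shared_cov:
        # Pooled within-class covariance, shared by all classes (LDA).
        pooled = sum(np.cov(X[y == c].T) * (np.sum(y == c) - 1)
                     for c in classes) / (len(y) - len(classes))
        covs = {c: pooled for c in classes}
    else:
        # Each class gets its own covariance matrix (QDA).
        covs = {c: np.cov(X[y == c].T) for c in classes}
    priors = {c: np.mean(y == c) for c in classes}

    def predict(Xnew):
        scores = []
        for c in classes:
            diff = Xnew - means[c]
            inv = np.linalg.inv(covs[c])
            _, logdet = np.linalg.slogdet(covs[c])
            # Log Gaussian density up to an additive constant.
            s = -0.5 * np.einsum('ij,jk,ik->i', diff, inv, diff)
            s += -0.5 * logdet + np.log(priors[c])
            scores.append(s)
        return classes[np.argmax(scores, axis=0)]
    return predict

# Nonlinear (quadratic) Bayes boundary: equal means, different covariances.
n = 500
X0 = rng.multivariate_normal([0, 0], [[1, 0], [0, 1]], n)
X1 = rng.multivariate_normal([0, 0], [[3, 0], [0, 0.3]], n)
X = np.vstack([X0, X1])
y = np.repeat([0, 1], n)

lda = fit_gaussian_classifier(X, y, shared_cov=True)
qda = fit_gaussian_classifier(X, y, shared_cov=False)
print("LDA train accuracy:", np.mean(lda(X) == y))
print("QDA train accuracy:", np.mean(qda(X) == y))
```

With equal class means, a shared-covariance (linear) boundary has almost nothing to work with, while the per-class covariances give QDA a quadratic boundary that separates the classes much better, consistent with the argument above.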
 