1. We have a set of observations on p = 100 features. The observations are uniformly distributed on each feature, and each feature ranges in value from 0 to 1. We wish to predict a test observation’s response using observations within the 10% of each feature’s range that is closest to that test observation. What fraction of the available observations will we use to make the prediction?
2. Now argue based on the above that a drawback of KNN when p is large is that there are very few training observations near any given test observation.
3. If the Bayes decision boundary is linear, do we expect LDA or QDA to perform better on the training set? On the test set?
4. If the Bayes decision boundary is nonlinear, do we expect LDA or QDA to perform better on the training set? On the test set?
The Attempt at a Solution
1. Is it just 0.1^100 = 10^-100?
2. For example, even if we used observations within 99% of each feature's range, we would use only 0.99^100 ≈ 0.366, which is roughly 37% of the available observations. So trimming just 1% off each feature's range already discards about 63% of the data. With the 10% neighborhood from part 1 the fraction is essentially zero, so when p is large almost no training observations are near any given test observation.
3. For the test set, since the Bayes decision boundary is linear, LDA should do better. For the training set, I think QDA, since it is more flexible and so can fit the training data more closely.
4. Since the Bayes decision boundary is nonlinear, QDA should do better on the test set. On the training set, QDA is more flexible than LDA, so it should do better there too.
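To check the arithmetic in answers 1 and 2, here is a small sketch in plain Python. The function names `fraction_used` and `side_needed` are made up for illustration, and the calculation assumes independent uniform features and ignores edge effects (a test observation sitting near 0 or 1 on some feature).

```python
def fraction_used(range_fraction: float, p: int) -> float:
    """Expected fraction of uniformly distributed observations that fall
    inside a box covering `range_fraction` of each of the `p` features."""
    return range_fraction ** p


def side_needed(target_fraction: float, p: int) -> float:
    """Per-feature range the box must cover to capture `target_fraction`
    of the observations when there are `p` features."""
    return target_fraction ** (1.0 / p)


# 10% of each feature's range, for a growing number of features.
for p in (1, 2, 100):
    print(p, fraction_used(0.10, p))

# 99% of each feature's range with p = 100 (the example in answer 2).
print(round(fraction_used(0.99, 100), 3))

# Range per feature needed to keep 10% of the observations when p = 100.
print(round(side_needed(0.10, 100), 3))
```

The last line is the flip side of the same argument: to keep 10% of the observations with p = 100, the box has to span nearly the whole range of every feature, so the "neighbors" are no longer close to the test observation.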
Also, out of curiosity, if we use K = 1 for KNN, is our training error rate 0 or close to 0?
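One way to probe this is a quick experiment. The sketch below is a hand-rolled 1-nearest-neighbor classifier in plain Python (the helper `predict_1nn` and the simulated data are assumptions for illustration, not from the problem). The labels are pure noise, so there is nothing to learn, yet each training point is its own nearest neighbor at distance 0, which drives the training error to 0 as long as no two training points share identical features with different labels.

```python
import random


def predict_1nn(train_x, train_y, query):
    """Return the label of the training point closest to `query`
    (squared Euclidean distance; ties go to the earliest point)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return train_y[best]


random.seed(0)
# Simulated data: 200 points in the unit square with random 0/1 labels.
train_x = [(random.random(), random.random()) for _ in range(200)]
train_y = [random.randint(0, 1) for _ in range(200)]

# Classify every training point using the training set itself.
errors = sum(
    predict_1nn(train_x, train_y, x) != y for x, y in zip(train_x, train_y)
)
training_error = errors / len(train_x)
print(training_error)
```

A training error of 0 here says nothing about the test error, which for noise labels like these would sit around 50%.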