Distances in Knn: Are they metric functions in the Mathematical sense?

In summary, the distances in Knn ; K nearest neighbors, in Machine Learning, are not required to be metrics in the mathematical sense.f
  • #1

WWGD

Science Advisor
Gold Member
6,481
9,261
TL;DR Summary
The term 'distance' is used in Knn. Are these distance functions required to satisfy the properties of a metric?
Hi,
Just curious as to whether distances 'd' , used in Knn ; K nearest neighbors, in Machine Learning, are required to be metrics in the Mathematical Sense, i.e., if they are required to satisfy, in a space A:
##d: A \times A \rightarrow \mathbb R^{+} \cup \{0\} ;
d(a,a)=0 ;
d(a,b)=d(b,a) ; d(a,c) < d(a,b)+d(b,c) ##
?
 
  • #2
Suggestion: Expand on what you are talking about.
 
  • #3
Suggestion: Expand on what you are talking about.
Sure. We're trying to ; a binary classification 0/1 , of test subjects with the following method.
I am simplifying somewhat

Basically an object belongs to the class of objects it is closest to, by some fixed criterion d.

0) Choose a positive integer k
1) We have objects known to belong to both class 0 and class 1
2) We have a collection of objects to test on whether they belong to class 0 or class 1.
3)We have a real -valued non-negative binary function d AxB --> R ; A is in 2), B is in 1)
4) We find the value of d(a,B) , i.e., { d(a,b) : a fixed, for all b in B}
5) a is assigned to the class of the object b for which d(a,b) is the smallest.

I suspect @pbuk, @Office_Shredder or @Dale may know.
 
Last edited:
  • #4
... ##d(a,c) < d(a,b)+d(b,c) ##
I think you mean ## d(a,c) \le d(a,b)+d(b,c) ##.

I am not sure we can say that the distance functions are required to satisfy any conditions, however we are only interested in functions that perform well and it may be possible to show that certain conditions must be satisfied in order for this to be the case.

Many metrics in ML do satisfy the triangle equality, however I don't think this is true for the cosine distance and I have a hunch this is also the case for some of the distances in boolean spaces e.g. Kuslinski distance.

Taking the other contemplated 'requirements':
  • ## d: A \times A \rightarrow \mathbb R^{+} \cup \{0\} ; ## clearly not the case for boolean metrics.
  • ## d(a,a)=0 ; ## I don't see this as necessary: Euclidian distance + 1 would work almost as well as Euclidean distance in most cases.
  • ## d(a,b)=d(b,a)## I'll leave a counterexample for this to you :-p
 
  • #7
I agree with the clever observation that euclidean distance +1 results in the exact same algorithm but does not satisfy the triangle inequality.

Cosine similarity does not satisfy the triangle inequality either, but does have the property that if A and B are similar, then the similarity of A and C is about the same as the similarity of B and C. Otherwise you can't really get clusters of points that are similar to each other and dissimilar from the other clusters.
 
Last edited:
  • Like
Likes pbuk and WWGD

Suggested for: Distances in Knn: Are they metric functions in the Mathematical sense?

Replies
13
Views
1K
Replies
8
Views
718
Replies
2
Views
646
Replies
1
Views
2K
Replies
19
Views
2K
Replies
1
Views
825
Replies
1
Views
735
Back
Top