Distances in Knn: Are they metric functions in the Mathematical sense?

WWGD · Oct 28, 2021

Hi,
Just curious as to whether distances 'd' , used in Knn ; K nearest neighbors, in Machine Learning, are required to be metrics in the Mathematical Sense, i.e., if they are required to satisfy, in a space A:
##d: A \times A \rightarrow \mathbb R^{+} \cup \{0\} ;
d(a,a)=0 ;
d(a,b)=d(b,a) ; d(a,c) < d(a,b)+d(b,c) ##
?

mathman · Oct 28, 2021

Suggestion: Expand on what you are talking about.

WWGD · Oct 28, 2021

mathman said:

Suggestion: Expand on what you are talking about.

Sure. We're trying to ; a binary classification 0/1 , of test subjects with the following method.
I am simplifying somewhat

Basically an object belongs to the class of objects it is closest to, by some fixed criterion d.

0) Choose a positive integer k
1) We have objects known to belong to both class 0 and class 1
2) We have a collection of objects to test on whether they belong to class 0 or class 1.
3)We have a real -valued non-negative binary function d AxB --> R ; A is in 2), B is in 1)
4) We find the value of d(a,B) , i.e., { d(a,b) : a fixed, for all b in B}
5) a is assigned to the class of the object b for which d(a,b) is the smallest.

I suspect @pbuk, @Office_Shredder or @Dale may know.

pbuk · Oct 28, 2021

WWGD said:

... ##d(a,c) < d(a,b)+d(b,c) ##

I think you mean ## d(a,c) \le d(a,b)+d(b,c) ##.

I am not sure we can say that the distance functions are required to satisfy any conditions, however we are only interested in functions that perform well and it may be possible to show that certain conditions must be satisfied in order for this to be the case.

Many metrics in ML do satisfy the triangle equality, however I don't think this is true for the cosine distance and I have a hunch this is also the case for some of the distances in boolean spaces e.g. Kuslinski distance.

Taking the other contemplated 'requirements':

## d: A \times A \rightarrow \mathbb R^{+} \cup \{0\} ; ## clearly not the case for boolean metrics.
## d(a,a)=0 ; ## I don't see this as necessary: Euclidian distance + 1 would work almost as well as Euclidean distance in most cases.
## d(a,b)=d(b,a)## I'll leave a counterexample for this to you

jedishrfu · Oct 28, 2021

KNNs can use a variety of distance functions depending on the needs of the project. Some popular ones are:
- Pythagorean distance
- taxicab distance aka Manhattan distance
- minkowski distance

https://www.kdnuggets.com/2020/11/most-popular-distance-metrics-knn.html

pbuk · Oct 29, 2021

jedishrfu said:

KNNs can use a variety of distance functions depending on the needs of the project. Some popular ones are:
- Pythagorean distance
- taxicab distance aka Manhattan distance
- minkowski distance

https://www.kdnuggets.com/2020/11/most-popular-distance-metrics-knn.html

Those three do satisfy @WWGD's (corrected) criteria for a metric of course.

Office_Shredder · Oct 29, 2021

I agree with the clever observation that euclidean distance +1 results in the exact same algorithm but does not satisfy the triangle inequality.

Cosine similarity does not satisfy the triangle inequality either, but does have the property that if A and B are similar, then the similarity of A and C is about the same as the similarity of B and C. Otherwise you can't really get clusters of points that are similar to each other and dissimilar from the other clusters.

Distances in Knn: Are they metric functions in the Mathematical sense?

Discussion Overview

Discussion Character

Main Points Raised

Areas of Agreement / Disagreement

Contextual Notes

Similar threads

Undergrad Please Explain (actually explain) The Monty Hall Problem

Undergrad A variant of the Monty Hall problem

Graduate Hypothesis testing: Defining H0, HA hypotheses so that ( H_A)_A' makes sense

Undergrad My basic understanding of set theory

Undergrad How do E[X] and E[|X|] relate?

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Insights What Exactly is Dirac’s Delta Function? - Insight