Does Average Linkage satisfy the properties of metric space?

getUsername
Messages
3
Reaction score
0

Homework Statement



A dissimilarity measure d(x, y) for two data points x and y typically satisfy the following three properties:

1. d(x, y) ≥ 0 and d(x, y) = 0 if and only if x = y
2. d(x, y) = d(y, x)
3. d(x,z) ≤ d(x, y) + d(y,z)The following method has been proposed for measuring the dissimilarity[read: distance] between two sets of data points A = {xa1, xa2, . . . , xam} and B = {xb1, xb2, . . . , xbn}Average Linkage(UPGMA):

d(A,B) = {\frac{Ʃ_{x∈A} Ʃ_{y∈B} d(x, y)}{|A||B|}}Show that the linkage method satisfies each of the properties or provide a counter example (a visual/diagram representation of any counter example is sufficient if appropriate).
2. Specific Problem
I am having difficulty proving that the Average Linkage method does/does not satisfy the third property of metric space, the triangle inequality.

The Euclidean distance formula, the Absolute distance formula(Manhattan) or any other distance formula may be used to show this.
 
Physics news on Phys.org
getUsername said:

Homework Statement



A dissimilarity measure d(x, y) for two data points x and y typically satisfy the following three properties:

1. d(x, y) ≥ 0 and d(x, y) = 0 if and only if x = y
2. d(x, y) = d(y, x)
3. d(x,z) ≤ d(x, y) + d(y,z)The following method has been proposed for measuring the dissimilarity[read: distance] between two sets of data points A = {xa1, xa2, . . . , xam} and B = {xb1, xb2, . . . , xbn}Average Linkage(UPGMA):

d(A,B) = {\frac{Ʃ_{x∈A} Ʃ_{y∈B} d(x, y)}{|A||B|}}Show that the linkage method satisfies each of the properties or provide a counter example (a visual/diagram representation of any counter example is sufficient if appropriate).
2. Specific Problem
I am having difficulty proving that the Average Linkage method does/does not satisfy the third property of metric space, the triangle inequality.

The Euclidean distance formula, the Absolute distance formula(Manhattan) or any other distance formula may be used to show this.

I don't think it even satisfies the first property that d(A,A)=0 if A has more than one different data point, does it?
 
Last edited:
Yes, I think you might be correct about property 1. I sort of glazed over it as I had already proven that property for Single Linkage and Complete Linkage. I didn't consider the case of d(A,A)=0 where A has more than one data point.
 
I think property one holds for average linkage.

e.g. A = { (0,1) (0,0) }

average d(A,A) = 0.5

Obviously this is greater than 0 in the case where x = y.

However the property states that d(x,y) can be 0 if and only if x = y.

This is still true because d(x,y) will only equal to 0 when x = y with the added constraint that x will only contain a single data point.

e.g. B = {(0,0)}

average d(B,B) = 0
 
There are two things I don't understand about this problem. First, when finding the nth root of a number, there should in theory be n solutions. However, the formula produces n+1 roots. Here is how. The first root is simply ##\left(r\right)^{\left(\frac{1}{n}\right)}##. Then you multiply this first root by n additional expressions given by the formula, as you go through k=0,1,...n-1. So you end up with n+1 roots, which cannot be correct. Let me illustrate what I mean. For this...
Back
Top