# Does Average Linkage satisfy the properties of metric space?

1. Mar 14, 2013

1. The problem statement, all variables and given/known data

A dissimilarity measure d(x, y) for two data points x and y typically satisfy the following three properties:

1. $d(x, y) ≥ 0$ and $d(x, y) = 0$ if and only if $x = y$
2. $d(x, y) = d(y, x)$
3. $d(x,z) ≤ d(x, y) + d(y,z)$

The following method has been proposed for measuring the dissimilarity[read: distance] between two sets of data points A = {xa1, xa2, . . . , xam} and B = {xb1, xb2, . . . , xbn}

$$d(A,B) = {\frac{Ʃ_{x∈A} Ʃ_{y∈B} d(x, y)}{|A||B|}}$$

Show that the linkage method satisﬁes each of the properties or provide a counter example (a visual/diagram representation of any counter example is suﬃcient if appropriate).

2. Specific Problem
I am having difficulty proving that the Average Linkage method does/does not satisfy the third property of metric space, the triangle inequality.

The Euclidean distance formula, the Absolute distance formula(Manhattan) or any other distance formula may be used to show this.

2. Mar 14, 2013

### Dick

I don't think it even satisfies the first property that d(A,A)=0 if A has more than one different data point, does it?

Last edited: Mar 14, 2013
3. Mar 14, 2013

Yes, I think you might be correct about property 1. I sort of glazed over it as I had already proven that property for Single Linkage and Complete Linkage. I didn't consider the case of d(A,A)=0 where A has more than one data point.

4. Mar 15, 2013

I think property one holds for average linkage.

e.g. A = { (0,1) (0,0) }

average d(A,A) = 0.5

Obviously this is greater than 0 in the case where x = y.

However the property states that d(x,y) can be 0 if and only if x = y.

This is still true because d(x,y) will only equal to 0 when x = y with the added constraint that x will only contain a single data point.

e.g. B = {(0,0)}

average d(B,B) = 0