Does Average Linkage satisfy the properties of metric space?

Click For Summary

Homework Help Overview

The discussion revolves around the properties of a dissimilarity measure, specifically the Average Linkage method (UPGMA), and whether it satisfies the properties of a metric space. Participants are examining the conditions under which the method may or may not fulfill these properties, particularly focusing on the triangle inequality and the identity of indiscernibles.

Discussion Character

  • Exploratory, Assumption checking, Conceptual clarification

Approaches and Questions Raised

  • Participants are attempting to prove or disprove whether Average Linkage satisfies the triangle inequality and the first property of a metric space. There are discussions about using various distance formulas to support their arguments.

Discussion Status

Some participants express uncertainty regarding the validity of the first property, while others believe it holds under certain conditions. The conversation reflects a mix of perspectives on the properties of the Average Linkage method, with no clear consensus reached yet.

Contextual Notes

Participants are considering specific examples and counterexamples to illustrate their points, including cases where the sets contain multiple data points. There is an emphasis on the need for visual or diagrammatic representations to clarify arguments.

getUsername
Messages
3
Reaction score
0

Homework Statement



A dissimilarity measure d(x, y) for two data points x and y typically satisfy the following three properties:

1. [itex]d(x, y) ≥ 0[/itex] and [itex]d(x, y) = 0[/itex] if and only if [itex]x = y[/itex]
2. [itex]d(x, y) = d(y, x)[/itex]
3. [itex]d(x,z) ≤ d(x, y) + d(y,z)[/itex]The following method has been proposed for measuring the dissimilarity[read: distance] between two sets of data points A = {xa1, xa2, . . . , xam} and B = {xb1, xb2, . . . , xbn}Average Linkage(UPGMA):

[tex]d(A,B) = {\frac{Ʃ_{x∈A} Ʃ_{y∈B} d(x, y)}{|A||B|}}[/tex]Show that the linkage method satisfies each of the properties or provide a counter example (a visual/diagram representation of any counter example is sufficient if appropriate).
2. Specific Problem
I am having difficulty proving that the Average Linkage method does/does not satisfy the third property of metric space, the triangle inequality.

The Euclidean distance formula, the Absolute distance formula(Manhattan) or any other distance formula may be used to show this.
 
Physics news on Phys.org
getUsername said:

Homework Statement



A dissimilarity measure d(x, y) for two data points x and y typically satisfy the following three properties:

1. [itex]d(x, y) ≥ 0[/itex] and [itex]d(x, y) = 0[/itex] if and only if [itex]x = y[/itex]
2. [itex]d(x, y) = d(y, x)[/itex]
3. [itex]d(x,z) ≤ d(x, y) + d(y,z)[/itex]The following method has been proposed for measuring the dissimilarity[read: distance] between two sets of data points A = {xa1, xa2, . . . , xam} and B = {xb1, xb2, . . . , xbn}Average Linkage(UPGMA):

[tex]d(A,B) = {\frac{Ʃ_{x∈A} Ʃ_{y∈B} d(x, y)}{|A||B|}}[/tex]Show that the linkage method satisfies each of the properties or provide a counter example (a visual/diagram representation of any counter example is sufficient if appropriate).
2. Specific Problem
I am having difficulty proving that the Average Linkage method does/does not satisfy the third property of metric space, the triangle inequality.

The Euclidean distance formula, the Absolute distance formula(Manhattan) or any other distance formula may be used to show this.

I don't think it even satisfies the first property that d(A,A)=0 if A has more than one different data point, does it?
 
Last edited:
Yes, I think you might be correct about property 1. I sort of glazed over it as I had already proven that property for Single Linkage and Complete Linkage. I didn't consider the case of d(A,A)=0 where A has more than one data point.
 
I think property one holds for average linkage.

e.g. A = { (0,1) (0,0) }

average d(A,A) = 0.5

Obviously this is greater than 0 in the case where x = y.

However the property states that d(x,y) can be 0 if and only if x = y.

This is still true because d(x,y) will only equal to 0 when x = y with the added constraint that x will only contain a single data point.

e.g. B = {(0,0)}

average d(B,B) = 0
 

Similar threads

  • · Replies 19 ·
Replies
19
Views
3K
  • · Replies 15 ·
Replies
15
Views
3K
  • · Replies 17 ·
Replies
17
Views
2K
  • · Replies 10 ·
Replies
10
Views
2K
  • · Replies 3 ·
Replies
3
Views
2K
  • · Replies 1 ·
Replies
1
Views
2K
  • · Replies 6 ·
Replies
6
Views
2K
  • · Replies 0 ·
Replies
0
Views
2K
  • · Replies 22 ·
Replies
22
Views
3K
Replies
2
Views
2K