
Hierarchical Clustering: Ward linkage

  1. Oct 28, 2014 #1
    Hi there!

    The Ward linkage method in agglomerative hierarchical clustering computes the distance between two clusters from the within-group variance, which amounts to a weighted squared distance between the cluster centers. Ward linkage therefore doesn't rely on the distances between single elements, so it should be independent of the metric (Euclidean, Manhattan, squared Euclidean...) used to compute the distances among the elements: in the end, the linkage criterion is based on the variance of the clusters, which has a definite formula regardless of the chosen metric. Nonetheless, if I try the hclust function in R, I obtain different results depending on the distance metric used among the elements. Why?
    Thank you
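For Euclidean data, the "weighted squared distance between cluster centers" is exactly the increase in the within-cluster sum of squares (SSE) when two clusters merge. The Python sketch below checks that identity numerically. The helper names `centroid`, `sse` and `ward_cost` and the points are made up for illustration. Both sides of the identity are built from squared Euclidean distances, so Euclidean geometry is already part of the criterion itself.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]

def sse(points):
    """Within-cluster sum of squared Euclidean distances to the centroid."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)

def ward_cost(a, b):
    """Weighted squared Euclidean distance between the two centroids."""
    na, nb = len(a), len(b)
    sq = sum((x - y) ** 2 for x, y in zip(centroid(a), centroid(b)))
    return na * nb / (na + nb) * sq

A = [(0.0, 0.0), (1.0, 0.0)]
B = [(4.0, 3.0), (5.0, 3.0), (4.0, 4.0)]
delta = sse(A + B) - sse(A) - sse(B)  # increase in SSE caused by merging A and B
print(ward_cost(A, B), delta)
```

For these points both printed values come out at about 30.967.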
  3. Oct 29, 2014 #2

    Stephen Tashi

    Science Advisor

    Why do you think the variance of the clusters is independent of the metric? What formula are you talking about?

    The variance of a random variable representing a distance won't remain the same number if you change the units of measure. A quantity such as the "z-score" of a random variable representing a distance would remain numerically the same if the units of distance are changed, but it wouldn't necessarily remain the same if you switch from using euclidean distance to Manhattan distance.
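The metric can enter even though the criterion is stated in terms of variance. `hclust` receives only a dissimilarity matrix, and its documentation says cluster distances are updated with the Lance-Williams formula. It does not recompute variances from the raw coordinates. The sketch below is a naive Python version of that recurrence for Ward's method, not R's actual code. The function names and the three points are made up. It squares the input dissimilarities before updating, similar in spirit to R's `method = "ward.D2"`. Feeding it Euclidean versus Manhattan distances changes which pair merges first.

```python
import math

def euclidean(p, q):
    return math.dist(p, q)

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def ward_merges(points, metric):
    """Naive agglomerative clustering with the Lance-Williams Ward update.

    The metric is evaluated once on the raw points; after that the update only
    sees those numbers (squared first, as "ward.D2" does).
    Returns a list of (merged member indices, merge height).
    """
    n = len(points)
    members = {i: [i] for i in range(n)}
    d = {}
    for i in range(n):
        for j in range(i + 1, n):
            d[(i, j)] = metric(points[i], points[j]) ** 2

    def get(a, b):
        return d[(a, b)] if a < b else d[(b, a)]

    merges = []
    next_id = n
    while len(members) > 1:
        active = sorted(members)
        pairs = [(a, b) for k, a in enumerate(active) for b in active[k + 1:]]
        i, j = min(pairs, key=lambda ab: get(*ab))  # closest pair of clusters
        d_ij = get(i, j)
        n_i, n_j = len(members[i]), len(members[j])
        for k in active:
            if k in (i, j):
                continue
            n_k = len(members[k])
            # Lance-Williams update for Ward: distance from k to the new cluster
            d[(k, next_id)] = ((n_i + n_k) * get(k, i) + (n_j + n_k) * get(k, j)
                               - n_k * d_ij) / (n_i + n_j + n_k)
        members[next_id] = sorted(members.pop(i) + members.pop(j))
        merges.append((members[next_id], math.sqrt(d_ij)))
        next_id += 1
    return merges

pts = [(0.0, 0.0), (2.0, 2.0), (-3.0, 0.0)]
print(ward_merges(pts, euclidean))
print(ward_merges(pts, manhattan))
```

With Euclidean input, the first merge joins points 0 and 1 at height √8 ≈ 2.83. With Manhattan input, it joins points 0 and 2 at height 3. Only the Euclidean run has the variance interpretation. The Manhattan run applies the same arithmetic to numbers that are not squared Euclidean distances.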

    I notice the Wikipedia article http://en.wikipedia.org/wiki/Ward's_method has a caution about using the correct arguments in the R programming language.
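On the R side, I believe the relevant distinction is between `hclust(..., method = "ward.D")` and `method = "ward.D2"`. Older R versions (up to 3.0.3) offered only `method = "ward"`, which corresponds to "ward.D". According to the R documentation, only "ward.D2" implements Ward's criterion, and it squares the dissimilarities before updating. "ward.D" applies the update to whatever it is given, so it matches Ward's criterion only if the input is already squared Euclidean.

A single Lance-Williams step in Python shows the difference. The points and names are illustrative assumptions. Applied to squared Euclidean distances, the update reproduces the centroid-based Ward quantity 2·n_I·n_K/(n_I + n_K)·||c_I − c_K||². Applied to plain Euclidean distances, it does not.

```python
import math

def lance_williams_ward(d_ki, d_kj, d_ij, n_i, n_j, n_k):
    """Ward's Lance-Williams update: dissimilarity between cluster k and the merge of i and j."""
    return ((n_i + n_k) * d_ki + (n_j + n_k) * d_kj - n_k * d_ij) / (n_i + n_j + n_k)

# Made-up singleton points i, j, k
x_i, x_j, x_k = (0.0, 0.0), (2.0, 2.0), (-3.0, 0.0)
dist = math.dist  # Euclidean distance

# Centroid-based Ward quantity for I = {i, j}, K = {k}: 2 n_I n_K / (n_I + n_K) * ||c_I - c_K||^2
n_I, n_K = 2, 1
c_I = ((x_i[0] + x_j[0]) / 2, (x_i[1] + x_j[1]) / 2)
target = 2 * n_I * n_K / (n_I + n_K) * dist(c_I, x_k) ** 2

# "ward.D2"-style: square the Euclidean distances, then update
on_squared = lance_williams_ward(dist(x_k, x_i) ** 2, dist(x_k, x_j) ** 2,
                                 dist(x_i, x_j) ** 2, 1, 1, 1)
# "ward.D" fed plain Euclidean distances: update the unsquared values
on_plain = lance_williams_ward(dist(x_k, x_i), dist(x_k, x_j), dist(x_i, x_j), 1, 1, 1)

print(target, on_squared)           # both 68/3, about 22.667
print(math.sqrt(target), on_plain)  # about 4.761 vs about 4.647
```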