Hierarchical Clustering: Ward linkage

  • Context: Graduate 
  • Thread starter: eoghan
  • Tags: Linkage
SUMMARY

The discussion focuses on the Ward linkage method in agglomerative hierarchical clustering, specifically its computation of the distance between clusters from the within-group variance. The original poster argues that Ward linkage should therefore be independent of the metric used to measure distances between elements (e.g., Euclidean, Manhattan), yet reports that the hclust function in R gives different results for different metrics. The reply questions that premise, noting that the variance of the clusters itself depends on the units and on the metric, and points to a caution in the Wikipedia article on Ward's method about using the correct arguments in R.

PREREQUISITES
  • Understanding of agglomerative hierarchical clustering
  • Familiarity with the Ward linkage method
  • Knowledge of distance metrics (Euclidean, Manhattan)
  • Experience with the hclust function in R
NEXT STEPS
  • Research the mathematical formulation of Ward's linkage method
  • Learn about the implications of different distance metrics in clustering
  • Explore the hclust function in R and its parameters
  • Review the Wikipedia article on Ward's method for best practices
USEFUL FOR

Data scientists, statisticians, and machine learning practitioners interested in hierarchical clustering techniques and their implementation in R.

eoghan
Hi there!

The Ward linkage method in agglomerative hierarchical clustering computes the distance between two clusters from the within-group variance, which works out to a weighted squared distance between the cluster centers. Since the Ward linkage method doesn't rely on the distances between single elements, I would expect it to be independent of the metric (Euclidean, Manhattan, squared Euclidean...) used to compute the distances among the elements: in the end the linkage criterion is based on the variance of the clusters, which has a definite formula independently of the chosen metric. Nonetheless, when I try the hclust function in R, I obtain different results depending on the distance metric used for the elements. Why?
Thank you
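
For reference, the "weighted squared distance between cluster centers" mentioned in the question is usually written as below. This is a sketch in notation that is not in the original post: A and B are clusters with n_A and n_B elements, c_A and c_B are their centroids, and SSE denotes the within-cluster sum of squared deviations from the centroid.

```latex
\Delta(A,B)
  = \mathrm{SSE}(A \cup B) - \mathrm{SSE}(A) - \mathrm{SSE}(B)
  = \frac{n_A\, n_B}{n_A + n_B}\,\lVert c_A - c_B \rVert_2^{2},
\qquad
\mathrm{SSE}(C) = \sum_{x \in C} \lVert x - c_C \rVert_2^{2}
```

Note that the Euclidean norm appears on both sides, so the formula itself already assumes Euclidean geometry.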
 
eoghan said:
the variance of the clusters which has a definite formula independently from the chosen metric.

Why do you think the variance of the clusters is independent of the metric? What formula are you talking about?

The variance of a random variable representing a distance won't remain the same number if you change the units of measure. A quantity such as the "z-score" of that random variable would remain numerically the same under a change of units, but it wouldn't necessarily remain the same if you switch from Euclidean distance to Manhattan distance.
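
To make the point about units concrete, here is a minimal Python sketch (not from the thread; the helper names `centroid`, `sse`, `ward_cost`, and `scale`, and the sample points, are made up for illustration). It checks numerically that the increase in within-cluster sum of squares caused by a merge equals the size-weighted squared Euclidean distance between the centroids, and that the number changes by a factor of 100² when the coordinates are rescaled by 100.

```python
import math


def centroid(cluster):
    """Coordinate-wise mean of a list of points (tuples of floats)."""
    n = len(cluster)
    return tuple(sum(p[k] for p in cluster) / n for k in range(len(cluster[0])))


def sq_euclidean(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def sse(cluster):
    """Within-cluster sum of squared Euclidean deviations from the centroid."""
    c = centroid(cluster)
    return sum(sq_euclidean(p, c) for p in cluster)


def ward_cost(a, b):
    """Size-weighted squared Euclidean distance between the two centroids."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * sq_euclidean(centroid(a), centroid(b))


def scale(cluster, factor):
    """Rescale every coordinate, e.g. metres to centimetres."""
    return [tuple(factor * x for x in p) for p in cluster]


# Hypothetical sample clusters.
a = [(0.0, 0.0), (2.0, 0.0)]
b = [(5.0, 4.0)]

increase = sse(a + b) - sse(a) - sse(b)  # growth in within-cluster scatter
cost = ward_cost(a, b)                   # closed-form merge cost
cost_rescaled = ward_cost(scale(a, 100.0), scale(b, 100.0))

print(increase, cost, cost_rescaled)
```

Both sides of the identity are built from squared Euclidean distances, so the "definite formula" for the variance is not metric-free: it is a statement about Euclidean geometry.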

I notice the Wikipedia article http://en.wikipedia.org/wiki/Ward's_method has a caution about using the correct arguments in the R programming language.
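
As far as I understand the R documentation, the caution is that hclust works on whatever dissimilarity matrix it is given: method "ward.D2" squares the dissimilarities before updating, while "ward.D" does not, so "ward.D" only matches Ward's criterion when it is fed squared Euclidean distances. The Python sketch below is not R's implementation; it is a small illustration, under those assumptions, of a Ward-style agglomeration driven by the Lance-Williams update on squared input dissimilarities. The function name `ward_merges` and the three sample points are made up. It shows that the merge order changes when the input dissimilarities are Manhattan rather than Euclidean.

```python
import math
from itertools import combinations


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def ward_merges(points, dist):
    """Agglomerate with the Lance-Williams update for Ward's method.

    The input dissimilarities are squared first. Returns the clusters
    formed at each step as sorted tuples of point indices.
    """
    members = {i: (i,) for i in range(len(points))}  # cluster id -> indices
    d = {(i, j): dist(points[i], points[j]) ** 2
         for i, j in combinations(range(len(points)), 2)}
    merges = []
    next_id = len(points)
    while len(members) > 1:
        i, j = min(d, key=d.get)  # closest pair of active clusters
        ni, nj = len(members[i]), len(members[j])
        updated = {}
        for k in members:
            if k in (i, j):
                continue
            nk = len(members[k])
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            # Lance-Williams coefficients for Ward's method.
            updated[(k, next_id)] = (
                (ni + nk) * dik + (nj + nk) * djk - nk * d[(i, j)]
            ) / (ni + nj + nk)
        merged = tuple(sorted(members[i] + members[j]))
        merges.append(merged)
        # Drop every entry involving the two merged clusters.
        d = {key: val for key, val in d.items()
             if i not in key and j not in key}
        d.update(updated)
        del members[i], members[j]
        members[next_id] = merged
        next_id += 1
    return merges


# Hypothetical points chosen so the closest pair depends on the metric.
points = [(0.0, 0.0), (3.0, 3.0), (0.0, -5.0)]
merges_euclidean = ward_merges(points, euclidean)
merges_manhattan = ward_merges(points, manhattan)
print(merges_euclidean)
print(merges_manhattan)
```

With Euclidean input, points 0 and 1 are merged first; with Manhattan input, points 0 and 2 are merged first. Only the Euclidean run corresponds to minimizing within-cluster variance; with any other input the update rule still runs, but the quantity it tracks is no longer the variance-based criterion.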
 
