Could someone explain to me how this clustering algorithm works?

Jamin2112 · Dec 15, 2012

So MathWorks.com shows this as an example:

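d = pdist(meas);               % pairwise Euclidean distances between the rows (observations) of meas
Z = linkage(d);                % agglomerative hierarchical cluster tree (single linkage by default)
c = cluster(Z,'maxclust',3:5); % one column of cluster labels for each maximum: 3, 4 and 5 clusters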

http://www.mathworks.com/help/stats/cluster.html

I'm confused about why the routine gives any useful information. First it computes the Euclidean distances between the values in some array meas. Then it performs hierarchical clustering on those distances. How is that useful? Suppose I had the vector (0, 1, 50, 99, 100). The pairwise distances are |0-1| = 1, |0-50| = 50, |0-99| = 99, |0-100| = 100, |1-50| = 49, |1-99| = 98, |1-100| = 99, |50-99| = 49, |50-100| = 50, |99-100| = 1. So I'm then clustering the values 1, 50, 99, 100, 49, 98, 99, 49, 50, 1. If I tell it to form at most 3 clusters, the clusters will probably be (1, 1), (49, 49, 50, 50), and (98, 99, 99, 100). The first cluster corresponds to the distance between 0 and 1 and the distance between 99 and 100. So that means I'm clustering the values 0, 1, 99, and 100 together.

Or am I totally not understanding this?

chiro · Dec 22, 2012

Hey Jamin2112.

The name of the game is classification.

If a hierarchical classification fits your data well, that can help you identify a model in which the data points are best represented hierarchically.

Basically, each kind of classification has its advantages and disadvantages, depending on how it actually classifies the data.

Each algorithm will do a good job on the kind of data it "expects" and a bad job if the data isn't as "expected".

Also, hierarchical classification is a natural way to classify general data, and it's a lot easier to understand than some other techniques, which is why it's often used.

If a technique classifies things using an underlying idea that is too complex, it may be useless for all practical purposes.

In many applications, classification techniques don't have to be exact; they just have to be "good enough" (although some applications require better than "good enough").
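To make the example vector from the question concrete, here is a minimal sketch in plain Python (not MATLAB, so it runs without any toolbox). It assumes single linkage, which is the default method for MATLAB's linkage, and the helper names pairwise_distances and single_linkage are hypothetical ones made up for this illustration. It is a naive version of the idea and not how MATLAB implements it. The point it tries to show is that the distances are not the items being grouped: each distance stays attached to a pair of observations and is only used to decide which observations to merge next.

```python
from itertools import combinations


def pairwise_distances(points):
    """Map each index pair (i, j), i < j, to the distance between points i and j."""
    return {(i, j): abs(points[i] - points[j])
            for i, j in combinations(range(len(points)), 2)}


def single_linkage(points, max_clusters):
    """Naive agglomerative clustering: repeatedly merge the two closest clusters
    until at most max_clusters remain. The distance between two clusters is the
    smallest distance between a member of one and a member of the other."""
    dist = pairwise_distances(points)
    clusters = [[i] for i in range(len(points))]  # start with one cluster per point
    while len(clusters) > max_clusters:
        # Find the pair of clusters (a < b) with the smallest single-linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: min(dist[tuple(sorted((i, j)))]
                               for i in clusters[ab[0]]
                               for j in clusters[ab[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]  # merge cluster b into cluster a
        del clusters[b]
    # Report the original values in each cluster, not the distances.
    return [sorted(points[i] for i in c) for c in clusters]


points = [0, 1, 50, 99, 100]
result = single_linkage(points, max_clusters=3)
print(result)  # [[0, 1], [50], [99, 100]]
```

With at most 3 clusters, the observations come out as (0, 1), (50), and (99, 100). The two distances equal to 1 cause two separate merges, 0 with 1 and 99 with 100, so 0, 1, 99, and 100 do not end up in one cluster.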
