Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Could someone explain to me how this clustering algorithm works?

  1. Dec 15, 2012 #1
    So MathWorks.com shows this as an example:

    d = pdist(meas);
    Z = linkage(d);
    c = cluster(Z,'maxclust',3:5);


    http://www.mathworks.com/help/stats/cluster.html.

    I'm confused about why the routine gives any useful information. First it returns the Euclidean distances between values in some array meas. Then it performs hierarchal clustering on those distances. How is that useful? If I had a vector (0, 1, 50, 99, 100), then the distances are |0-1|=1, |0-50|=50, |0-99|=99, |0-100|=100, |1-50|=49, |1-99|=98, |1-100| = 99, |50-99|=49, |50-100|=50, |99-100| = 1. So I'm then clustering the values 1, 50, 99, 100, 49, 98, 99, 49, 50, 1. If I tell it to form a max of 3 clusters, the clusters will probably be (1,1), (49, 49, 50), and (98, 99, 99, 100). The first cluster is corresponding to the distances between 0 and 1, and between 99, 100. So that means I'm clustering the values 0, 1, 99, and 100 together.

    Or am I totally not understanding this?
     
  2. jcsd
  3. Dec 22, 2012 #2

    chiro

    User Avatar
    Science Advisor

    Hey Jamin2112.

    The name of the game is classification.

    If you get a good fit for hierarchical classifications then this can help you in identifying a possible model where some data points are best represented in a hierarchical manner.

    Basically each kind of classification has its advantages and dis-advantages depending on how it actually classifies the data.

    Each algorithm will classify the data that it "expects" to have well and do a bad job if the data isn't as "expected".

    Also hierarchical classification is a natural way to classify general data currently and its a lot easier to understand than some other techniques which is why its often done.

    If you get a technique that classifies something with an underlying idea that is too complex, then it may become useless to use for all practical purposes.

    Classification techniques don't have to be exact but rather, they just have to be "good enough" in many applications (although some require better than "good enough").
     
Know someone interested in this topic? Share this thread via Reddit, Google+, Twitter, or Facebook