Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Algorithm for cluster finding (?)

  1. Jan 21, 2010 #1

    Borek

    User Avatar

    Staff: Mentor

    OK, I admit I have no idea what is the correct terminology, so could be thread title is completely off.

    Imagine I have a graph of links between www pages. I suppose some of these pages are clustered - that is, they are heavily interlinked, but they don't link so extensively to other clusters/pages. How do I find them? This is not different from finding groups of friends in some population by analysing phone calls and so on.

    I am more than sure that I have seen pictures showing results of such analysis (perhaps in Scientific American), but I can't remember any details, I am even not sure about correct terminology. Fact that I have probably read about the problem in Polish doesn't help either.

    Names, links, keywords to look for will be appreciated.
     
  2. jcsd
  3. Jan 21, 2010 #2
    Connected Components may be the theory you're looking for. Another option is social network theory.

    I had a similar problem, but my transitions could be read off a single file. I looked at transitions between clusters by building a table of first order markov probabilities, then displayed the graphs using networkx, which is a python toolkit for graph visualization, and it includes some connected components stuff. (I'll give you the code if you want.)

    What about running a clustering algorithm on the links and looking at the frequency and rank fall off?
     
    Last edited: Jan 21, 2010
  4. Jan 23, 2010 #3
    Take each page as a vertex in a giant graph where edges indicate links between sites. Then take the laplacian matrix. The number of connected components is the multiplicity of the eigenvalue 0. http://en.wikipedia.org/wiki/Laplacian_matrix
     
  5. Jan 23, 2010 #4

    Borek

    User Avatar

    Staff: Mentor

    Social network page gave me some ideas to think about. Trick is, I am not sure what I want to do :wink: I have just some vague ideas that I want to test.

    I can prepare such file. In fact I am going to try to do it and to feed it to some of the free programs listed on the wiki social network analysis software page.

    I am not sure how it is going to help - just because pages are connected doesn't mean they are in the cluster, just because they are not connected doesn't mean they are not in the cluster.
     
  6. Jan 23, 2010 #5
    Hmm... perhaps I misunderstand your question? John knows Alice. Alice Knows Alex. John Doesn't know Alice. The number of connected components is 1. Or do you mean John needs to really know Alice, so could you set thresholds or something? You could also look up clustering analysis or Partition decoupling method, if that's what you're thinking...
     
Share this great discussion with others via Reddit, Google+, Twitter, or Facebook