Algorithm for cluster finding (?)

Borek · Jan 21, 2010

OK, I admit I have no idea what the correct terminology is, so the thread title could be completely off.

Imagine I have a graph of links between web pages. I suppose some of these pages are clustered: they are heavily interlinked, but they don't link so extensively to other clusters or pages. How do I find them? This is no different from finding groups of friends in a population by analysing phone calls and so on.

I am more than sure that I have seen pictures showing the results of such an analysis (perhaps in Scientific American), but I can't remember any details, and I am not even sure about the correct terminology. The fact that I probably read about the problem in Polish doesn't help either.

Names, links, keywords to look for will be appreciated.

story645 · Jan 21, 2010

Borek said:

How do I find them?

Connected Components may be the theory you're looking for. Another option is social network theory.
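For reference, a minimal sketch of what finding connected components means, assuming the links are treated as undirected pairs of page names. The `links` data and the function name are made up for illustration; this is a plain breadth-first search, not any particular library's implementation.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Return the connected components of an undirected graph as a list of sets."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    components = []
    for start in neighbours:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    component.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components

# Hypothetical link data: two groups of pages with no links between them.
links = [("a", "b"), ("b", "c"), ("c", "a"), ("x", "y")]
groups = connected_components(links)
```

Here `groups` contains two sets, one for pages a, b, c and one for pages x, y. Note that a single link between the two groups would merge them into one component.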

I had a similar problem, but my transitions could be read off a single file. I looked at transitions between clusters by building a table of first-order Markov probabilities, then displayed the graphs using networkx, a Python toolkit for working with and visualizing graphs that also includes some connected components functionality. (I'll give you the code if you want.)
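A rough sketch of what a first-order Markov transition table could look like, assuming the input is a single ordered sequence of visited states. This is not the code offered in the post; the `visits` data and the function name are hypothetical.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequence):
    """Estimate P(next | current) from consecutive pairs in a sequence."""
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1

    table = {}
    for state, followers in counts.items():
        total = sum(followers.values())
        # Normalise each row so its probabilities sum to 1.
        table[state] = {nxt: n / total for nxt, n in followers.items()}
    return table

# Hypothetical sequence of visited pages (or clusters).
visits = ["A", "B", "A", "B", "C", "A"]
table = transition_probabilities(visits)
```

Each row of `table` maps a state to the estimated probability of every state observed directly after it; the nonzero entries can then be used as weighted edges of a graph.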

I suppose some of these pages are clustered - that is, they are heavily interlinked, but they don't link so extensively to other clusters/pages.

What about running a clustering algorithm on the links and looking at the frequency and rank fall off?

oab729 · Jan 23, 2010

Take each page as a vertex in a giant graph where edges indicate links between sites. Then take the Laplacian matrix. The number of connected components is the multiplicity of the eigenvalue 0. http://en.wikipedia.org/wiki/Laplacian_matrix
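A small sketch of that statement, assuming an undirected graph on vertices numbered 0..n-1. Because the Laplacian L = D - A is symmetric, the multiplicity of the eigenvalue 0 equals the dimension of its null space, n - rank(L), so this version computes the rank exactly with standard-library fractions instead of calling an eigenvalue solver. For real data a numerical library would be the more usual choice; the function names here are made up for illustration.

```python
from fractions import Fraction

def laplacian(n, edges):
    """Build L = D - A for an undirected graph on vertices 0..n-1."""
    L = [[0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] -= 1
        L[j][i] -= 1
        L[i][i] += 1
        L[j][j] += 1
    return L

def rank(matrix):
    """Rank of a matrix via exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            if factor:
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def count_components(n, edges):
    """Multiplicity of eigenvalue 0 of the Laplacian = n - rank(L)."""
    return n - rank(laplacian(n, edges))

# Hypothetical graph: a triangle (0, 1, 2) plus a separate pair (3, 4).
edges = [(0, 1), (1, 2), (0, 2), (3, 4)]
n_components = count_components(5, edges)
```

Adding an isolated sixth vertex (`count_components(6, edges)`) raises the count by one, since an isolated vertex contributes an all-zero row and column.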

Borek · Jan 23, 2010

story645 said:

Connected Components may be the theory you're looking for. Another option is social network theory.

The social network page gave me some ideas to think about. The trick is, I am not sure what I want to do. I just have some vague ideas that I want to test.

I had a similar problem, but my transitions could be read off a single file.

I can prepare such a file. In fact I am going to try to do it and feed it to some of the free programs listed on the wiki social network analysis software page.

oab729 said:

Take each page as a vertex in a giant graph where edges indicate links between sites. Then take the Laplacian matrix. The number of connected components is the multiplicity of the eigenvalue 0.

I am not sure how it is going to help. Just because pages are connected doesn't mean they are in the same cluster, and just because they are not connected doesn't mean they are not in the same cluster.

oab729 · Jan 23, 2010

Hmm... perhaps I misunderstand your question? John knows Alice. Alice knows Alex. John doesn't know Alex. The number of connected components is 1. Or do you mean John needs to really know Alice, so could you set thresholds or something? You could also look up cluster analysis or the partition decoupling method, if that's what you're thinking...
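One way the threshold idea could be sketched: give each link a weight (for example, how many times two pages link to each other), ignore links below a chosen threshold, and take connected components of what remains. The weights, the threshold values, and the function name are all assumptions made for illustration, and this is only one simple approach, not a full clustering method.

```python
def clusters_above_threshold(weighted_links, threshold):
    """Group pages joined by links whose weight is at least `threshold`."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, weight in weighted_links:
        # Register both pages so weakly linked ones still appear as singletons.
        ra, rb = find(a), find(b)
        if weight >= threshold and ra != rb:
            parent[ra] = rb

    groups = {}
    for page in parent:
        groups.setdefault(find(page), set()).add(page)
    return list(groups.values())

# Hypothetical data: two tightly linked groups joined by one weak link (c, x).
links = [
    ("a", "b", 5), ("b", "c", 4), ("a", "c", 6),
    ("c", "x", 1),
    ("x", "y", 7), ("y", "z", 5), ("x", "z", 4),
]
loose = clusters_above_threshold(links, 1)
tight = clusters_above_threshold(links, 3)
```

With a threshold of 1 every page ends up in a single component, which is the objection raised above; with a threshold of 3 the weak link is dropped and the two groups separate. The result depends heavily on the threshold, which has to be chosen by hand.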
