Algorithm for cluster finding (?)

  • Thread starter Thread starter Borek
  • Start date Start date
  • Tags Tags
    Algorithm
AI Thread Summary
The discussion revolves around identifying clusters of interlinked web pages, akin to analyzing social networks. Key concepts include the use of Connected Components theory and social network theory to find these clusters. Participants suggest employing clustering algorithms and analyzing the frequency of links, with one recommending the use of a Laplacian matrix to determine the number of connected components. Tools like NetworkX, a Python library for graph visualization, are mentioned for implementing these analyses. There is also a consideration of thresholds for defining connections between pages, indicating that mere connectivity does not necessarily imply clustering. Additional resources for social network analysis software are noted as helpful for further exploration.
Borek
Mentor
Messages
29,126
Reaction score
4,547
OK, I admit I have no idea what is the correct terminology, so could be thread title is completely off.

Imagine I have a graph of links between www pages. I suppose some of these pages are clustered - that is, they are heavily interlinked, but they don't link so extensively to other clusters/pages. How do I find them? This is not different from finding groups of friends in some population by analysing phone calls and so on.

I am more than sure that I have seen pictures showing results of such analysis (perhaps in Scientific American), but I can't remember any details, I am even not sure about correct terminology. Fact that I have probably read about the problem in Polish doesn't help either.

Names, links, keywords to look for will be appreciated.
 
Technology news on Phys.org
Borek said:
How do I find them?.
Connected Components may be the theory you're looking for. Another option is social network theory.

I had a similar problem, but my transitions could be read off a single file. I looked at transitions between clusters by building a table of first order markov probabilities, then displayed the graphs using networkx, which is a python toolkit for graph visualization, and it includes some connected components stuff. (I'll give you the code if you want.)

I suppose some of these pages are clustered - that is, they are heavily interlinked, but they don't link so extensively to other clusters/pages.
What about running a clustering algorithm on the links and looking at the frequency and rank fall off?
 
Last edited:
Take each page as a vertex in a giant graph where edges indicate links between sites. Then take the laplacian matrix. The number of connected components is the multiplicity of the eigenvalue 0. http://en.wikipedia.org/wiki/Laplacian_matrix
 
story645 said:
Connected Components may be the theory you're looking for. Another option is social network theory.

Social network page gave me some ideas to think about. Trick is, I am not sure what I want to do :wink: I have just some vague ideas that I want to test.

I had a similar problem, but my transitions could be read off a single file.

I can prepare such file. In fact I am going to try to do it and to feed it to some of the free programs listed on the wiki social network analysis software page.

oab729 said:
Take each page as a vertex in a giant graph where edges indicate links between sites. Then take the laplacian matrix. The number of connected components is the multiplicity of the eigenvalue 0.

I am not sure how it is going to help - just because pages are connected doesn't mean they are in the cluster, just because they are not connected doesn't mean they are not in the cluster.
 
Hmm... perhaps I misunderstand your question? John knows Alice. Alice Knows Alex. John Doesn't know Alice. The number of connected components is 1. Or do you mean John needs to really know Alice, so could you set thresholds or something? You could also look up clustering analysis or Partition decoupling method, if that's what you're thinking...
 
Dear Peeps I have posted a few questions about programing on this sectio of the PF forum. I want to ask you veterans how you folks learn program in assembly and about computer architecture for the x86 family. In addition to finish learning C, I am also reading the book From bits to Gates to C and Beyond. In the book, it uses the mini LC3 assembly language. I also have books on assembly programming and computer architecture. The few famous ones i have are Computer Organization and...
I had a Microsoft Technical interview this past Friday, the question I was asked was this : How do you find the middle value for a dataset that is too big to fit in RAM? I was not able to figure this out during the interview, but I have been look in this all weekend and I read something online that said it can be done at O(N) using something called the counting sort histogram algorithm ( I did not learn that in my advanced data structures and algorithms class). I have watched some youtube...
Back
Top