Algorithm for cluster finding (?)

  • Thread starter Thread starter Borek
  • Start date Start date
  • Tags Tags
    Algorithm
AI Thread Summary
The discussion revolves around identifying clusters of interlinked web pages, akin to analyzing social networks. Key concepts include the use of Connected Components theory and social network theory to find these clusters. Participants suggest employing clustering algorithms and analyzing the frequency of links, with one recommending the use of a Laplacian matrix to determine the number of connected components. Tools like NetworkX, a Python library for graph visualization, are mentioned for implementing these analyses. There is also a consideration of thresholds for defining connections between pages, indicating that mere connectivity does not necessarily imply clustering. Additional resources for social network analysis software are noted as helpful for further exploration.
Borek
Mentor
Messages
29,132
Reaction score
4,556
OK, I admit I have no idea what is the correct terminology, so could be thread title is completely off.

Imagine I have a graph of links between www pages. I suppose some of these pages are clustered - that is, they are heavily interlinked, but they don't link so extensively to other clusters/pages. How do I find them? This is not different from finding groups of friends in some population by analysing phone calls and so on.

I am more than sure that I have seen pictures showing results of such analysis (perhaps in Scientific American), but I can't remember any details, I am even not sure about correct terminology. Fact that I have probably read about the problem in Polish doesn't help either.

Names, links, keywords to look for will be appreciated.
 
Technology news on Phys.org
Borek said:
How do I find them?.
Connected Components may be the theory you're looking for. Another option is social network theory.

I had a similar problem, but my transitions could be read off a single file. I looked at transitions between clusters by building a table of first order markov probabilities, then displayed the graphs using networkx, which is a python toolkit for graph visualization, and it includes some connected components stuff. (I'll give you the code if you want.)

I suppose some of these pages are clustered - that is, they are heavily interlinked, but they don't link so extensively to other clusters/pages.
What about running a clustering algorithm on the links and looking at the frequency and rank fall off?
 
Last edited:
Take each page as a vertex in a giant graph where edges indicate links between sites. Then take the laplacian matrix. The number of connected components is the multiplicity of the eigenvalue 0. http://en.wikipedia.org/wiki/Laplacian_matrix
 
story645 said:
Connected Components may be the theory you're looking for. Another option is social network theory.

Social network page gave me some ideas to think about. Trick is, I am not sure what I want to do :wink: I have just some vague ideas that I want to test.

I had a similar problem, but my transitions could be read off a single file.

I can prepare such file. In fact I am going to try to do it and to feed it to some of the free programs listed on the wiki social network analysis software page.

oab729 said:
Take each page as a vertex in a giant graph where edges indicate links between sites. Then take the laplacian matrix. The number of connected components is the multiplicity of the eigenvalue 0.

I am not sure how it is going to help - just because pages are connected doesn't mean they are in the cluster, just because they are not connected doesn't mean they are not in the cluster.
 
Hmm... perhaps I misunderstand your question? John knows Alice. Alice Knows Alex. John Doesn't know Alice. The number of connected components is 1. Or do you mean John needs to really know Alice, so could you set thresholds or something? You could also look up clustering analysis or Partition decoupling method, if that's what you're thinking...
 
Thread 'Is this public key encryption?'
I've tried to intuit public key encryption but never quite managed. But this seems to wrap it up in a bow. This seems to be a very elegant way of transmitting a message publicly that only the sender and receiver can decipher. Is this how PKE works? No, it cant be. In the above case, the requester knows the target's "secret" key - because they have his ID, and therefore knows his birthdate.
I tried a web search "the loss of programming ", and found an article saying that all aspects of writing, developing, and testing software programs will one day all be handled through artificial intelligence. One must wonder then, who is responsible. WHO is responsible for any problems, bugs, deficiencies, or whatever malfunctions which the programs make their users endure? Things may work wrong however the "wrong" happens. AI needs to fix the problems for the users. Any way to...
Back
Top