Algorithm for cluster finding (?)

  • Thread starter Borek
  • Start date
  • Tags
    Algorithm
In summary, the conversation discusses finding clusters of heavily interlinked web pages and how to identify them using various theories and algorithms such as connected components and social network theory. The speaker also mentions using a clustering algorithm and laplacian matrix to analyze the links between pages. There is also a suggestion to look into clustering analysis and the Partition decoupling method.
  • #1
Borek
Mentor
28,950
4,245
OK, I admit I have no idea what is the correct terminology, so could be thread title is completely off.

Imagine I have a graph of links between www pages. I suppose some of these pages are clustered - that is, they are heavily interlinked, but they don't link so extensively to other clusters/pages. How do I find them? This is not different from finding groups of friends in some population by analysing phone calls and so on.

I am more than sure that I have seen pictures showing results of such analysis (perhaps in Scientific American), but I can't remember any details, I am even not sure about correct terminology. Fact that I have probably read about the problem in Polish doesn't help either.

Names, links, keywords to look for will be appreciated.
 
Technology news on Phys.org
  • #2
Borek said:
How do I find them?.
Connected Components may be the theory you're looking for. Another option is social network theory.

I had a similar problem, but my transitions could be read off a single file. I looked at transitions between clusters by building a table of first order markov probabilities, then displayed the graphs using networkx, which is a python toolkit for graph visualization, and it includes some connected components stuff. (I'll give you the code if you want.)

I suppose some of these pages are clustered - that is, they are heavily interlinked, but they don't link so extensively to other clusters/pages.
What about running a clustering algorithm on the links and looking at the frequency and rank fall off?
 
Last edited:
  • #3
Take each page as a vertex in a giant graph where edges indicate links between sites. Then take the laplacian matrix. The number of connected components is the multiplicity of the eigenvalue 0. http://en.wikipedia.org/wiki/Laplacian_matrix
 
  • #4
story645 said:
Connected Components may be the theory you're looking for. Another option is social network theory.

Social network page gave me some ideas to think about. Trick is, I am not sure what I want to do :wink: I have just some vague ideas that I want to test.

I had a similar problem, but my transitions could be read off a single file.

I can prepare such file. In fact I am going to try to do it and to feed it to some of the free programs listed on the wiki social network analysis software page.

oab729 said:
Take each page as a vertex in a giant graph where edges indicate links between sites. Then take the laplacian matrix. The number of connected components is the multiplicity of the eigenvalue 0.

I am not sure how it is going to help - just because pages are connected doesn't mean they are in the cluster, just because they are not connected doesn't mean they are not in the cluster.
 
  • #5
Hmm... perhaps I misunderstand your question? John knows Alice. Alice Knows Alex. John Doesn't know Alice. The number of connected components is 1. Or do you mean John needs to really know Alice, so could you set thresholds or something? You could also look up clustering analysis or Partition decoupling method, if that's what you're thinking...
 

1. What is an algorithm for cluster finding?

An algorithm for cluster finding is a step-by-step process or set of rules used to identify clusters or groups of data points within a larger dataset. These clusters can help to reveal patterns and trends within the data.

2. How does an algorithm for cluster finding work?

An algorithm for cluster finding typically involves calculating distances or similarities between data points and using these calculations to group similar points together into clusters. This process is repeated until all data points have been assigned to a cluster and the most distinct clusters have been identified.

3. What are the benefits of using an algorithm for cluster finding?

Using an algorithm for cluster finding can help to simplify and organize large datasets, making it easier to identify patterns and relationships within the data. This can be useful in various fields such as data analysis, machine learning, and social sciences.

4. What are some common types of algorithms for cluster finding?

Some common types of algorithms for cluster finding include k-means clustering, hierarchical clustering, and density-based clustering. Each type of algorithm has its own approach to identifying clusters and may be more suitable for different types of datasets.

5. How accurate are algorithms for cluster finding?

The accuracy of an algorithm for cluster finding depends on various factors such as the quality of the data, the chosen algorithm, and the parameters used. In general, these algorithms can provide a good starting point for identifying clusters, but may require further refinement and validation by a human expert.

Similar threads

  • Programming and Computer Science
Replies
6
Views
972
  • Programming and Computer Science
Replies
8
Views
1K
Replies
9
Views
1K
  • Programming and Computer Science
Replies
1
Views
769
  • Programming and Computer Science
2
Replies
46
Views
4K
  • Programming and Computer Science
Replies
2
Views
626
  • Programming and Computer Science
Replies
14
Views
2K
  • Programming and Computer Science
Replies
30
Views
4K
  • Programming and Computer Science
Replies
1
Views
678
  • Programming and Computer Science
Replies
13
Views
3K
Back
Top