Undergrad Looking for the most suitable distance for binary clustering

The discussion revolves around selecting the most appropriate distance measure for binary clustering of user login data in a pandas dataset. The user has employed hierarchical clustering with Hamming distance but is uncertain about its effectiveness after reviewing a comparison of 76 distance measures. They seek recommendations on alternative distance metrics, considering the importance of both positive and negative matches in their analysis. The Sokal-Michener distance is suggested as a potential option. Ultimately, the choice of metric should align with the specific objectives of the clustering task.
Frank Einstein
TL;DR
I have a set of data on people logging into a server, and I need to find the most suitable distance measure to cluster them.
Hello everyone.

I have a pandas DataFrame in Python with n+1 columns and t rows. The first column is a timestamp that advances second by second over a time interval, and each of the other n columns is named after a person who logs into the server. In those columns, each of the t rows holds a "1" if the person is logged in at that exact second and a "0" if they are not.
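
A minimal sketch of the layout described above, using made-up user names and only six seconds of data (all values here are hypothetical). Because the goal is to group users rather than seconds, the DataFrame has to be transposed so that each user becomes one row, i.e. one binary vector, before any distance between users is computed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data in the described layout: one timestamp column
# followed by one 0/1 column per user (user names are invented).
timestamps = pd.date_range("2024-01-01 00:00:00", periods=6, freq="s")
df = pd.DataFrame(
    {
        "timestamp": timestamps,
        "alice": [1, 1, 1, 0, 0, 0],
        "bob": [1, 1, 0, 0, 0, 0],
        "carol": [0, 0, 0, 1, 1, 1],
    }
)

# Clustering groups users, so each user must be one observation:
# drop the timestamp and transpose so rows are users, columns are seconds.
users = df.drop(columns="timestamp").T
X = users.to_numpy(dtype=bool)

print(users.index.tolist())  # ['alice', 'bob', 'carol']
print(X.shape)               # (3, 6): 3 users, 6 one-second samples
```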

I have used hierarchical clustering with the Hamming distance and average linkage.
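
Assuming the users have already been turned into rows of a boolean matrix, this setup can be sketched with SciPy's `pdist`, `linkage` and `fcluster`. The matrix and names below are invented, and cutting the tree into two clusters is an arbitrary choice for illustration, not necessarily what was done in the original analysis.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Hypothetical binary login matrix: rows are users, columns are seconds
# (True = logged in). In practice it could come from the DataFrame via
# df.drop(columns="timestamp").T.to_numpy(dtype=bool).
names = ["alice", "bob", "carol", "dave"]
X = np.array(
    [
        [1, 1, 1, 1, 0, 0, 0, 0],
        [1, 1, 1, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 1, 1, 1],
        [0, 0, 0, 0, 1, 1, 1, 0],
    ],
    dtype=bool,
)

# Condensed pairwise distance vector; SciPy's "hamming" metric is the
# fraction of seconds in which two users disagree.
D = pdist(X, metric="hamming")

# Average linkage (UPGMA) on the precomputed distances.
Z = linkage(D, method="average")

# Cut the dendrogram into at most 2 flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
for name, label in zip(names, labels):
    print(name, label)
```

With this toy data, alice and bob end up in one cluster and carol and dave in the other; the numeric label assigned to each cluster is arbitrary.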

However, I am not sure whether the Hamming distance is the most suitable measure for clustering the users, especially after reading this article, which compares 76 distance measures.
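
Most binary measures in comparisons like this one differ only in how they combine four counts taken from a pair of users: a (both logged in), b and c (only one of them logged in), and d (both logged out). The sketch below, with hypothetical helper names, computes those counts and two measures that treat d differently: Hamming counts 0-0 matches as agreement, while Jaccard ignores them.

```python
import numpy as np


def contingency(u, v):
    """Return (a, b, c, d) for two binary vectors.

    a: both 1, b: u=1 and v=0, c: u=0 and v=1, d: both 0.
    """
    u = np.asarray(u, dtype=bool)
    v = np.asarray(v, dtype=bool)
    a = int(np.sum(u & v))
    b = int(np.sum(u & ~v))
    c = int(np.sum(~u & v))
    d = int(np.sum(~u & ~v))
    return a, b, c, d


def hamming_distance(u, v):
    # Fraction of positions that disagree; 0-0 matches count as agreement.
    a, b, c, d = contingency(u, v)
    return (b + c) / (a + b + c + d)


def jaccard_distance(u, v):
    # Ignores 0-0 matches: only seconds where at least one user is logged in count.
    # Convention (an assumption here): two all-zero vectors get distance 0.0.
    a, b, c, _ = contingency(u, v)
    return (b + c) / (a + b + c) if (a + b + c) else 0.0


# Two users who are each logged in for one (different) second out of ten.
u = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
v = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(contingency(u, v))       # (0, 1, 1, 8)
print(hamming_distance(u, v))  # 0.2
print(jaccard_distance(u, v))  # 1.0
```

With sparse login data this difference can dominate: two users who are rarely online look close under Hamming simply because both are offline most of the time, while Jaccard calls them completely dissimilar if their sessions never overlap.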

I am not an expert in clustering, so I would like to know which distance measure other people think would be most suitable for grouping the users.

As far as I know, both positive and negative matches are important in this case, so might the Sokal-Michener distance be suitable?
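
If Sokal-Michener is taken as the simple matching coefficient S = (a + d)/(a + b + c + d), as it is usually defined in surveys of binary measures, then its distance 1 - S = (b + c)/n is exactly the normalized Hamming distance. The sketch below checks this on one made-up pair against SciPy's `hamming`; the function name `sokal_michener_distance` is hypothetical. Libraries do not all use the same formula under the Sokal-Michener name, so it is worth checking a given implementation's documentation.

```python
import numpy as np
from scipy.spatial.distance import hamming


def sokal_michener_distance(u, v):
    """1 - (a + d) / n, i.e. one minus the simple matching coefficient.

    Both 1-1 matches (a) and 0-0 matches (d) count as agreement.
    """
    u = np.asarray(u, dtype=bool)
    v = np.asarray(v, dtype=bool)
    a = np.sum(u & v)    # both logged in
    d = np.sum(~u & ~v)  # both logged out
    return 1.0 - (a + d) / u.size


# Made-up pair of users over eight seconds.
u = [1, 1, 0, 0, 1, 0, 0, 0]
v = [1, 0, 0, 0, 1, 1, 0, 0]
print(sokal_michener_distance(u, v))  # 0.25
print(hamming(u, v))                  # 0.25
```

Under this definition, switching from Hamming to Sokal-Michener would leave the average-linkage clustering unchanged; a measure that weighs 0-0 matches differently, such as Jaccard, can change it.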

Any recommendation is welcome.
Best regards, and thanks for reading.
 
I think it would help to start by explaining why you are clustering users. A metric's suitability is defined by what your end objective is.
 
