Checking Triangle Inequality for List Similarity Metric

AI Thread Summary
The discussion centers on creating a distance metric for comparing two lists based on their similarity score, defined as the intersection over the union of the lists. The distance metric is calculated as one minus the similarity score, indicating that a distance of zero means the lists are identical. The validity of this distance formula in satisfying the triangle inequality is questioned, with initial proof provided for equal list sizes. However, concerns are raised about the metric's accuracy when list sizes vary significantly or when overlaps are small relative to list sizes. Adjustments to the metric may be necessary to account for these discrepancies for more accurate distance measurements.
hoffmann
Messages
65
Reaction score
0
Say I have two lists, List1 and List2 containing elements such as words. Some words are common two both List1 and List2. I want to create a distance metric that tells me how far apart the two lists are based on a similarity "score". The similarity score and distance metric are as follows:

Similarity score: Intersection(List1,List2) / Union(List1,List2)
Distance = 1 - Similarity Score

In other words, the similarity score is just the percentage overlap between the two lists and the distance is 0 when the two lists are the same. Say I generalize this to n lists and I calculate the distances between lists (a symmetric matrix of distances). My question is, is this distance formula valid? In other words, does it satisfy the triangle inequality? How do I check this?
 
Mathematics news on Phys.org
It's not very hard to prove that it satisfies the triangle inequality if all list sizes are equal. Let N = size of the list, x, y & z = pairwise overlaps between three lists. You must have that x >= y+z-N. Distance between lists that give you x is 1-x/(2N-x). With some algebra you can conclude that the triangle inequality is satisfied for all valid x, y & z.

It seems to be true for arbitrary list sizes too, but I don't see an easy way to prove that right now ...
 
yes, that is also not coming to me. if it helps, i know that this scoring metric is skewed when the sizes of the lists are large and their overlaps are small (only in comparison to the sizes of the lists, but still pretty large in comparison to smaller list/overlap sizes). it seems that i need to correct the metric for the size of the overlap to get a more accurate value. also the same sort of situation when the sizes of the lists are small and their overlaps are ~0. finally, when one of the lists is big and the other small (or vice versa), i could also run into a skewed distance value.
 
http://pnylab.com/pny/papers/nmet/nmet/index.html
 
beautiful! thank you!
 
Insights auto threads is broken atm, so I'm manually creating these for new Insight articles. In Dirac’s Principles of Quantum Mechanics published in 1930 he introduced a “convenient notation” he referred to as a “delta function” which he treated as a continuum analog to the discrete Kronecker delta. The Kronecker delta is simply the indexed components of the identity operator in matrix algebra Source: https://www.physicsforums.com/insights/what-exactly-is-diracs-delta-function/ by...
Suppose ,instead of the usual x,y coordinate system with an I basis vector along the x -axis and a corresponding j basis vector along the y-axis we instead have a different pair of basis vectors ,call them e and f along their respective axes. I have seen that this is an important subject in maths My question is what physical applications does such a model apply to? I am asking here because I have devoted quite a lot of time in the past to understanding convectors and the dual...
Thread 'Imaginary Pythagoras'
I posted this in the Lame Math thread, but it's got me thinking. Is there any validity to this? Or is it really just a mathematical trick? Naively, I see that i2 + plus 12 does equal zero2. But does this have a meaning? I know one can treat the imaginary number line as just another axis like the reals, but does that mean this does represent a triangle in the complex plane with a hypotenuse of length zero? Ibix offered a rendering of the diagram using what I assume is matrix* notation...
Back
Top