Measuring Sequence Similarity with Absolute Differences

  • Context: Undergrad 
  • Thread starter onako
SUMMARY

The discussion focuses on measuring the similarity of two equal-length sequences, A and B, using the absolute differences between corresponding entries. The proposed measure is the sum of absolute differences, ∑ |a_i - b_i|, which is the L1 (Manhattan) distance. A reply points to the broader family of Lp distances, built from powers of the absolute difference, and notes that p = 2 (the Euclidean distance) is the most commonly used choice.

PREREQUISITES
  • Understanding of absolute difference calculations
  • Familiarity with sequence or vector representation
  • Knowledge of Lp norms, particularly L1 and L2 norms
  • Basic linear algebra: treating sequences as vectors
NEXT STEPS
  • Research the implementation of the L1 norm for sequence similarity
  • Explore the L2 norm and its applications in measuring similarity
  • Learn how the Manhattan and Euclidean distances correspond to the L1 and L2 norms
  • Investigate algorithms for optimizing sequence comparison in large datasets
USEFUL FOR

Data scientists, mathematicians, and software developers working on algorithms for sequence analysis and similarity measurement will benefit from this discussion.

onako
Hi all,

I'm faced with the following problem. It involves two number sequences (or vectors) of equal length, so it might count as a number theory problem, I guess; please move it to the appropriate forum if you think differently.
Given two sequences, for example:
A=[1, 2, 4, 6, 7, 2, 1];
B=[1, 2, 4, 7, 8, 2, 1];
Give an algorithm that expresses the entrywise similarity/closeness of the sequences as a single number. The sequences above are similar (they differ in two entries); a sequence C of all 5's should still count as more similar to A than a sequence D of all 100's.
I thought of the following:
[tex] \sum |a_i-b_i|[/tex]
Other proposals are very welcome. Many thanks!
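A minimal sketch of the proposed measure, assuming plain Python lists of numbers. The function name `l1_distance` and the sequences `C` (all 5's) and `D` (all 100's) are illustrative choices, not part of the original question. A result of 0 means the sequences are identical; larger values mean less similar.

```python
def l1_distance(a, b):
    """Sum of absolute entrywise differences, i.e. sum(|a_i - b_i|)."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


A = [1, 2, 4, 6, 7, 2, 1]
B = [1, 2, 4, 7, 8, 2, 1]
C = [5] * len(A)    # hypothetical "all 5's" sequence from the question
D = [100] * len(A)  # hypothetical "all 100's" sequence from the question

print(l1_distance(A, B))  # 2
print(l1_distance(A, C))  # 18
print(l1_distance(A, D))  # 677
```

The ordering 2 < 18 < 677 matches the intuition in the question: B is closest to A, and C is closer to A than D is.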
 
There are many. Typical ones involve powers of the absolute value of the difference, with the exponent 2 used most often. (Look up the lp norms, especially p = 2; you are describing p = 1.)
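As a sketch of the suggestion above, the lp distance raises each absolute difference to the power p, sums, and takes the p-th root: (∑ |a_i - b_i|^p)^(1/p). The function name `lp_distance` and the handling of p = infinity (the largest single difference, the usual limiting case) are illustrative assumptions, not something stated in the thread.

```python
import math


def lp_distance(a, b, p=2.0):
    """(sum |a_i - b_i|**p) ** (1/p); p = math.inf gives max |a_i - b_i|."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if p < 1:
        # For p < 1 the formula no longer satisfies the triangle inequality.
        raise ValueError("p must be >= 1")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


A = [1, 2, 4, 6, 7, 2, 1]
B = [1, 2, 4, 7, 8, 2, 1]

print(lp_distance(A, B, p=1))         # 2.0, the sum of absolute differences
print(lp_distance(A, B, p=2))         # about 1.414, i.e. sqrt(2)
print(lp_distance(A, B, p=math.inf))  # 1, the largest single difference
```

Larger p puts more weight on the biggest individual differences. If only the ranking of candidates matters, the sum of squared differences (p = 2 without the square root) orders them the same way and avoids the root.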
 
