Correlation Limits without Probability

Click For Summary

Discussion Overview

The discussion revolves around the concept of correlation between binary sequences, specifically focusing on the maximum correlation possible between two sequences based on their lengths and counts of 1's. Participants explore the definitions and implications of this correlation in a mathematical context, examining various assumptions and derivations related to the sequences.

Discussion Character

  • Technical explanation
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • The original poster (OP) presents a method to calculate the maximum correlation between two binary sequences based on their lengths and counts of 1's, suggesting that the maximum correlation is limited to values less than 1 unless the counts of 1's are equal.
  • One participant agrees with the OP's assumption of equal lengths for the sequences and provides a derivation showing that if the counts of 1's are equal, the sequences can be sorted to match perfectly, resulting in a correlation of 1.
  • This participant also introduces a variable to represent the difference in counts of 1's, leading to a correlation formula that remains valid under the assumption of equal lengths.
  • Another participant reiterates the agreement on the sequences having the same length and acknowledges the derivation provided, suggesting a more general expression for maximum correlation that does not depend on the order of the sequences.
  • One participant inquires about the precise definition of correlation being used in this context, indicating a potential difference from standard definitions.
  • Another participant points out that the OP's definition of correlation differs from the usual meaning, acknowledging their oversight in missing this detail.

Areas of Agreement / Disagreement

Participants generally agree on the assumption that the sequences being compared have the same length. However, there is no consensus on the implications of the correlation definitions, as some participants note that the OP's definition diverges from standard interpretations.

Contextual Notes

Participants have not resolved the differences in the definition of correlation, and there are varying interpretations of the implications of the derived formulas based on the assumptions made.

Mentz114
Messages
5,429
Reaction score
292
I'm reposting this piece of logic because my earlier attempts to get it across failed - nobody got what I was saying.

This is about binary sequences like ##010110100111001001...##.
All we need to know about a particular sequence is its length, ##N## and the number of 1's it contains. The logic applies to all permutations of any given sequence.
<br /> \begin{align*}<br /> \overbrace{<br /> \underbrace{1111111111...}_\text{count of 1s $=1_K$} \<br /> \underbrace{0000000000...}_\text{count of 0s $=0_K$}<br /> }^\text{length of K is $N=0_K+1_K$}<br /> \end{align*}<br />
The question is - given two sequences labelled A and B with 1 counts of ##1_A## and ##1_B##, what is the maximum correlation possible between any permutations of A and B ? The answer is obviously ##\pm1##. But this extreme can only be reached by a small subset of all sequences, the remainder having limits ##|\rho_M|<1##.

The comparison of two sequences means arranging then in two rows and counting the coincidences 0,0 and 1,1 denoted by ##S_{AB}##. The number of anti-coincidences ##A_{AB}## is obviously ##N-S_{AB}##. Finally, the correlation between A and B is defined in terms of coincidence counts as ##\mathcal{C}_{AB}= (S_{AB}-A_{AB})/(S_{AB}+A_{AB})=(2S_{AB}-N)/N ## which is in ##[-1,1]##.

So far everything is definitions and notation. Now comes the crucial ansatz

If ##1_A<1_B## then the maximum number of 1,1 coincidences is ##1_A## and the maximum number of 0,0 coincidences is ##0_B=N-1_B## and the maximum number of symmetric coincidences possible is ##S_{AB}= 1_A+0_B = 1_A-1_B +N## .

Putting this into the correlation gives ##1-\frac{2(1_B-1_A)}{N} \le 1 ##. This can only be 1 if ##1_A=1_B##. In all other cases the extreme cannot be achieved because there will be anti-coincidences and coincidences.

Is there a fault in the logic ? I suspect this could be more elegantly reasoned

Has anyone got a reference to something similar ?
 
  • Like
Likes   Reactions: Igael
Physics news on Phys.org
You appear to be implicitly assuming that both sequences have the same length. I will maintain that assumption.
If ##1_A=1_B## then by sorting both sequences in descending order we see that the sorted sequences are identical, as the first ##1_A## elements of both are 1 and the next ##N-1_A## elements of both are 0. So all pairs match.

Assume WLOG that ##1_A\leq 1_B## and let ##1_B-1_A=h##. Then if both sequences are sorted in descending order we have:
  • the first ##1_A## elements matching as (1,1), where the first number indicates the ##A## value and the second the ##B## value
  • the next ##h## elements mismatching as (0,1)
  • the last ##N-1_B=N-1_A-h## elements matching as (0,0)
So ##S_{AB}=N-h,A_{AB}=h## and the correlation statistic is ##1-2h/N##. Since ##0\leq h\leq N##, this statistic is in ##[-1,1]## and it is 1 iff ##h=0## (in which case the two sorted strings are identical) and -1 iff ##h=N## (in which case string ##A## is all 0s and string ##B## is all 1s).
 
  • Like
Likes   Reactions: Igael
andrewkirk said:
You appear to be implicitly assuming that both sequences have the same length. I will maintain that assumption.
If ##1_A=1_B## then by sorting both sequences in descending order we see that the sorted sequences are identical, as the first ##1_A## elements of both are 1 and the next ##N-1_A## elements of both are 0. So all pairs match.

Assume WLOG that ##1_A\leq 1_B## and let ##1_B-1_A=h##. Then if both sequences are sorted in descending order we have:
  • the first ##1_A## elements matching as (1,1), where the first number indicates the ##A## value and the second the ##B## value
  • the next ##h## elements mismatching as (0,1)
  • the last ##N-1_B=N-1_A-h## elements matching as (0,0)
So ##S_{AB}=N-h,A_{AB}=h## and the correlation statistic is ##1-2h/N##. Since ##0\leq h\leq N##, this statistic is in ##[-1,1]## and it is 1 iff ##h=0## (in which case the two sorted strings are identical) and -1 iff ##h=N## (in which case string ##A## is all 0s and string ##B## is all 1s).
Yes the sequences being compared have the same length. I agree with your derivation. It looks a bit shorter than mine.
A more general expression for the maximum correlation is
##-1\le 1-\frac{2|1_B-1_A|}{N}\le 1## which is symmetric under interchange of A and B and does not require the assumption ##1_A<1_B##.
 
  • Like
Likes   Reactions: Igael
What is the precise definition of the correlation between 2 given sequences ?
 
The OP defines what they want the term to mean in this context at the end of their 4th paragraph. It is different from the usual meaning of correlation.
 
Sorry, I don't know how I missed it, thanks.
 

Similar threads

  • · Replies 4 ·
Replies
4
Views
2K
  • · Replies 29 ·
Replies
29
Views
6K
  • · Replies 15 ·
Replies
15
Views
2K
  • · Replies 27 ·
Replies
27
Views
4K
  • · Replies 14 ·
Replies
14
Views
2K
  • · Replies 54 ·
2
Replies
54
Views
6K
Replies
8
Views
3K
  • · Replies 0 ·
Replies
0
Views
3K
  • · Replies 2 ·
Replies
2
Views
2K
  • · Replies 80 ·
3
Replies
80
Views
10K