Correlation Limits without Probability

In summary, the thread discusses the maximum correlation attainable between permutations of two binary sequences of the same length, given only the number of 1's in each. The correlation is defined from the counts of coincidences and anti-coincidences between the two sequences, and the maximum can reach 1 only when both sequences contain the same number of 1's. The thread also gives a more general expression for the maximum correlation that is symmetric in the two sequences and does not assume which one contains more 1's. It closes with a question about the precise definition of correlation used in this context.
  • #1
Mentz114
I'm reposting this piece of logic because my earlier attempts to get it across failed - nobody got what I was saying.

This is about binary sequences like ##010110100111001001...##.
All we need to know about a particular sequence is its length, ##N##, and the number of 1's it contains. The logic applies to all permutations of any given sequence.
[tex]
\begin{align*}
\overbrace{
\underbrace{1111111111...}_\text{count of 1s $=1_K$} \
\underbrace{0000000000...}_\text{count of 0s $=0_K$}
}^\text{length of K is $N=0_K+1_K$}
\end{align*}
[/tex]
The question is: given two sequences labelled A and B with 1 counts of ##1_A## and ##1_B##, what is the maximum correlation possible between any permutations of A and B? The obvious extremes are ##\pm1##, but these can only be reached by a small subset of all sequences; for the remainder the limit ##\rho_M## satisfies ##|\rho_M|<1##.

The comparison of two sequences means arranging them in two rows and counting the coincidences 0,0 and 1,1; this count is denoted by ##S_{AB}##. The number of anti-coincidences ##A_{AB}## is then ##N-S_{AB}##. Finally, the correlation between A and B is defined in terms of coincidence counts as ##\mathcal{C}_{AB}= (S_{AB}-A_{AB})/(S_{AB}+A_{AB})=(2S_{AB}-N)/N##, which is in ##[-1,1]##.
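As a quick illustration of the definition above, here is a minimal Python sketch. The function name `coincidence_correlation` and the list-of-ints representation are my own choices for illustration, not something from the thread.

```python
def coincidence_correlation(a, b):
    """Return (S - A) / (S + A) for two equal-length sequences of 0s and 1s.

    S counts positions where the two sequences agree (0,0 or 1,1);
    A = N - S counts positions where they disagree.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(a)
    s = sum(1 for x, y in zip(a, b) if x == y)  # coincidences S_AB
    return (2 * s - n) / n                      # same as (S - A) / N


# Three of four positions agree, so S = 3, A = 1 and the result is 0.5.
print(coincidence_correlation([0, 1, 0, 1], [0, 1, 1, 1]))
```

Identical sequences give 1 and complementary sequences give -1, matching the stated range ##[-1,1]##.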

So far everything is definitions and notation. Now comes the crucial ansatz:

If ##1_A<1_B##, then the maximum number of 1,1 coincidences is ##1_A## and the maximum number of 0,0 coincidences is ##0_B=N-1_B##, so the maximum number of coincidences possible is ##S_{AB}= 1_A+0_B = 1_A-1_B +N##.

Putting this into the correlation gives ##\mathcal{C}_{AB} = 1-\frac{2(1_B-1_A)}{N} \le 1##. This can only be 1 if ##1_A=1_B##. In all other cases the extreme cannot be achieved, because there will always be some anti-coincidences alongside the coincidences.

Is there a fault in the logic? I suspect this could be more elegantly reasoned.

Has anyone got a reference to something similar?
 
  • #2
You appear to be implicitly assuming that both sequences have the same length. I will maintain that assumption.
If ##1_A=1_B## then by sorting both sequences in descending order we see that the sorted sequences are identical, as the first ##1_A## elements of both are 1 and the next ##N-1_A## elements of both are 0. So all pairs match.

Assume WLOG that ##1_A\leq 1_B## and let ##1_B-1_A=h##. Then if both sequences are sorted in descending order we have:
  • the first ##1_A## elements matching as (1,1), where the first number indicates the ##A## value and the second the ##B## value
  • the next ##h## elements mismatching as (0,1)
  • the last ##N-1_B=N-1_A-h## elements matching as (0,0)
So ##S_{AB}=N-h##, ##A_{AB}=h##, and the correlation statistic is ##1-2h/N##. Since ##0\leq h\leq N##, this statistic is in ##[-1,1]##; it is 1 iff ##h=0## (in which case the two sorted strings are identical) and -1 iff ##h=N## (in which case string ##A## is all 0s and string ##B## is all 1s).
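The sorted arrangement described above can be sketched in a few lines of Python. The helper name `sorted_alignment_counts` and the example values (N = 10, three 1s in A, seven 1s in B) are assumptions made only for illustration.

```python
def sorted_alignment_counts(ones_a, ones_b, n):
    """Sort both sequences in descending order and count (S, A).

    ones_a and ones_b are the numbers of 1s in A and B; n is the common length.
    """
    a = sorted([1] * ones_a + [0] * (n - ones_a), reverse=True)
    b = sorted([1] * ones_b + [0] * (n - ones_b), reverse=True)
    s = sum(1 for x, y in zip(a, b) if x == y)  # matching positions
    return s, n - s                             # (S_AB, A_AB)


n, ones_a, ones_b = 10, 3, 7
h = ones_b - ones_a
s, anti = sorted_alignment_counts(ones_a, ones_b, n)
print(s, anti)           # 6 4, i.e. S = N - h and A = h
print((s - anti) / n)    # 0.2, i.e. 1 - 2h/N
```

Here the first 3 positions match as (1,1), the next 4 mismatch as (0,1), and the last 3 match as (0,0).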
 
  • #3
andrewkirk said: (post #2, quoted in full)
Yes, the sequences being compared have the same length. I agree with your derivation. It looks a bit shorter than mine.
A more general expression for the maximum correlation is
##-1\le 1-\frac{2|1_B-1_A|}{N}\le 1##, which is symmetric under interchange of A and B and does not require the assumption ##1_A<1_B##.
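For anyone who wants to check the symmetric expression numerically, here is a small brute-force sketch in Python. The function names are hypothetical, and the check only covers short sequences (N = 6 here), so treat it as a sanity test rather than a proof. It holds A fixed and permutes only B, which is enough because the coincidence count depends only on the relative arrangement of the two rows.

```python
from fractions import Fraction
from itertools import permutations


def max_correlation(ones_a, ones_b, n):
    """Closed form 1 - 2|1_B - 1_A| / N, kept exact with Fraction."""
    return 1 - Fraction(2 * abs(ones_b - ones_a), n)


def brute_force_max(ones_a, ones_b, n):
    """Maximum of (2S - N) / N over all distinct arrangements of B."""
    a = [1] * ones_a + [0] * (n - ones_a)
    b = [1] * ones_b + [0] * (n - ones_b)
    best_s = max(
        sum(1 for x, y in zip(a, p) if x == y)
        for p in set(permutations(b))
    )
    return Fraction(2 * best_s - n, n)


n = 6
for ones_a in range(n + 1):
    for ones_b in range(n + 1):
        closed = max_correlation(ones_a, ones_b, n)
        # symmetric under interchange of A and B
        assert closed == max_correlation(ones_b, ones_a, n)
        # agrees with exhaustive search over arrangements
        assert closed == brute_force_max(ones_a, ones_b, n)
print("closed form matches brute force for N =", n)
```

Exact fractions are used instead of floats so that the equality checks are not affected by rounding.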
 
  • #4
What is the precise definition of the correlation between two given sequences?
 
  • #5
The OP defines what they want the term to mean in this context at the end of their 4th paragraph. It is different from the usual meaning of correlation.
 
  • #6
Sorry, I don't know how I missed it, thanks.
 
