Correlation limits for binary variates

  1. Feb 22, 2016 #1


    Gold Member

    I've been looking at detector coincidences and tried to find what general limits apply to them. I was surprised how simply the calculation works out. My question is whether it is correct, and where I can find similar work.

    Consider two binary sequences of length ##N## produced by random processes where the probabilities of getting 1 are ##p_1## and ##p_2## respectively. Write ##1_n## for the number of 1's in stream ##n##, and assume that ##1_n \rightarrow Np_n## (that is, ##1_n/N \rightarrow p_n##) as ##N \rightarrow \infty##.

    If we know the counts ##1_n,\ 0_n=N-1_n## in our sequences, then by a permutation argument it is clear that the number of (0,0) coincidences cannot be greater than the minimum of ##0_1=N(1-p_1)## and ##0_2=N(1-p_2)##. Similarly, the maximum possible number of (1,1) coincidences is the lesser of ##Np_1## and ##Np_2##. Assuming ##p_1<p_2##, this gives a maximum total for the (0,0) and (1,1) coincidences of ##S_{12}=N(1-p_2+p_1)##. No permutation gives a greater total than this.

    The maximum possible correlation between the streams is then ##\mathcal{C}_{12}=(2S_{12}-N)/N##, which evaluates to ##1-2(p_2-p_1)##.
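
    The permutation argument above can be checked numerically for small ##N##. The following is only a sketch: the function names are made up for illustration, and it uses the agreement-based measure ##(2S-N)/N## from this post, not the Pearson coefficient. It compares the closed-form bound with a brute-force search over every placement of the 1's in the second stream.

```python
from itertools import combinations

def agreement_correlation(a, b):
    """Return (2*S - N)/N, where S counts the positions at which a and b agree."""
    n = len(a)
    s = sum(x == y for x, y in zip(a, b))
    return (2 * s - n) / n

def max_correlation_formula(ones_1, ones_2, n):
    """Permutation bound: S_max = min(number of 1s) + min(number of 0s)."""
    s_max = min(ones_1, ones_2) + min(n - ones_1, n - ones_2)
    return (2 * s_max - n) / n

def max_correlation_brute_force(ones_1, ones_2, n):
    """Fix stream 1 and try every placement of the 1s in stream 2."""
    a = [1] * ones_1 + [0] * (n - ones_1)
    best = -1.0
    for positions in combinations(range(n), ones_2):
        chosen = set(positions)
        b = [1 if i in chosen else 0 for i in range(n)]
        best = max(best, agreement_correlation(a, b))
    return best

# Example counts (an assumption for illustration): N = 6, two 1s vs four 1s.
n, ones_1, ones_2 = 6, 2, 4
print(max_correlation_formula(ones_1, ones_2, n))
print(max_correlation_brute_force(ones_1, ones_2, n))
```

    For these counts both values come out to 1/3, which agrees with ##1-2(p_2-p_1)## when ##p_1=2/6## and ##p_2=4/6##. Only stream 2 is permuted because the number of agreements depends on the relative arrangement of the two streams alone.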

    From this one can write, for the maximum possible correlations between 4 streams (assuming ##p_1\leq p_2 \leq p_3 \leq p_4##),

    ##|\mathcal{C}_{12}+\mathcal{C}_{23}+\mathcal{C}_{34}-\mathcal{C}_{41}| \leq 2##

    the ##p_n## terms conveniently cancelling.
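
    The cancellation of the ##p_n## terms can be seen by plugging the pairwise maxima ##1-2|p_j-p_i|## into the four-stream expression. This is a minimal sketch with hypothetical function names, using exact fractions to avoid rounding. It only evaluates the expression at the pairwise maxima for ordered probabilities; it is not a proof of the inequality for arbitrary correlations.

```python
from fractions import Fraction

def c_max(p_a, p_b):
    """Pairwise maximum of the agreement measure: 1 - 2*|p_b - p_a|."""
    return 1 - 2 * abs(p_b - p_a)

def four_stream_sum(p1, p2, p3, p4):
    """C_12 + C_23 + C_34 - C_41, each term taken at its pairwise maximum."""
    return c_max(p1, p2) + c_max(p2, p3) + c_max(p3, p4) - c_max(p4, p1)

# Illustrative probabilities (assumed values), ordered p1 <= p2 <= p3 <= p4.
probs = [Fraction(1, 10), Fraction(3, 10), Fraction(1, 2), Fraction(9, 10)]
print(four_stream_sum(*probs))
```

    With ordered probabilities the three positive terms add to ##3-2(p_4-p_1)## and the subtracted term is ##1-2(p_4-p_1)##, so the result is 2 whatever the individual ##p_n## are.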
    Last edited: Feb 22, 2016
  3. Feb 22, 2016 #2


    Science Advisor

    Hey Mentz114.

    I think it would be useful to define a test statistic for deciding whether coincidences exist, and then to use it to evaluate the hypothesis of coincidences.

    If you can do that then you will have a far better chance of understanding and estimating this attribute in your random sample.

    Strictly speaking, the first thing to do would probably be to assess the sample for independence. Independence means that any conditional probability equals the probability of the original random variable (not the one being conditioned on).

    There are statistical tests for this, and I think one involves chi-square.
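
    One such test is Pearson's chi-square test of independence on the 2x2 table of coincidence counts. The sketch below is an illustration rather than a recommended procedure: the function name is hypothetical, it uses no continuity correction, and it compares the statistic with 3.841, the 5% critical value for one degree of freedom.

```python
def chi_square_2x2(a, b):
    """Pearson chi-square statistic for independence of two binary sequences."""
    n = len(a)
    observed = {(x, y): 0 for x in (0, 1) for y in (0, 1)}
    for x, y in zip(a, b):
        observed[(x, y)] += 1
    # Marginal counts for each stream.
    row = {x: observed[(x, 0)] + observed[(x, 1)] for x in (0, 1)}
    col = {y: observed[(0, y)] + observed[(1, y)] for y in (0, 1)}
    if 0 in row.values() or 0 in col.values():
        raise ValueError("each stream must contain both 0s and 1s")
    stat = 0.0
    for (x, y), count in observed.items():
        expected = row[x] * col[y] / n  # expected count under independence
        stat += (count - expected) ** 2 / expected
    return stat

CRITICAL_5_PERCENT_1_DOF = 3.841

# Toy data (assumed for illustration): every (x, y) pair occurs equally often.
a = [0, 0, 1, 1] * 25
b = [0, 1, 0, 1] * 25
stat = chi_square_2x2(a, b)
print(stat, stat > CRITICAL_5_PERCENT_1_DOF)
```

    A statistic above the critical value would count as evidence against independence at the 5% level. For the toy data the observed and expected counts coincide, so the statistic is zero.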


    Basically, if correlation exists it can take many forms, but an independence test is the first step in finding evidence of hidden correlations.

    The other way is to partition the random variables and decompose them based on their correlation, which is what happens in a Principal Component Analysis (PCA). If the variables are independent, the decomposition should return what was there to start with, and you won't be able to reduce the dimension of the system without significantly impacting its ability to capture variation.
  4. Feb 23, 2016 #3


    Gold Member


    Thanks for the reply. I think you might be misunderstanding what I'm doing. The theoretical limits on correlations are not (it seems) a very interesting subject, but they crop up; see for instance the Bell notes and the wiki article on CHSH.
    I could be in the wrong sub-forum ...
  5. Feb 23, 2016 #4


    Science Advisor
    Gold Member
    2017 Award

    Maybe I am misunderstanding you, but I would say this is wrong. You are ignoring the possibility of an unlikely event. Although it is unlikely, both streams can be 0 for all ##N## trials as long as there is any possibility of that (i.e. neither ##p_1## nor ##p_2## being 1).
    If the random variables are independent, they have an actual correlation of 0. But even if they are independent, it is possible for a sample to have a correlation anywhere between -1 and 1, inclusive. As the sample size, N, gets large, the probability of sample correlations being far from 0 gets small. But it is always possible to get values anywhere between -1 and 1, inclusive.
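
    To put rough numbers on this, the chance that a finite sample of two independent streams exceeds the asymptotic value ##1-2(p_2-p_1)## can be computed exactly from a binomial distribution. This is a sketch under stated assumptions: the function names are hypothetical, the probabilities 0.3 and 0.6 are picked for illustration, and the quantity is the agreement measure ##(2S-N)/N## from post #1 rather than the Pearson coefficient.

```python
from math import comb

def agreement_probability(p1, p2):
    """Per-trial chance that two independent binary streams agree."""
    return p1 * p2 + (1 - p1) * (1 - p2)

def prob_exceeds(p1, p2, n, threshold):
    """P[(2*S - N)/N > threshold], with S ~ Binomial(N, agreement probability)."""
    q = agreement_probability(p1, p2)
    total = 0.0
    for s in range(n + 1):
        if (2 * s - n) / n > threshold:
            total += comb(n, s) * q**s * (1 - q) ** (n - s)
    return total

p1, p2 = 0.3, 0.6               # assumed values for illustration
bound = 1 - 2 * (p2 - p1)       # asymptotic maximum from post #1
for n in (10, 50, 200):
    print(n, prob_exceeds(p1, p2, n, bound))
```

    The probability is positive for every finite ##N## and shrinks as ##N## grows, which is the sense in which the expression holds only asymptotically.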
  6. Feb 24, 2016 #5


    Gold Member

    Yes, this is true. I don't think ##1-2(p_2-p_1)## is a limit (except asymptotically) because we have probabilities in the expression.

    In fact the first expression I worked out was the multi-stream limit where the probabilities cancel. This is the same as the CHSH inequality which is reckoned to be a true limit.
    Can my logic for ##|\mathcal{C}_{12}+\mathcal{C}_{23}+\mathcal{C}_{34}-\mathcal{C}_{41}| \leq 2## be saved because it has no probabilities in it?

    I think I'm assuming the same things as the derivation I've attached, which uses set logic.

    Last edited: Feb 24, 2016