
Correlations with combined sums and products

  1. Oct 15, 2007 #1
    I was trying to build a probability-related software package and needed a theoretical framework to deal with some less common issues (i.e. stuff that you don't find in the average textbook). I was hoping that somebody could give me pointers as to where to find the proper formulae.

    Basically, I am trying to figure out how to do arbitrary (symbolic) calculations on random variables that involve both sum and product operations, fully taking into account arbitrary correlations between the variables. If I have the correlation coefficients of the variables, then summing them is easy. Also, if I have the correlation coefficients of their logs, multiplying them is easy (since a product amounts to summing the logs, and the log/exponent formulae for normals are well defined). And, of course, determining the resulting correlations between the sums or products and the original variables is straightforward.
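
    For reference, the "easy" sum case can be sketched as follows. The function name and the parameters sa, sb and rho (standard deviations of a and b, and their correlation) are illustrative assumptions, not taken from the post. It uses Var(a + b) = sa^2 + sb^2 + 2*rho*sa*sb and Cov(a + b, a) = sa^2 + rho*sa*sb.

```python
import math

def corr_sum_with_a(sa, sb, rho):
    """Correlation of (a + b) with a.

    sa, sb: standard deviations of a and b (hypothetical parameter names).
    rho:    correlation coefficient between a and b.
    """
    var_sum = sa**2 + sb**2 + 2.0 * rho * sa * sb   # Var(a + b)
    cov_sum_a = sa**2 + rho * sa * sb               # Cov(a + b, a)
    return cov_sum_a / (sa * math.sqrt(var_sum))

print(corr_sum_with_a(1.0, 1.0, 0.0))
```

    Note that the sketch divides by zero when Var(a + b) = 0 (e.g. sa = sb and rho = -1), so a real package would need to guard that case.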

    Where things get murkier (for me at least) is beyond these simple cases. If, say, I want to multiply two sums, how do I handle the correlations? E.g., let z = (a + b) * (c + d). If I know the correlation of a and b, I can easily determine the correlation of their sum to a or b. Similarly, c + d is easy. But if I now multiply those two sums, how do I determine the correlation of z to the original variables a, b, c, and d? And if all the variables originally had some non-zero correlation, how do I take that into account in the product? The summations give me the correlation coefficients with respect to the original variables, but not the correlations of the logarithms, which is what I would need for the multiplication.

    Can anybody give me a clue how to start figuring this out?

    Thanks.
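
    One possible starting point, sketched here under a strong assumption that the thread does not confirm: that (a, b, c, d) is jointly normal. In that case third central moments vanish, so for x = a + b and y = c + d one gets Cov(x*y, a) = E[x]*Cov(y, a) + E[y]*Cov(x, a), and Var(x*y) = E[x]^2*Var(y) + E[y]^2*Var(x) + 2*E[x]*E[y]*Cov(x, y) + Var(x)*Var(y) + Cov(x, y)^2. No logarithms are needed. The function name and argument layout below are hypothetical.

```python
import math

def corr_product_of_sums_with_a(mean, cov):
    """Corr(z, a) for z = (a + b) * (c + d).

    ASSUMPTION: (a, b, c, d) is jointly normal; the formulas are not
    valid for other joint distributions.
    mean: [E a, E b, E c, E d]
    cov:  4x4 covariance matrix as a list of lists, same ordering.
    """
    A, B, C, D = 0, 1, 2, 3
    mu_x = mean[A] + mean[B]                                  # E[a + b]
    mu_y = mean[C] + mean[D]                                  # E[c + d]
    var_x = cov[A][A] + cov[B][B] + 2.0 * cov[A][B]           # Var(a + b)
    var_y = cov[C][C] + cov[D][D] + 2.0 * cov[C][D]           # Var(c + d)
    cov_xy = cov[A][C] + cov[A][D] + cov[B][C] + cov[B][D]    # Cov(x, y)
    cov_xa = cov[A][A] + cov[A][B]                            # Cov(x, a)
    cov_ya = cov[A][C] + cov[A][D]                            # Cov(y, a)

    # Third central moments of a joint normal are zero, which leaves:
    cov_za = mu_x * cov_ya + mu_y * cov_xa
    var_z = (mu_x**2 * var_y + mu_y**2 * var_x
             + 2.0 * mu_x * mu_y * cov_xy
             + var_x * var_y + cov_xy**2)
    return cov_za / math.sqrt(var_z * cov[A][A])

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(corr_product_of_sums_with_a([1.0, 1.0, 1.0, 1.0], identity))
```

    The same pattern gives the correlation of z with b, c or d by swapping indices. A Monte Carlo run is a reasonable way to sanity-check the result for a given covariance matrix.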
     
  3. Oct 16, 2007 #2

    EnumaElish

    Science Advisor
    Homework Helper

    Can't you apply the log rule to x*y where x = a+b and y = c+d?
     
  4. Oct 16, 2007 #3
    Again, how? If I do

    x*y = e^(log(x) + log(y))

    Then I have to know the correlation of log(x) and log(y). How do I figure that out?
     
  5. Oct 16, 2007 #4

    EnumaElish

    Science Advisor
    Homework Helper

    I was going with your statement "if I have the correlation coefficients of their logs multiplying them is easy." Now I realize that you don't have Corr(Log x, Log y) and the problem looks really difficult, because Corr is a linear operator and Log is a nonlinear function. Which tells me you need to somehow linearize the log function. E.g. if x is near 1, then Log(x) is approximately equal to x - 1. Maybe you can devise some kind of a scaling that would result in [itex]\overline\xi[/itex] = 1 where [itex]\xi[/itex] is the scaled version of x.
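
    The linearization idea can be illustrated numerically. To first order around the mean, log(x) is approximately log(mu) + (x - mu)/mu, so when the spread of x and y is small relative to their means, Corr(log x, log y) should be close to Corr(x, y). The sketch below uses made-up parameters (mean 1, standard deviation 0.05, target correlation 0.6) and simulated normal data; it only shows the approximation, it is not an exact method.

```python
import math
import random

def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

random.seed(0)
rho = 0.6        # assumed target correlation (illustrative)
sd = 0.05        # small spread around a mean of 1, so log is nearly linear
xs, ys = [], []
for _ in range(20000):
    e1 = random.gauss(0.0, 1.0)
    e2 = random.gauss(0.0, 1.0)
    xs.append(1.0 + sd * e1)
    # y shares the component e1 with x and adds an independent component e2
    ys.append(1.0 + sd * (rho * e1 + math.sqrt(1.0 - rho**2) * e2))

r_raw = pearson(xs, ys)
r_log = pearson([math.log(x) for x in xs], [math.log(y) for y in ys])
print(r_raw, r_log)
```

    As the spread grows relative to the mean, the two correlations drift apart, which is where the approximation stops being useful.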
     
    Last edited: Oct 16, 2007
  6. Oct 16, 2007 #5
    Well, there are all sorts of approximations I can devise but they would not be precise. Since log(x) and log(y) are both normal (i.e. log of normal is still normal) there should be a simple correlation coefficient that can be calculated and used. I presume, then, there should be a closed form solution for calculating it although I don't know what that would be.

    One observation: If the correlation of x and y was zero, then the correlation of log(x) and log(y) is zero. Also if x and y have the same mean/stdev and their correlation is unity, then the correlation of log(x) and log(y) is also unity. That would seem to point toward the correlations always being equal although intuitively that doesn't sound right.
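
    A closed form does exist in one specific setting, which is an assumption here rather than something established in the thread: x and y jointly lognormal, i.e. (log x, log y) bivariate normal with standard deviations s1, s2 and correlation rho_log. Then Corr(x, y) = (exp(rho_log*s1*s2) - 1) / sqrt((exp(s1^2) - 1) * (exp(s2^2) - 1)). The function names below are hypothetical.

```python
import math

def corr_from_log_corr(rho_log, s1, s2):
    """Corr(x, y) given Corr(log x, log y), ASSUMING x, y jointly lognormal.

    s1, s2: standard deviations of log x and log y.
    """
    num = math.exp(rho_log * s1 * s2) - 1.0
    den = math.sqrt((math.exp(s1**2) - 1.0) * (math.exp(s2**2) - 1.0))
    return num / den

def log_corr_from_corr(r, s1, s2):
    """Inverse of corr_from_log_corr; requires 1 + r * den > 0."""
    den = math.sqrt((math.exp(s1**2) - 1.0) * (math.exp(s2**2) - 1.0))
    return math.log(1.0 + r * den) / (s1 * s2)

print(corr_from_log_corr(0.0, 1.0, 1.0))   # zero log-correlation
print(corr_from_log_corr(1.0, 0.7, 0.7))   # unit log-correlation, equal spreads
print(corr_from_log_corr(0.5, 1.0, 1.0))   # an intermediate case
```

    Under this assumption the two observations above hold (zero maps to zero, and unity maps to unity when s1 = s2), but in between the two correlations are generally not equal.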
     
    Last edited: Oct 16, 2007
  7. Oct 16, 2007 #6

    EnumaElish

    Science Advisor
    Homework Helper

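    Wrong: it is the log of a lognormal that is normal, not the log of a normal.
    As for zero correlation of x and y implying zero correlation of their logs, I am not sure that's necessarily the case. Similarly for the unit correlation case.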
     
  8. Oct 16, 2007 #7
    On the log of a normal: actually, you're right. I'm not thinking clearly.
    On the zero-correlation case, though, you're the one not thinking clearly. If x and y are uncorrelated then it is not possible that their logs could have any correlation. The log transformation does not introduce any component that they could have in common.
    If they are perfectly correlated and they have the same mean and standard deviation then they are, by definition, exactly the same number. Therefore their logs are exactly the same number. Therefore their logs are perfectly correlated.
    These are the only obvious cases that I can think of at the moment. Any other cases seem to require a more elaborate proof.
     
  9. Oct 16, 2007 #8

    EnumaElish

    Science Advisor
    Homework Helper

    The point is, Log is a nonlinear transformation, but Corr is a linear operation. In general, properties of linear operators are not invariant under a nonlinear transformation.

    A trivial example is Corr(a, b) = 0 where some elements of a and/or b are zero (or negative). Then Corr(Log(a), Log(b)) is undefined.

    For similar examples, see: http://en.wikipedia.org/wiki/Correlation#Correlation_and_linearity
     
    Last edited: Oct 16, 2007
  10. Oct 16, 2007 #9
    Of course, that's what I said. But that doesn't prove that there isn't a direct, even linear, relationship between the variable correlations and the log correlations. Seems unlikely but as yet I haven't found a proof one way or the other.

    Well, by that argument the log of a normal is undefined, period, since any normal distribution takes negative values.

    Anyway, point is, I need a theoretical framework to do the calculation. I know there are other software packages that do this sort of thing without doing Monte Carlo analysis but I don't know what the math behind their calculations is.

    Thanks, BTW, for the interest.
     
  11. Oct 16, 2007 #10

    EnumaElish

    Science Advisor
    Homework Helper

    Precisely.
     
  12. Oct 16, 2007 #11
    Well, not "precisely". The reality is that it is not undefined. It is just not real. E.g.

    ln(-1) = i*3.1415927

    Similarly the correlation is not undefined although it could perhaps have imaginary components.

    In other words, the question is not moot; it is just "complex" (pun intended).
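
    The real-versus-complex distinction can be seen directly in Python's standard library, used here purely as an illustration: math.log rejects negative arguments, while cmath.log returns the principal value, whose imaginary part for -1 is pi.

```python
import cmath
import math

# The real-valued log is not defined for negative arguments.
try:
    math.log(-1.0)
    real_log_defined = True
except ValueError:
    real_log_defined = False

# The complex log returns the principal value: ln(-1) = i*pi.
principal = cmath.log(-1.0)
print(real_log_defined, principal)
```

    The principal value is one choice among infinitely many branches (i*pi plus any multiple of 2*pi*i), so a correlation built on complex logs would also depend on that choice.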
     
  13. Oct 16, 2007 #12

    EnumaElish

    Science Advisor
    Homework Helper

    You are right; that was a non sequitur. Still, my point about non-linearity applies. Just because vectors a and b are uncorrelated does not mean their nonlinear functions cannot be correlated.

    Here is a numerical example:
    x = {
    0.147281700000000,
    0.230993647671506,
    0.427692041391026,
    0.079822616900000,
    0.291048299000000,
    0.185000000000000,
    0.088631713672936,
    0.266815063276460,
    0.182298600000000,
    0.850679700000000
    };
    y = {
    0.872438309154607,
    0.186970421455947,
    0.738597327308731,
    0.598236593500000,
    0.462298740000000,
    0.330000000000000,
    0.115598225897281,
    0.107657376896975,
    0.207345000000000,
    0.325996428800000
    };
    Corr(x,y) [itex]\approx[/itex] 0 ( = 5.1841*10^-12)
    But Corr(Log(x), Log(y)) = 0.0873581.
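
    The numbers above can be rechecked with a few lines of Python. This is a minimal sketch that assumes Corr means the sample Pearson correlation and Log the natural logarithm (Pearson correlation is unchanged by the choice of log base); the helper name pearson is an illustrative choice.

```python
import math

def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

# Data copied from the post above.
x = [0.147281700000000, 0.230993647671506, 0.427692041391026,
     0.079822616900000, 0.291048299000000, 0.185000000000000,
     0.088631713672936, 0.266815063276460, 0.182298600000000,
     0.850679700000000]
y = [0.872438309154607, 0.186970421455947, 0.738597327308731,
     0.598236593500000, 0.462298740000000, 0.330000000000000,
     0.115598225897281, 0.107657376896975, 0.207345000000000,
     0.325996428800000]

r_raw = pearson(x, y)
r_log = pearson([math.log(a) for a in x], [math.log(b) for b in y])
print("Corr(x, y)           =", r_raw)
print("Corr(log x, log y)   =", r_log)
```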
     
    Last edited: Oct 16, 2007
  14. Oct 17, 2007 #13
    You are absolutely right. The non-linearity can create these oddball situations. When I work these problems out I normally think of the correlation as distinguishing between a single correlated component and a single uncorrelated component. For linear operations this approach is valid. But when you have non-linear operations then this simplification breaks down (usually accurate but not necessarily).

    Curiouser and curiouser ...

    Doesn't get me any closer to an answer, though. :-)
     
  15. Oct 17, 2007 #14

    EnumaElish

    Science Advisor
    Homework Helper

    Somehow, you need to linearize.
     