Correlations with combined sums and products

  • Context: Graduate
  • Thread starter: mcorazao
  • Tags: Sums

Discussion Overview

The discussion revolves around the theoretical framework needed for performing symbolic calculations on random variables that involve both sum and product operations, particularly in the context of their correlations. Participants explore the complexities that arise when dealing with arbitrary correlations beyond simple cases, such as multiplying sums of random variables.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant seeks guidance on handling correlations when multiplying sums of random variables, specifically how to determine the correlation of the product with the original variables.
  • Another participant suggests applying the log rule to the product of sums but questions how to find the correlation of the logarithms of those sums.
  • A participant notes the challenge of determining the correlation of log-transformed variables due to the non-linear nature of the logarithm compared to the linearity of correlation.
  • Some participants propose that if two variables are uncorrelated, their logarithms should also be uncorrelated, while others challenge this assumption.
  • There is a discussion about the implications of correlation coefficients when dealing with log-normal distributions and the potential for undefined correlations in certain cases.
  • A numerical example is provided to illustrate how non-linearity can lead to unexpected correlations between log-transformed variables, even when the original variables are uncorrelated.
  • Participants express uncertainty about the existence of a direct relationship between variable correlations and log correlations, noting the need for a theoretical framework for calculations.

Areas of Agreement / Disagreement

Participants express differing views on the relationship between the correlations of original variables and their logarithmic transformations. There is no consensus on how to handle the complexities introduced by non-linear transformations or the implications for correlation coefficients.

Contextual Notes

Participants acknowledge limitations in their understanding of how to accurately calculate correlations involving logarithmic transformations and the potential for undefined or complex results in certain scenarios.

mcorazao
I was trying to build a probability-related software package and needed to have a theoretical framework to deal with some less common issues (i.e. stuff that you don't find in the average textbook). I was hoping that somebody could give me pointers as to where to find the proper formulae.

Basically I am trying to figure out how to do arbitrary (symbolic) calculations on random variables involving both sum and product operations, fully taking into account arbitrary correlations between the variables. If I have the correlation coefficients of the variables, then obviously summing them is easy. Also, if I have the correlation coefficients of their logs, multiplying them is easy (since a product amounts to summing the logs, and the log/exponent formulae for normals are well defined). And, of course, determining the resulting correlations between the sums or products and the original variables is straightforward.
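The "easy" sum case mentioned above can be sketched numerically. This is a hypothetical check, not from the thread: it compares the textbook formula Corr(a+b, a) = (σa² + ρσaσb) / (σa·√(σa² + σb² + 2ρσaσb)) against a simulation, with the means, standard deviations, and correlation chosen arbitrarily for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
rho, sa, sb = 0.3, 1.0, 2.0  # illustrative values, not from the thread
cov = [[sa**2, rho * sa * sb], [rho * sa * sb, sb**2]]
a, b = rng.multivariate_normal([0.0, 0.0], cov, size=200_000).T

# Empirical correlation of the sum with one of its terms ...
empirical = np.corrcoef(a + b, a)[0, 1]
# ... versus the closed form: Cov(a+b, a) = Var(a) + Cov(a, b)
analytic = (sa**2 + rho * sa * sb) / (
    sa * np.sqrt(sa**2 + sb**2 + 2 * rho * sa * sb))
print(empirical, analytic)  # the two should agree closely
```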

So where things are more murky (for me at least) is when I go beyond these simple cases. If, say, I want to multiply two sums, how do I handle the correlations? E.g. let's say z = (a + b) * (c + d). If I know the correlation of a and b, I can easily determine the correlation of their sum to a or b. Similarly, c + d is easy. But now if I multiply those two sums, how do I determine the correlation of z to the original variables a, b, c, and d? And if all the variables originally had some non-zero correlation, how do I take that into account in the product? The original summations gave me the correlation coefficients with respect to the original variables, but not the correlations of the logarithms, which is what I would need for the multiplication.
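For reference, the quantity being asked about can at least be estimated by simulation. This is a hypothetical sketch, assuming a, b, c, d are jointly normal with an arbitrarily chosen correlation matrix; it is a baseline to check any symbolic framework against, not the symbolic answer itself.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical correlation matrix for (a, b, c, d); unit variances.
R = np.array([[1.0, 0.2, 0.1, 0.0],
              [0.2, 1.0, 0.0, 0.1],
              [0.1, 0.0, 1.0, 0.3],
              [0.0, 0.1, 0.3, 1.0]])
a, b, c, d = rng.multivariate_normal(np.ones(4), R, size=100_000).T

z = (a + b) * (c + d)
corr_z_a = np.corrcoef(z, a)[0, 1]  # the correlation the post asks about
print(corr_z_a)
```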

Can anybody give me a clue how to start figuring this out?

Thanks.
 
Can't you apply the log rule to x*y where x = a+b and y = c+d?
 
Again, how? If I do

x*y = e^(log(x) + log(y))

Then I have to know the correlation of log(x) and log(y). How do I figure that out?
 
I was going with your statement "if I have the correlation coefficients of their logs multiplying them is easy." Now I realize that you don't have Corr(Log x, Log y), and the problem looks really difficult, because Corr is a linear operator and Log is a nonlinear function. That tells me you need to somehow linearize the log function: e.g., if x is near 1, then Log(x) is approximately equal to x - 1. Maybe you can devise some kind of scaling that would result in \bar{\xi} = 1, where \xi is the scaled version of x.
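The linearization idea can be checked numerically. A hypothetical sketch: for variables tightly concentrated near 1, log is nearly affine there, so the correlation of the logs comes out almost identical to the correlation of the variables themselves (the means, sigma, and rho below are arbitrary illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(1)
rho, sigma = 0.5, 0.01  # small sigma keeps x and y close to 1
cov = [[sigma**2, rho * sigma**2], [rho * sigma**2, sigma**2]]
x, y = 1.0 + rng.multivariate_normal([0.0, 0.0], cov, size=100_000).T

corr_xy = np.corrcoef(x, y)[0, 1]
# log(x) ~= x - 1 near 1, so the correlation should barely change
corr_logs = np.corrcoef(np.log(x), np.log(y))[0, 1]
print(corr_xy, corr_logs)
```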
 
Well, there are all sorts of approximations I can devise but they would not be precise. Since log(x) and log(y) are both normal (i.e. log of normal is still normal) there should be a simple correlation coefficient that can be calculated and used. I presume, then, there should be a closed form solution for calculating it although I don't know what that would be.

One observation: If the correlation of x and y was zero, then the correlation of log(x) and log(y) is zero. Also if x and y have the same mean/stdev and their correlation is unity, then the correlation of log(x) and log(y) is also unity. That would seem to point toward the correlations always being equal although intuitively that doesn't sound right.
 
mcorazao said:
log of normal is still normal
Wrong. Log of Lognormal is normal.
If the correlation of x and y was zero, then the correlation of log(x) and log(y) is zero
I am not sure that's necessarily the case. Similarly for the unit correlation case.
 
EnumaElish said:
Wrong. Log of Lognormal is normal.
Actually you're right. I'm not thinking clearly.
EnumaElish said:
I am not sure that's necessarily the case. Similarly for the unit correlation case.
This one you're not thinking clearly about. If x and y are uncorrelated then it is not possible that their logs could have any correlation. The log transformation does not introduce any component that they could have in common.
If they are perfectly correlated and they have the same mean and standard deviation then they are, by definition, exactly the same number. Therefore their logs are exactly the same number. Therefore their logs are perfectly correlated.
These are the only obvious cases that I can think of at the moment. Any other cases seem to require a more elaborate proof.
 
mcorazao said:
If x and y are uncorrelated then it is not possible that their logs could have any correlation. The log transformation does not introduce any component that they could have in common.
The point is, Log is a nonlinear transformation; but corr is a linear operation. In general properties of linear operators are not invariant under a nonlinear transformation.

A trivial example is Corr(a, b) = 0 where some elements of a and/or b are zero (or negative). Then Corr(Log(a), Log(b)) is undefined.

For similar examples, see: http://en.wikipedia.org/wiki/Correlation#Correlation_and_linearity
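Numerically, this undefined case shows up as NaN. A small illustration (hypothetical data): NumPy's log of a negative entry produces nan, which then propagates into the correlation.

```python
import numpy as np

a = np.array([-1.0, 0.5, 2.0, 3.0])  # contains a negative entry
b = np.array([0.3, 0.7, 1.1, 1.9])

with np.errstate(invalid="ignore"):  # silence the log-of-negative warning
    corr = np.corrcoef(np.log(a), np.log(b))[0, 1]
print(corr)  # nan: Corr(Log(a), Log(b)) is undefined here
```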
 
EnumaElish said:
The point is, Log is a nonlinear transformation; but corr is a linear operation. In general properties of linear operators are not invariant under a nonlinear transformation.
Of course, that's what I said. But that doesn't prove that there isn't a direct, even linear, relationship between the variable correlations and the log correlations. Seems unlikely but as yet I haven't found a proof one way or the other.

EnumaElish said:
A trivial example is Corr(a, b) = 0 where some elements of a and/or b are zero (or negative). Then Corr(Log(a), Log(b)) is undefined.

Well, by that argument the log of a normal is undefined, period, since any normal distribution takes negative values.

Anyway, point is, I need a theoretical framework to do the calculation. I know there are other software packages that do this sort of thing without doing Monte Carlo analysis but I don't know what the math behind their calculations is.

Thanks, BTW, for the interest.
 
mcorazao said:
by that argument log of normal is undefined period
Precisely.
 
EnumaElish said:
Precisely.

Well, not "precisely". The reality is that it is not undefined. It is just not real. E.g.

ln(-1) = i*3.1415927

Similarly, the correlation is not undefined, although it could perhaps have imaginary components.

In other words, the question is not moot; it is just "complex" (pun intended).
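Python's complex arithmetic gives exactly this (principal branch of the log):

```python
import cmath

# The principal complex logarithm of -1 is i*pi.
print(cmath.log(-1))  # 3.141592653589793j
# math.log(-1) would raise ValueError; only the complex log is defined.
```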
 
You are right; that was a non sequitur. Still, my point about non-linearity applies. Just because vectors a and b are uncorrelated does not mean their nonlinear functions cannot be correlated.

Here is a numerical example:
x = {
0.147281700000000,
0.230993647671506,
0.427692041391026,
0.079822616900000,
0.291048299000000,
0.185000000000000,
0.088631713672936,
0.266815063276460,
0.182298600000000,
0.850679700000000
};
y = {
0.872438309154607,
0.186970421455947,
0.738597327308731,
0.598236593500000,
0.462298740000000,
0.330000000000000,
0.115598225897281,
0.107657376896975,
0.207345000000000,
0.325996428800000
};
Corr(x, y) ≈ 0 (= 5.1841×10^-12)
But Corr(Log(x), Log(y)) = 0.0873581.
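The example is straightforward to reproduce; a sketch using NumPy on the data above (trailing zeros trimmed):

```python
import numpy as np

x = np.array([0.1472817, 0.230993647671506, 0.427692041391026,
              0.0798226169, 0.291048299, 0.185,
              0.088631713672936, 0.26681506327646,
              0.1822986, 0.8506797])
y = np.array([0.872438309154607, 0.186970421455947, 0.738597327308731,
              0.5982365935, 0.46229874, 0.33,
              0.115598225897281, 0.107657376896975,
              0.207345, 0.3259964288])

corr_xy = np.corrcoef(x, y)[0, 1]          # ~ 0
corr_logs = np.corrcoef(np.log(x), np.log(y))[0, 1]  # clearly nonzero
print(corr_xy, corr_logs)
```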
 
You are absolutely right. The non-linearity can create these oddball situations. When I work these problems out, I normally think of the correlation as splitting a variable into a single correlated component and a single uncorrelated component. For linear operations this approach is valid, but under non-linear operations the simplification breaks down (it is usually approximately right, but not necessarily).

Curiouser and curiouser ...

Doesn't get me any closer to an answer, though. :-)
 
Somehow, you need to linearize.
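One known result worth noting here, for the jointly lognormal special case only (it does not resolve the general question in the thread): if log x and log y are bivariate normal with log-scale standard deviations s1, s2 and correlation rho_log, then Corr(x, y) has a closed form that can be inverted to recover the log correlation from the variable correlation. A sketch, with all parameter values below chosen arbitrarily:

```python
import math

def corr_of_lognormals(rho_log, s1, s2):
    """Corr(x, y) for x = exp(U), y = exp(V), with (U, V) bivariate normal,
    Corr(U, V) = rho_log, and std devs s1, s2 (standard lognormal identity)."""
    return (math.exp(rho_log * s1 * s2) - 1.0) / math.sqrt(
        (math.exp(s1**2) - 1.0) * (math.exp(s2**2) - 1.0))

def log_corr_from_corr(rho, s1, s2):
    """Invert the map: recover Corr(log x, log y) from Corr(x, y)."""
    d = math.sqrt((math.exp(s1**2) - 1.0) * (math.exp(s2**2) - 1.0))
    return math.log(1.0 + rho * d) / (s1 * s2)

rho_log, s1, s2 = 0.4, 0.5, 0.8  # illustrative values
rho = corr_of_lognormals(rho_log, s1, s2)
print(rho, log_corr_from_corr(rho, s1, s2))  # round-trips back to 0.4
```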
 
