Can random weights create a correlation between two sequences?

  • Context: Graduate 
  • Thread starter: junglebeast
  • Tags: Correlation
SUMMARY

This discussion explores how an apparent correlation can arise between two uncorrelated random sequences, X[i] and Y[i], when random weights are used in a weighted least squares estimate. The sequences are generated as X[i] = rand(0,1) and Y[i] = 100 + rand(0,1), so they are uncorrelated by construction. However, when each equation is scaled by a random weight and the weighted fit W A B is plotted against W Y, a strong correlation appears. The correlation is an artifact of the weighting: both plotted quantities are multiplied row by row by the same weights.

PREREQUISITES
  • Understanding of random number generation in programming (e.g., rand function)
  • Familiarity with least squares estimation and matrix equations
  • Knowledge of weighted least squares and its mathematical formulation
  • Basic skills in data visualization and plotting
NEXT STEPS
  • Study the mathematical foundations of weighted least squares estimation
  • Learn about the implications of introducing weights in statistical models
  • Explore data visualization techniques to better understand correlation
  • Investigate the properties of random sequences and their statistical behavior
USEFUL FOR

Data scientists, statisticians, and researchers interested in statistical modeling, correlation analysis, and the effects of weighting in regression analysis will benefit from this discussion.

junglebeast
I can generate two random sequences X and Y with no correlation. For example,

X = rand(0,1)
Y = 100 + rand(0,1)

Now, if I plot Y as a function of X, and scale it to a square, I get a random distribution of points showing no correlation (as expected):

http://img197.imageshack.us/img197/8318/corr1.gif
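The "no correlation" claim can be checked numerically. The following is a minimal sketch using only the Python standard library; the sample size, the seed, and the helper name `pearson` are assumptions for illustration and do not come from the original post.

```python
import random

def pearson(u, v):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5

rng = random.Random(0)   # arbitrary fixed seed so the run is repeatable
n = 10_000               # assumed sample size; the post does not give one

X = [rng.uniform(0, 1) for _ in range(n)]         # X[i] = rand(0,1)
Y = [100 + rng.uniform(0, 1) for _ in range(n)]   # Y[i] = 100 + rand(0,1)

r_xy = pearson(X, Y)
print(f"corr(X, Y) = {r_xy:+.4f}")   # small in magnitude for independent draws
```

For independent draws the coefficient fluctuates around zero with a spread of roughly 1/sqrt(n).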

Even though X and Y are not linearly related, I can still find the least squares relationship by solving for [m|b] in this matrix equation (where ~ means as close to equal as possible, in the least squares sense),

[X |1] [m|b]^T ~ [Y ]

Basically, that's a matrix equation of the form

A B ~ Y

where A has 2 columns; the first column is the elements of X, the second column is all 1's. B is just the slope and intercept of the line, and Y is the column vector of the components of Y.

Now, having solved for B, I can just multiply A B = Z, which is essentially the best possible "reconstruction" of Y as a linear combination of X and 1. We don't expect this to be a very good reconstruction because they are not linearly related...and if we plot Y as a function of Z, we still get basically a random noise image (like above).
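The unweighted fit and the reconstruction Z can be sketched as follows. This uses the closed-form solution for a line rather than a general matrix solver; `fit_line`, `pearson`, the seed, and the sample size are hypothetical names and values chosen for the example.

```python
import random

def pearson(u, v):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5

def fit_line(x, y):
    """Ordinary least squares slope m and intercept b for y ~ m*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    m = sxy / sxx
    return m, my - m * mx

rng = random.Random(0)   # arbitrary fixed seed
n = 10_000               # assumed sample size
X = [rng.uniform(0, 1) for _ in range(n)]
Y = [100 + rng.uniform(0, 1) for _ in range(n)]

m, b = fit_line(X, Y)            # B = [m, b]
Z = [m * x + b for x in X]       # Z = A B, the reconstruction of Y

r_zy = pearson(Z, Y)
print(f"m = {m:+.4f}, b = {b:.4f}, corr(Z, Y) = {r_zy:+.4f}")
```

Because Z is a linear function of X, corr(Z, Y) has the same magnitude as corr(X, Y), so the fit cannot make the reconstruction look any more correlated with Y than X already is.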

Now for the kicker... say we want to do a weighted least squares estimate instead. We can form a diagonal matrix W whose i-th diagonal entry is the weight for the i-th linear equation. This gives the matrix equation,

W A B ~ W Y

Note that if you want to solve for B directly, you can form the normal equations of this scaled system by multiplying both sides by (W A)^T. Since W is diagonal, W^T W = W^2, so...

W A B ~ W Y
A^T W^2 A B = A^T W^2 Y
B = ( A^T W^2 A )^{-1} A^T W^2 Y

Note: I only mention that because it puts it into the same form as the definition for weighted least squares on Wikipedia, http://en.wikipedia.org/wiki/Least_squares (for anyone who's confused by my notation), with W^2 playing the role of the weight matrix there.
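Scaling row i of the system by W(i,i) and then solving in the least squares sense minimises the sum of W(i,i)^2 times the squared residual, so the per-equation weights in the usual weighted least squares formula are the squares of the diagonal entries. Here is a minimal sketch for the two-parameter line fit; `fit_weighted_line`, the seed, and the sample size are hypothetical choices for illustration. It checks that the fitted B satisfies the normal equations A^T W^2 (Y - A B) = 0.

```python
import random

def fit_weighted_line(x, y, v):
    """Minimise sum(v[i] * (y[i] - m*x[i] - b)**2) over slope m and intercept b."""
    sv = sum(v)
    xbar = sum(vi * xi for vi, xi in zip(v, x)) / sv   # weighted mean of x
    ybar = sum(vi * yi for vi, yi in zip(v, y)) / sv   # weighted mean of y
    sxx = sum(vi * (xi - xbar) ** 2 for vi, xi in zip(v, x))
    sxy = sum(vi * (xi - xbar) * (yi - ybar) for vi, xi, yi in zip(v, x, y))
    m = sxy / sxx
    return m, ybar - m * xbar

rng = random.Random(0)   # arbitrary fixed seed
n = 1_000                # assumed sample size
X = [rng.uniform(0, 1) for _ in range(n)]
Y = [100 + rng.uniform(0, 1) for _ in range(n)]
w = [rng.uniform(0, 100) for _ in range(n)]   # diagonal of W

v = [wi * wi for wi in w]                     # row scaling by w means weights w^2
m, b = fit_weighted_line(X, Y, v)

# Both components of A^T W^2 (Y - A B), normalised by the total weight.
resid = [yi - (m * xi + b) for xi, yi in zip(X, Y)]
g_m = sum(vi * ri * xi for vi, ri, xi in zip(v, resid, X)) / sum(v)
g_b = sum(vi * ri for vi, ri in zip(v, resid)) / sum(v)
print(f"m = {m:+.4f}, b = {b:.4f}, normal-equation residuals: {g_m:.2e}, {g_b:.2e}")
```

Both residuals should vanish up to floating-point rounding, which is what the normal equations require of the least squares solution.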

Moving on...let's just choose a random set of weights for the example, i.e.,

W(i,i) = rand(0,100)

Now, if you plot (W A B) as a function of (W Y), we get this picture...

http://img197.imageshack.us/img197/2379/corr2.gif

Whoa! All of a sudden it looks like there is a strong correlation, but that can't be...we already know there is no correlation by the design of the problem, and all we did was choose random weights.

What's going on?
 
By multiplying each row of [X Y] with a weight, you have introduced a correlation between X and Y. Nice example.
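The effect can be reproduced in a few lines. The sketch below is an illustration under assumed settings (seed, sample size, and the helper names `pearson` and `fit_weighted_line` are not from the thread). Since Y[i] and the fitted value Z[i] are both close to 100.5, the products W(i,i) Y[i] and W(i,i) Z[i] are both roughly 100.5 times the weight, so they move together with the weights regardless of X.

```python
import random

def pearson(u, v):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5

def fit_weighted_line(x, y, v):
    """Minimise sum(v[i] * (y[i] - m*x[i] - b)**2) over slope m and intercept b."""
    sv = sum(v)
    xbar = sum(vi * xi for vi, xi in zip(v, x)) / sv
    ybar = sum(vi * yi for vi, yi in zip(v, y)) / sv
    sxx = sum(vi * (xi - xbar) ** 2 for vi, xi in zip(v, x))
    sxy = sum(vi * (xi - xbar) * (yi - ybar) for vi, xi, yi in zip(v, x, y))
    m = sxy / sxx
    return m, ybar - m * xbar

rng = random.Random(0)   # arbitrary fixed seed
n = 10_000               # assumed sample size
X = [rng.uniform(0, 1) for _ in range(n)]
Y = [100 + rng.uniform(0, 1) for _ in range(n)]
w = [rng.uniform(0, 100) for _ in range(n)]   # W(i,i) = rand(0,100)

# Least squares solution of W A B ~ W Y (row scaling by w is weighting by w^2).
m, b = fit_weighted_line(X, Y, [wi * wi for wi in w])
Z = [m * x + b for x in X]                    # A B

WZ = [wi * zi for wi, zi in zip(w, Z)]        # W A B
WY = [wi * yi for wi, yi in zip(w, Y)]        # W Y

r_weighted = pearson(WZ, WY)    # what the second plot shows
r_unweighted = pearson(Z, Y)    # same fit with the weights divided out
r_w_wy = pearson(w, WY)         # W Y is almost a multiple of the weights
print(f"corr(W A B, W Y) = {r_weighted:.5f}")
print(f"corr(A B, Y)     = {r_unweighted:+.4f}")
print(f"corr(w, W Y)     = {r_w_wy:.5f}")
```

Dividing the weights back out recovers the near-zero correlation, which suggests the pattern in the second plot comes from the shared scaling rather than from any relationship between X and Y.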
 
EnumaElish said:
By multiplying each row of [X Y] with a weight, you have introduced a correlation between X and Y. Nice example.

Good point...I did not think of it from that perspective. Silly me
 
