# Using Correlation to Predict Values

1. Dec 21, 2011

### Soveraign

I've searched the forums but am unable to find an answer to this:

Given two correlated variables, you can predict one from the other using the familiar formula

$$E(Y \mid X) = EY + r\,s_y\,\frac{X - EX}{s_x}$$

where r is the correlation between X and Y, and s_x, s_y are their standard deviations.
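A minimal Python sketch of this two-variable formula. The function name and the numbers in the example call are made up for illustration. The parameters map one-to-one onto EX, EY, s_x, s_y and r.

```python
def bivariate_conditional_mean(x, mean_x, mean_y, s_x, s_y, r):
    """E(Y | X = x) = EY + r * s_y * (x - EX) / s_x for a bivariate normal pair."""
    return mean_y + r * s_y * (x - mean_x) / s_x


# Hypothetical numbers: EX = 170, s_x = 10, EY = 70, s_y = 12, r = 0.5
print(bivariate_conditional_mean(180, 170, 70, 10, 12, 0.5))  # 76.0
```

One standard deviation above EX predicts r standard deviations above EY, which is why the prediction moves by 0.5 * 12 = 6.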

What I want to know is how to predict values from multiple variables, especially when these variables themselves are correlated.

$$E(Y \mid A, B, C) = \;?$$

2. Dec 21, 2011

### Stephen Tashi

Your example shows the computation of the expected value of a random variable, but you use the word "predict" to phrase your question. Are you trying to "predict" the value of a random variable Y given the values of other random variables? Or is your goal to compute the expected value of Y given the distribution functions of the other random variables?

3. Dec 21, 2011

### Soveraign

You are correct: I am looking to calculate the expected value of Y given A, B, C and the known correlations YA, YB, YC, AB, AC, BC (plus the necessary variances, etc.).
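If (Y, A, B, C) are assumed to be jointly multivariate normal (the assumption raised in the replies that follow), E(Y | A, B, C) is linear in A, B, C and needs only the means, the standard deviations and the six correlations listed above. Below is a minimal NumPy sketch under that assumption; the function and argument names are hypothetical. Without normality, the same weights still give the best linear predictor of Y in the mean-squared-error sense; joint normality is what makes that predictor equal to the conditional expectation.

```python
import numpy as np


def conditional_mean(x, means_x, mean_y, sd_x, sd_y, corr_xx, corr_xy):
    """Sketch of E(Y | X = x) for jointly normal (Y, X_1, ..., X_k).

    x, means_x, sd_x: length-k observed values, means and std devs of the predictors
    corr_xx: k x k correlation matrix among the predictors (e.g. AB, AC, BC)
    corr_xy: length-k correlations of each predictor with Y (e.g. YA, YB, YC)
    """
    x, means_x, sd_x = (np.asarray(v, dtype=float) for v in (x, means_x, sd_x))
    corr_xx = np.asarray(corr_xx, dtype=float)
    corr_xy = np.asarray(corr_xy, dtype=float)

    # Correlations -> covariances: Cov(Xi, Xj) = r_ij * s_i * s_j, Cov(Xi, Y) = r_iY * s_i * s_y
    cov_xx = corr_xx * np.outer(sd_x, sd_x)
    cov_xy = corr_xy * sd_x * sd_y

    # Regression weights solve cov_xx @ beta = cov_xy; this step is where the
    # correlation among the predictors themselves is accounted for.
    beta = np.linalg.solve(cov_xx, cov_xy)
    return mean_y + beta @ (x - means_x)


# Hypothetical standardized example: A and B each correlate 0.5 with Y and 0.5 with each other.
pred = conditional_mean(x=[1.0, 1.0], means_x=[0.0, 0.0], mean_y=0.0,
                        sd_x=[1.0, 1.0], sd_y=1.0,
                        corr_xx=[[1.0, 0.5], [0.5, 1.0]], corr_xy=[0.5, 0.5])
print(pred)  # about 0.667, not 0.5 + 0.5 = 1.0
```

Adding up the separate two-variable predictions would count the information A and B share twice; solving against the predictors' covariance matrix removes that double counting. With a single predictor the function reduces to the two-variable formula. `np.linalg.solve` may raise an error if the predictors' correlation matrix is singular, for example when one predictor is an exact linear combination of the others.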

4. Dec 22, 2011

### Stephen Tashi

I've only seen that formula applied to random variables that have a joint bivariate normal distribution. Are you assuming all the random variables in your question have a joint multinormal distribution?

5. Dec 22, 2011

### Soveraign

If I understand the definition correctly, then I think so. Y, A, B, C are each normally distributed about a mean, but not necessarily independent (i.e. their covariances may be nonzero).

One idea I had was to perform principal component analysis on A, B, C, so that I would have new, independent variables along the eigenvectors to work with. Perhaps I could then do multiple regression on these new variables A', B', C', fitting an (n-1)-dimensional "plane" (a hyperplane) through my n-dimensional space, and so work out E(Y | A', B', C')?
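A quick NumPy check of this idea, using a made-up covariance matrix for (A, B, C) and made-up covariances with Y. It rotates the predictors onto the eigenvectors of their covariance matrix, where the components are uncorrelated and each weight is a simple ratio, and compares the result with solving directly against the correlated predictors. Because the rotation is invertible, keeping all components gives the same prediction as the direct approach; dropping components (principal component regression) would change it.

```python
import numpy as np

# Hypothetical covariance among (A, B, C) and covariance of each with Y
cov_xx = np.array([[4.0, 1.2, 0.8],
                   [1.2, 9.0, 2.1],
                   [0.8, 2.1, 1.0]])
cov_xy = np.array([1.5, 2.4, 0.6])
mean_x = np.array([10.0, 20.0, 5.0])
mean_y = 3.0
x = np.array([12.0, 17.0, 5.5])  # hypothetical observed A, B, C

# Direct approach: weights computed from the correlated predictors
beta = np.linalg.solve(cov_xx, cov_xy)
direct = mean_y + beta @ (x - mean_x)

# PCA approach: rotate to components z = V.T @ (x - mean_x)
eigvals, V = np.linalg.eigh(cov_xx)  # columns of V are eigenvectors
z = V.T @ (x - mean_x)
cov_zy = V.T @ cov_xy                # Cov(Z, Y) after the rotation
gamma = cov_zy / eigvals             # Cov(Z) is diagonal, so each weight is Cov(Z_i, Y) / Var(Z_i)
via_pca = mean_y + gamma @ z

print(direct, via_pca)  # equal up to floating-point rounding
```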

But I assume this is a solved problem and I'm just not looking in the right places.

6. Dec 23, 2011

### Stephen Tashi

I looked too. I think this page (in the section called "The Multivariate Normal Distribution") gives the answer, but I haven't deciphered all the matrix notation.
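For reference, here is the standard conditional-mean result for a partitioned multivariate normal, which is presumably what that section expresses in matrix form. It uses the symbols from the first post, with X = (A, B, C)^T, Σ_XX the 3×3 covariance matrix of A, B, C (entries r_ij s_i s_j) and Σ_YX the row of covariances of Y with A, B and C. It assumes joint normality.

```latex
\begin{align*}
\begin{pmatrix} Y \\ \mathbf{X} \end{pmatrix}
  &\sim \mathcal{N}\!\left(
    \begin{pmatrix} EY \\ E\mathbf{X} \end{pmatrix},
    \begin{pmatrix} s_y^2 & \Sigma_{YX} \\ \Sigma_{XY} & \Sigma_{XX} \end{pmatrix}
  \right) \\
E(Y \mid \mathbf{X} = \mathbf{x})
  &= EY + \Sigma_{YX}\,\Sigma_{XX}^{-1}\,(\mathbf{x} - E\mathbf{X}) \\
\operatorname{Var}(Y \mid \mathbf{X} = \mathbf{x})
  &= s_y^2 - \Sigma_{YX}\,\Sigma_{XX}^{-1}\,\Sigma_{XY} \\
\intertext{With a single predictor, $\Sigma_{YX} = r\,s_x s_y$ and $\Sigma_{XX} = s_x^2$, so}
E(Y \mid X = x)
  &= EY + \frac{r\,s_x s_y}{s_x^2}\,(x - EX) = EY + r\,s_y\,\frac{x - EX}{s_x}
\end{align*}
```

The last line recovers the two-variable formula from the first post.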

As I recall, the fact that the marginal distributions are normal does not guarantee that the joint distribution is multivariate normal. So you need to examine this assumption.
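A small simulation of the classic counterexample, using only the standard library. Take X ~ N(0, 1) and Y = S·X, where S = ±1 is a fair coin flip independent of X. By symmetry Y is also standard normal, yet X + Y is exactly 0 about half the time. That cannot happen for a jointly normal pair: any linear combination of jointly normal variables is normal, and X + Y here has variance 2, so a normal distribution could not put probability 1/2 on a single point.

```python
import random


def sample_pair(rng):
    """X ~ N(0, 1); Y = S * X with S = +1 or -1 by a fair coin, independent of X."""
    x = rng.gauss(0.0, 1.0)
    s = 1.0 if rng.random() < 0.5 else -1.0
    return x, s * x


rng = random.Random(0)  # fixed seed so the run is reproducible
pairs = [sample_pair(rng) for _ in range(100_000)]

# Each marginal looks standard normal (mean near 0, variance near 1) ...
ys = [y for _, y in pairs]
mean_y = sum(ys) / len(ys)
var_y = sum((y - mean_y) ** 2 for y in ys) / len(ys)

# ... but X + Y is exactly zero whenever S = -1, roughly half the time.
frac_zero = sum(1 for x, y in pairs if x + y == 0.0) / len(pairs)
print(mean_y, var_y, frac_zero)
```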