Using Correlation to Predict Values

  • Context: Graduate 
  • Thread starter: Soveraign
  • Tags: Correlation
SUMMARY

This discussion focuses on predicting one variable from several correlated variables via the conditional expectation E(Y|A, B, C). The participants emphasize the need to know the joint distribution of the variables, specifically whether they follow a joint multivariate normal distribution. Principal Component Analysis (PCA) is suggested as a way to transform the correlated predictors into uncorrelated components before running a multiple regression. The conversation also highlights the importance of verifying the distributional assumptions, since normal marginals alone do not guarantee a joint multivariate normal distribution.

PREREQUISITES
  • Understanding of multivariate normal distribution
  • Familiarity with the conditional expectation E(Y|X)
  • Knowledge of Principal Component Analysis (PCA)
  • Experience with multiple regression techniques
NEXT STEPS
  • Study the properties of the multivariate normal distribution
  • Learn how to apply Principal Component Analysis (PCA) for dimensionality reduction
  • Explore multiple regression analysis with correlated predictors
  • Investigate the implications of covariance in predicting outcomes
USEFUL FOR

Data scientists, statisticians, and analysts interested in predictive modeling and understanding the relationships between multiple correlated variables.

Soveraign
I've searched the forums but am unable to find an answer to this:

Given two variables with a correlation, you can predict one from the other using the familiar
E(Y|X) = EY + r * s_y * (X - EX) / s_x
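The two-variable formula translates directly into a one-line function. This is a minimal Python sketch; the function name and the numbers are hypothetical. The expression equals E(Y|X) when (X, Y) is bivariate normal; without that assumption it is still the best linear predictor of Y from X, but not necessarily the conditional mean.

```python
def conditional_mean_bivariate(x, mean_x, mean_y, sd_x, sd_y, r):
    """E(Y | X = x) = EY + r * s_y * (x - EX) / s_x, assuming (X, Y) is bivariate normal."""
    return mean_y + r * sd_y * (x - mean_x) / sd_x

# Hypothetical numbers: EX = 170, s_x = 10, EY = 70, s_y = 12, r = 0.6.
# One standard deviation above the mean of X moves the prediction r * s_y = 7.2 above EY.
print(conditional_mean_bivariate(180, 170, 70, 10, 12, 0.6))  # approximately 77.2
```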

What I want to know is how to predict values from multiple variables, especially when these variables themselves are correlated.

E(Y | A, B, C) = ?
 
Your example shows a computation of the expected value of a random variable, but you use the word "predict" to phrase your question. Are you trying to "predict" the value of a random variable Y given the values of other random variables? Or is your goal to compute the expected value of Y given the distribution functions of the other random variables?
 
Stephen Tashi said:
Your example shows a computation of the expected value of a random variable, but you use the word "predict" to phrase your question. Are you trying to "predict" the value of a random variable Y given the values of other random variables? Or is your goal to compute the expected value of Y given the distribution functions of the other random variables?

You are correct: I am looking to calculate the expected value of Y given A, B, C and the known correlations YA, YB, YC, AB, AC, BC (and the necessary variances, etc.).
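Given pairwise correlations and standard deviations, the first concrete step is to assemble a covariance matrix: Cov(U, V) = r_UV * s_U * s_V, or in matrix form Sigma = D R D, where D is the diagonal matrix of standard deviations and R the correlation matrix. The NumPy sketch below does this; the function name and all numbers are hypothetical. It also checks positive definiteness, because six correlations chosen one at a time need not be mutually consistent.

```python
import numpy as np

def covariance_from_correlations(sds, corr):
    """Return Sigma = D R D, where D = diag(sds) and R is the correlation matrix."""
    sds = np.asarray(sds, dtype=float)
    corr = np.asarray(corr, dtype=float)
    D = np.diag(sds)
    cov = D @ corr @ D
    # Pairwise correlations need not be mutually consistent. Cholesky succeeds only
    # for a positive-definite matrix, so it doubles as a validity check
    # (it also rejects singular, merely semi-definite matrices).
    np.linalg.cholesky(cov)  # raises np.linalg.LinAlgError otherwise
    return cov

# Hypothetical values, variable order (Y, A, B, C).
sds = [2.0, 1.0, 1.5, 0.5]
corr = [[1.0, 0.6, 0.4, 0.3],
        [0.6, 1.0, 0.5, 0.2],
        [0.4, 0.5, 1.0, 0.1],
        [0.3, 0.2, 0.1, 1.0]]
Sigma = covariance_from_correlations(sds, corr)
print(Sigma)

# An inconsistent set: A and B both track Y closely, yet are strongly anti-correlated.
bad = [[1.0, 0.9, 0.9],
       [0.9, 1.0, -0.9],
       [0.9, -0.9, 1.0]]
try:
    covariance_from_correlations([1.0, 1.0, 1.0], bad)
    rejected = False
except np.linalg.LinAlgError:
    rejected = True
print("inconsistent correlations rejected:", rejected)
```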
 
Soveraign said:
E(Y|X) = EY + r * s_y * (X - EX) / s_x

I've only seen that formula applied to random variables that have a joint bivariate normal distribution. Are you assuming all the random variables in your question have a joint multinormal distribution?
 
Stephen Tashi said:
I've only seen that formula applied to random variables that have a joint bivariate normal distribution. Are you assuming all the random variables in your question have a joint multinormal distribution?

If I understand the definition correctly, then I think so. Y, A, B, C are each normally distributed about a mean, but not necessarily independent (i.e., their covariances may be nonzero).

A thought I had was to perform principal component analysis on A, B, C so that I would then have some new (independent) eigenvectors to work with. Perhaps I could then do multiple regression with my new A', B', C', working out an (n-1)-dimensional "plane" through my n-dimensional space, and thus E(Y|A', B', C')?
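A simulation sketch with made-up data (assuming NumPy; the coefficients, covariance, and seed are arbitrary) suggests the PCA step is not needed for the point prediction. PCA of centered predictors is an invertible rotation, so regressing Y on the component scores and rotating the coefficients back gives exactly the same fitted plane as ordinary multiple regression on A, B, C directly. Multiple regression already accounts for correlation among the predictors. Note that the component scores are uncorrelated; they are independent only under joint normality.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated data: three correlated predictors (A, B, C) and a response Y.
n = 5000
cov_x = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
X = rng.multivariate_normal(mean=[1.0, 2.0, 3.0], cov=cov_x, size=n)
y = 0.5 + X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=n)

# Center everything so the intercept drops out of the fits.
x_mean, y_mean = X.mean(axis=0), y.mean()
Xc, yc = X - x_mean, y - y_mean

# PCA: eigenvectors of the sample covariance matrix of the predictors.
S = np.cov(Xc, rowvar=False)
eigvals, V = np.linalg.eigh(S)
Z = Xc @ V  # component scores A', B', C'; their columns are mutually orthogonal

# Because the score columns are orthogonal, each coefficient can be fit on its own.
gamma = (Z.T @ yc) / (Z ** 2).sum(axis=0)

# Rotate back to coefficients on the original predictors.
beta_pca = V @ gamma

# Ordinary multiple regression on the original, correlated predictors.
beta_ols, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
intercept = y_mean - x_mean @ beta_ols

print(beta_pca)
print(beta_ols)  # same as beta_pca up to floating-point error

def predict(x_new):
    """Fitted E(Y | A, B, C) at a new point (A, B, C)."""
    return y_mean + (np.asarray(x_new, dtype=float) - x_mean) @ beta_ols
```

If some principal components are dropped (principal component regression), the two fits do differ; that is a separate choice, typically made for near-collinear predictors.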

But I assume this is a solved problem and I'm just not looking in the right places.
 
Soveraign said:
But I assume this is a solved problem and I'm just not looking in the right places.

I looked too. I think this page (in the section called "The Multivariate Normal Distribution") gives the answer, but I haven't deciphered all the matrix notation.
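For reference, the standard result for a joint multivariate normal vector, with Y first and X = (A, B, C) partitioned off, is E(Y | X = x) = mu_Y + Sigma_YX Sigma_XX^(-1) (x - mu_X), with conditional variance Sigma_YY - Sigma_YX Sigma_XX^(-1) Sigma_XY. The NumPy sketch below implements it under that assumption; the function name and all numbers are hypothetical. With a single predictor it reduces to the bivariate formula from the first post.

```python
import numpy as np

def conditional_normal(mu, Sigma, x):
    """Mean and variance of Y given X = x, for a jointly normal vector (Y, X_1, ..., X_k).

    mu    : mean vector of length k + 1, with Y first
    Sigma : (k + 1) x (k + 1) covariance matrix in the same order
    x     : observed values of X_1, ..., X_k
    """
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    x = np.asarray(x, dtype=float)

    s_xy = Sigma[1:, 0]   # Cov(X, Y), shape (k,)
    S_xx = Sigma[1:, 1:]  # Cov(X, X), shape (k, k)

    # Regression weights w = Sigma_XX^(-1) Sigma_XY; solve() avoids forming the inverse.
    w = np.linalg.solve(S_xx, s_xy)

    mean = mu[0] + w @ (x - mu[1:])
    var = Sigma[0, 0] - s_xy @ w
    return mean, var

# One predictor: means 70 and 170, sds 12 and 10, r = 0.6, so Cov(Y, X) = 72.
mean1, var1 = conditional_normal([70.0, 170.0], [[144.0, 72.0], [72.0, 100.0]], [180.0])
print(mean1, var1)  # approximately 77.2 and 92.16

# Three predictors A, B, C (hypothetical covariance matrix, order Y, A, B, C).
Sigma4 = np.array([[4.0, 1.2, 1.2, 0.3],
                   [1.2, 1.0, 0.75, 0.1],
                   [1.2, 0.75, 2.25, 0.075],
                   [0.3, 0.1, 0.075, 0.25]])
mean4, var4 = conditional_normal([0.0, 0.0, 0.0, 0.0], Sigma4, [1.0, -0.5, 2.0])
print(mean4, var4)
```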

As I recall, the fact that the marginal distributions are normal does not guarantee that the joint distribution is a multivariate normal. So you need to examine this assumption.
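The point about marginals can be checked numerically with a standard counterexample (not from the thread): take X ~ N(0, 1) and an independent random sign W, and set Y = W * X. Then Y is also N(0, 1), and X and Y are even uncorrelated, yet (X, Y) is not jointly normal, because X + Y is exactly 0 whenever W = -1 (a jointly normal pair would make X + Y normal). The simulation below is a sketch assuming NumPy; the sample size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
X = rng.standard_normal(n)
W = rng.choice([-1.0, 1.0], size=n)  # independent random sign
Y = W * X                            # Y is also exactly N(0, 1)

# Both marginals look standard normal, and the sample correlation is near 0 ...
print(X.mean(), X.std(), Y.mean(), Y.std(), np.corrcoef(X, Y)[0, 1])

# ... but X and Y are clearly dependent (|Y| = |X|), and X + Y has a point mass at 0.
frac = np.mean(X + Y == 0.0)
print(frac)  # close to 0.5
```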
 
