Using Correlation to Predict Values

  • Context: Graduate
  • Thread starter: Soveraign
  • Tags: Correlation

Discussion Overview

The discussion revolves around the use of correlation to predict values from multiple variables, particularly in the context of calculating expected values given correlated random variables. Participants explore the implications of joint distributions and the potential application of techniques like principal component analysis and multiple regression.

Discussion Character

  • Exploratory, Technical explanation, Debate/contested

Main Points Raised

  • One participant seeks to understand how to predict values from multiple correlated variables, specifically asking for the form of E(Y | A, B, C).
  • Another participant questions whether the term "predict" is being used correctly, suggesting a distinction between predicting values and computing expected values based on distribution functions.
  • A participant clarifies their intent to calculate the expected value of Y given A, B, C, and known correlations, as well as necessary variances.
  • Concerns are raised about the application of the formula E(Y|X) to random variables, with a focus on whether the variables have a joint multinormal distribution.
  • One participant proposes using principal component analysis to transform correlated variables into independent ones, suggesting this could facilitate multiple regression analysis.
  • Another participant notes that while marginal distributions may be normal, this does not guarantee that the joint distribution is multivariate normal, indicating a need to verify this assumption.

Areas of Agreement / Disagreement

Participants express differing views on the interpretation of "predict" versus "compute expected value," and there is no consensus on the assumptions regarding the joint distribution of the variables.

Contextual Notes

Participants highlight the importance of understanding the assumptions behind the joint distribution of the variables, particularly in relation to normality and independence.

Soveraign
I've searched the forums but am unable to find an answer to this:

Given two variables with a correlation, you can predict one from the other using the familiar
E(Y|X) = EY + r * s_y * (X - EX) / s_x
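
As a quick illustration of the two-variable formula, here is a minimal Python sketch. The function name and the numbers are made up for the example, and it assumes X and Y are jointly (bivariate) normal with known means, standard deviations and correlation r.

```python
def conditional_mean(x, mean_x, mean_y, sd_x, sd_y, r):
    """E(Y | X = x) = EY + r * s_y * (x - EX) / s_x (bivariate normal assumed)."""
    return mean_y + r * sd_y * (x - mean_x) / sd_x


# Hypothetical values, chosen only for illustration
estimate = conditional_mean(x=12.0, mean_x=10.0, mean_y=50.0,
                            sd_x=2.0, sd_y=5.0, r=0.6)
print(estimate)
```

With r = 0 the function simply returns EY, which matches the intuition that an uncorrelated X tells you nothing about the mean of Y in the normal case.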

What I want to know is how to predict values from multiple variables, especially when these variables themselves are correlated.

E(Y | A, B, C) = ??
 
Your example shows a computation of the expected value of a random variable, but you are using the word "predict" to phrase your question. Are you trying to "predict" the value of a random variable Y given the values of other random variables? Or is your goal to compute the expected value of Y given the distribution functions for other random variables?
 
Stephen Tashi said:
Your example shows a computation of the expected value of a random variable, but you are using the word "predict" to phrase your question. Are you trying to "predict" the value of a random variable Y given the values of other random variables? Or is your goal to compute the expected value of Y given the distribution functions for other random variables?

You are correct: I am looking to calculate the expected value of Y given A, B, C and the known correlations YA, YB, YC, AB, AC, BC (plus the necessary variances, etc.).
 
Soveraign said:
E(Y|X) = EY + r * s_y * (X - EX) / s_x

I've only seen that formula applied to random variables that have a joint bivariate normal distribution. Are you assuming all the random variables in your question have a joint multinormal distribution?
 
Stephen Tashi said:
I've only seen that formula applied to random variables that have a joint bivariate normal distribution. Are you assuming all the random variables in your question have a joint multinormal distribution?

If I understand the definition correctly, then I think so. Y, A, B, C are normally distributed about a mean, but not necessarily independent (i.e. covariance != 0).

A thought I had was to perform principal component analysis on A, B, C so that I would then have some new (uncorrelated) variables to work with. Perhaps I could then do multiple regression with my new A', B', C', fitting an (n-1)-dimensional "plane" through my n-space and so working out E(Y | A', B', C')?
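
A rough way to check this idea numerically is sketched below in Python with NumPy. Everything in it is simulated and hypothetical (the covariance matrix, the coefficients, the helper name `ols_fit_predict`). It rotates the predictors onto their principal components, confirms that the resulting scores are uncorrelated, and compares a regression on the scores with a regression on the original A, B, C. Because keeping all components is just an invertible linear change of variables, the two fits should give the same fitted values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated, hypothetical data: three correlated predictors and a response
cov = np.array([[1.0, 0.6, 0.3],
                [0.6, 1.0, 0.5],
                [0.3, 0.5, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0, 0.0], cov=cov, size=500)
y = 2.0 + X @ np.array([1.0, -0.5, 0.25]) + rng.normal(scale=0.5, size=500)


def ols_fit_predict(features, target):
    """Least-squares fit with an intercept; returns the fitted values."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return design @ coef


# PCA via eigen-decomposition of the sample covariance of the centered predictors
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
scores = Xc @ eigvecs  # principal component scores, i.e. A', B', C'

fitted_direct = ols_fit_predict(X, y)
fitted_pca = ols_fit_predict(scores, y)

# Off-diagonal covariances of the scores should vanish (uncorrelated components)
score_cov = np.cov(scores, rowvar=False)
max_offdiag = np.max(np.abs(score_cov - np.diag(np.diag(score_cov))))

# Largest difference between the two sets of fitted values
max_fit_diff = np.max(np.abs(fitted_direct - fitted_pca))
print(max_offdiag, max_fit_diff)
```

So the PCA step is not needed to get E(Y | A, B, C); ordinary multiple regression already handles correlated predictors. PCA changes the answer only if some components are dropped.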

But I assume this is a solved problem and I'm just not looking in the right places.
 
Soveraign said:
But I assume this is a solved problem and I'm just not looking in the right places.

I looked too. I think this page (in the section called "The Multivariate Normal Distribution") gives the answer, but I haven't deciphered all the matrix notation.
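
For reference, the standard multivariate normal result in matrix form is E(Y | X = x) = mu_Y + S_YX S_XX^-1 (x - mu_X), where X = (A, B, C), S_XX is the covariance matrix of the predictors and S_YX holds the covariances of Y with each predictor. With a single predictor it reduces to the two-variable formula quoted at the start of the thread. The Python sketch below builds the covariances from correlations and standard deviations; the function name and all numbers are hypothetical, and it assumes Y, A, B, C really are jointly normal.

```python
import numpy as np


def conditional_mean_mvn(x, mean_x, mean_y, sd_x, sd_y, corr_xx, corr_yx):
    """E(Y | X = x) when (Y, X) is jointly multivariate normal.

    corr_xx: correlation matrix among the predictors (AB, AC, BC off-diagonal)
    corr_yx: correlations of Y with each predictor (YA, YB, YC)
    """
    x, mean_x, sd_x = (np.asarray(v, dtype=float) for v in (x, mean_x, sd_x))
    corr_xx = np.asarray(corr_xx, dtype=float)
    corr_yx = np.asarray(corr_yx, dtype=float)
    # Convert correlations to covariances
    cov_xx = corr_xx * np.outer(sd_x, sd_x)
    cov_yx = corr_yx * sd_y * sd_x
    # Solve cov_xx @ beta = cov_yx rather than forming an explicit inverse
    beta = np.linalg.solve(cov_xx, cov_yx)
    return mean_y + beta @ (x - mean_x)


# Hypothetical inputs: correlations among A, B, C and of Y with each of them
corr_xx = [[1.0, 0.6, 0.3],
           [0.6, 1.0, 0.5],
           [0.3, 0.5, 1.0]]
corr_yx = [0.7, 0.5, 0.4]
print(conditional_mean_mvn(x=[1.0, 0.5, -0.2], mean_x=[0.0, 0.0, 0.0], mean_y=10.0,
                           sd_x=[1.0, 1.0, 1.0], sd_y=2.0,
                           corr_xx=corr_xx, corr_yx=corr_yx))
```

The vector `beta` plays the role of the regression coefficients: it accounts for the correlations among A, B and C, which is why simply adding three separate two-variable corrections would overcount.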

As I recall, the fact that the marginal distributions are normal does not guarantee that the joint distribution is a multivariate normal. So you need to examine this assumption.
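
A standard counterexample makes this concrete: take X standard normal, flip its sign with an independent fair coin, and call the result Y. Both X and Y are then standard normal, but X + Y is exactly zero half the time, which cannot happen if (X, Y) were jointly normal. The simulation below is only a sketch of that construction; the sample size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

x = rng.standard_normal(n)
sign = rng.choice([-1.0, 1.0], size=n)  # independent fair coin flip
y = sign * x                            # marginally standard normal, like x

# For a jointly normal pair, X + Y would be normal (or constant), so it could
# not equal zero with probability one half.
frac_zero = np.mean(x + y == 0.0)
sample_corr = np.corrcoef(x, y)[0, 1]

print(frac_zero)    # close to 0.5
print(sample_corr)  # close to 0, yet |y| always equals |x|
```

Here X and Y are uncorrelated but completely dependent, so the linear conditional-mean formula would not describe how Y behaves given X.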
 
