MHB Predict Z-Score for Y Given X at 30th Percentile

  • Thread starter: dlee
AI Thread Summary
The discussion concerns predicting the z-score of variable Y given that variable X is at the 30th percentile, with a specified correlation of ρ = 0.7. An initial calculation, assuming a bivariate normal distribution, gives z_X = invNorm(0.30) = -0.524 and proposes the same value for the z-score of Y. The stated answer is -0.364, and participants debate how to obtain it. The thread concludes that for standardized bivariate normal variables E(z_Y | z_X) = ρ z_X, which gives 0.7 × (-0.52) = -0.364 when the inverse normal is read from a table to the nearest value. One participant reports that a simulation confirms this relationship.
dlee
Consider two random variables X,Y whose correlation is ρ = 0.7 (and the joint PDF is football shaped). Predict the z-score for Y if you observe that X is at the 30th percentile (assuming X ~ N(4,4)).

The solution to this problem is -0.364, but I'm not sure how to approach this answer.
 
Re: Correlation?

dlee said:
Consider two random variables X,Y whose correlation is ρ = 0.7 (and the joint PDF is football shaped). Predict the z-score for Y if you observe that X is at the 30th percentile (assuming X ~ N(4,4)).

The solution to this problem is -0.364, but I'm not sure how to approach this answer.

If we assume a bivariate normal distribution, we "expect" the relation:
$$y(x) = \text{sgn}(\rho) \frac {\sigma_Y}{\sigma_X} (x - \mu_X) + \mu_Y$$

With X at the 30th percentile, that means $z_X = \frac{x - \mu_X}{\sigma_X} = \text{invNorm}(0.30) = -0.524$.

In other words, the z-score for Y is
$$z_Y = \frac{y - \mu_Y}{\sigma_Y} = \text{sgn}(\rho) z_X = -0.524$$

I don't know how they got to -0.364.
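
For reference, the inverse normal value quoted above can be reproduced with a short Python sketch. It assumes that `invNorm` means the standard normal quantile function and reads N(4,4) as mean 4 and variance 4 (so $\sigma_X = 2$); the variable names are illustrative only.

```python
from statistics import NormalDist

# Assumption: invNorm is the standard normal quantile (inverse CDF).
z_x = NormalDist().inv_cdf(0.30)

# Assumption: N(4, 4) is read as mean 4, variance 4, i.e. sigma_X = 2.
mu_x, sigma_x = 4.0, 2.0
x = NormalDist(mu_x, sigma_x).inv_cdf(0.30)

print(round(z_x, 3))                   # z-score of X at the 30th percentile: -0.524
print(round((x - mu_x) / sigma_x, 3))  # same z-score recovered from x itself
```

The second line shows that the mean and variance of X drop out once the value is standardized.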
 
Re: Correlation?

I like Serena said:
If we assume a bivariate normal distribution, we "expect" the relation:
$$y(x) = \text{sgn}(\rho) \frac {\sigma_Y}{\sigma_X} (x - \mu_X) + \mu_Y$$

With X at the 30th percentile, that means $z_X = \frac{x - \mu_X}{\sigma_X} = \text{invNorm}(0.30) = -0.524$.

In other words, the z-score for Y is
$$z_Y = \frac{y - \mu_Y}{\sigma_Y} = \text{sgn}(\rho) z_X = -0.524$$

I don't know how they got to -0.364.

That can't be right.

You can without loss of generality assume $$\mu_X = \mu_Y = 0$$, so we have a model:

$$y=\alpha x + \varepsilon$$

with noise $\varepsilon$ uncorrelated with $x$. Then $\displaystyle E(XY)=\alpha\; \sigma_X^2$, and $\rho=E(XY)/(\sigma_X \sigma_Y)=\alpha\; \sigma_X/\sigma_Y$

Hence: $$\alpha=\rho \frac{\sigma_Y}{\sigma_X}$$...

 
Re: Correlation?

zzephod said:
That can't be right.

You can without loss of generality assume $$\mu_X = \mu_Y = 0$$

I didn't.
The problem asks for a z-score, meaning $\mu_X$, and $\mu_Y$ get eliminated (see my derivation).

so we have a model:

$$y=\alpha x + \varepsilon$$

with noise $\varepsilon$ uncorrelated with $x$. Then $\displaystyle E(XY)=\alpha\; \sigma_X^2$, and $\rho=E(XY)/(\sigma_X \sigma_Y)=\alpha\; \sigma_X/\sigma_Y$

Hence: $$\alpha=\rho \frac{\sigma_Y}{\sigma_X}$$...

Well... multiplying by 0.7 almost gives the requested result.
But that won't be right.
 
Re: Correlation?

I like Serena said:
... Well... multiplying by 0.7 almost gives the requested result.
But that won't be right.

It will be if you use "nearest value" in inverse normal lookup in a table.
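
A quick check of that remark, as a sketch. It assumes the table lists z to two decimals, so the nearest tabulated value to invNorm(0.30) is -0.52:

```python
from statistics import NormalDist

rho = 0.7
z_exact = NormalDist().inv_cdf(0.30)  # about -0.5244
z_table = round(z_exact, 2)           # assumed two-decimal table value: -0.52

print(round(rho * z_exact, 3))  # -0.367
print(round(rho * z_table, 3))  # -0.364
```

So the gap between -0.367 and the stated -0.364 is consistent with rounding in the table lookup rather than with a different formula.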

 
Re: Correlation?

I like Serena said:
I didn't.
The problem asks for a z-score, meaning $\mu_X$, and $\mu_Y$ get eliminated (see my derivation).

Well, since you failed to set up a model with the correct correlation it is not irrelevant to make an observation that simplifies setting the correlation without changing the answer.

 
Re: Correlation?

zzephod said:
Well, since you failed to set up a model with the correct correlation it is not irrelevant to make an observation that simplifies setting the correlation without changing the answer.


The model is a positive sloped football that could be anywhere.
The problem puts the heart at x=4 with a variance of 4.
The y coordinate of the heart and the slope can still be freely chosen.
Then, with the given correlation, the "width" of the football becomes fixed.

Either way, when talking about the z-score of y, all these choices become moot, since they are standardized.
The relationship between $E(z_Y|z_X)$ and $z_X$ is simply $E(z_Y|z_X) = z_X$, whichever model you pick.
This is a "standardized" football that is aligned on the line y=x with a width such that the correlation is satisfied.
 
Re: Correlation?

I like Serena said:
The model is a positive sloped football that could be anywhere.
The problem puts the heart at x=4 with a variance of 4.
The y coordinate of the heart and the slope can still be freely chosen.
Then, with the given correlation, the "width" of the football becomes fixed.

Either way, when talking about the z-score of y, all these choices become moot, since they are standardized.
The relationship between $E(z_Y|z_X)$ and $z_X$ is simply $E(z_Y|z_X) = z_X$, whichever model you pick.
This is a "standardized" football that is aligned on the line y=x with a width such that the correlation is satisfied.

For bivariate normal random variables $X,\ Y$ with zero means:

$$E(Y|X)=\rho\; \frac{\sigma_Y}{\sigma_X}X$$

Since $z_X,\ z_Y$ have the same correlation coefficient as $X$ and $Y$, we have:

$$E(z_Y|z_X) = \rho\; z_X$$

See: http://athenasc.com/Bivariate-Normal.pdf.

... And simulation confirms this.
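
The simulation itself is not shown in the thread. A minimal sketch of one way to run such a check follows, assuming standardized variables generated as $z_Y = \rho\, z_X + \sqrt{1-\rho^2}\,\varepsilon$ with independent standard normal $\varepsilon$. The function name, sample size, seed, and bin width are arbitrary choices made for illustration.

```python
import math
import random

def simulate(rho=0.7, n=200_000, seed=0):
    """Draw n standardized bivariate normal pairs with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        zx = rng.gauss(0.0, 1.0)
        eps = rng.gauss(0.0, 1.0)
        zy = rho * zx + math.sqrt(1.0 - rho**2) * eps
        pairs.append((zx, zy))
    return pairs

pairs = simulate()

# Least-squares slope of z_Y on z_X; should be close to rho.
mean_x = sum(zx for zx, _ in pairs) / len(pairs)
mean_y = sum(zy for _, zy in pairs) / len(pairs)
sxy = sum((zx - mean_x) * (zy - mean_y) for zx, zy in pairs)
sxx = sum((zx - mean_x) ** 2 for zx, _ in pairs)
slope = sxy / sxx

# Average z_Y among samples whose z_X is near the 30th percentile (-0.524).
near = [zy for zx, zy in pairs if abs(zx + 0.524) < 0.05]
cond_mean = sum(near) / len(near)

print(f"slope ~ {slope:.3f}, E(z_Y | z_X ~ -0.524) ~ {cond_mean:.3f}")
```

Under these assumptions the fitted slope should land near 0.7 and the binned average near $0.7 \times (-0.524) \approx -0.367$, rather than near $-0.524$.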

 