Predict Z-Score for Y Given X at 30th Percentile

  • Context: MHB 
  • Thread starter: dlee

Discussion Overview

The discussion centers on predicting the z-score for a random variable Y given that another random variable X is at the 30th percentile, with a specified correlation of ρ = 0.7. The context involves assumptions of a bivariate normal distribution and the implications of these assumptions on the relationship between X and Y.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants propose that using the relationship derived from the bivariate normal distribution, the z-score for Y can be calculated as a function of the z-score for X, specifically suggesting that $z_Y = \text{sgn}(\rho) z_X$.
  • Others argue that with X at the 30th percentile, the corresponding z-score $z_X$ is calculated as -0.524, leading to a derived z-score for Y of -0.524, which does not match the expected solution of -0.364.
  • A later reply questions the validity of the derived z-score for Y, suggesting that the correlation and the model setup may not align correctly with the problem's requirements.
  • Some participants mention that simplifying assumptions, such as setting means to zero, could lead to different interpretations of the correlation and its implications on the z-scores.
  • There is a discussion about the nature of the joint PMF being "football shaped," indicating a specific correlation structure that may influence the predictions.
  • One participant notes that the relationship between the expected values of z-scores can be expressed as $E(z_Y|z_X) = \rho z_X$, which is supported by simulation results.

Areas of Agreement / Disagreement

Participants express disagreement regarding the correct z-score for Y, with some asserting that the derived value of -0.524 is accurate, while others challenge this conclusion and suggest alternative interpretations and calculations. No consensus is reached on the correct approach or final value.

Contextual Notes

There are limitations regarding the assumptions made about the means and variances of the random variables, as well as the implications of the correlation on the derived z-scores. The discussion reflects varying interpretations of the model and its parameters.

dlee
Consider two random variables X,Y whose correlation is ρ = 0.7 (and the joint PMF is football shaped). Predict the z-score for Y if you observe that X is at the 30th percentile (assuming X ~ N(4,4)).

The solution to this problem is -0.364, but I'm not sure how to approach this answer.
 
Re: Correlation?

dlee said:
Consider two random variables X,Y whose correlation is ρ = 0.7 (and the joint PMF is football shaped). Predict the z-score for Y if you observe that X is at the 30th percentile (assuming X ~ N(4,4)).

The solution to this problem is -0.364, but I'm not sure how to approach this answer.

If we assume a bivariate normal distribution, we "expect" the relation:
$$y(x) = \text{sgn}(\rho) \frac {\sigma_Y}{\sigma_X} (x - \mu_X) + \mu_Y$$

With X at the 30th percentile, that means $z_X = \frac{x - \mu_X}{\sigma_X} = \text{invNorm}(0.30) = -0.524$.

In other words, the z-score for Y is
$$z_Y = \frac{y - \mu_Y}{\sigma_Y} = \text{sgn}(\rho) z_X = -0.524$$

I don't know how they got to -0.364.
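For reference, the inverse-normal step above can be reproduced with a minimal sketch in Python. Here `statistics.NormalDist` stands in for the calculator's invNorm (the thread itself does not use it), and the second parameter of N(4,4) is assumed to be the variance, so $\sigma_X = 2$.

```python
from statistics import NormalDist

# Standard normal quantile function, standing in for invNorm.
z_x = NormalDist().inv_cdf(0.30)

# Assumption: X ~ N(4, 4) means mean 4 and variance 4, so sigma_X = 2.
mu_x, sigma_x = 4.0, 2.0
x = mu_x + sigma_x * z_x  # raw value of X at the 30th percentile

print(f"z_X = {z_x:.3f}")
print(f"x   = {x:.3f}")
```

The z-score comes out close to -0.524, matching the value quoted above; the raw x value depends on the variance assumption and drops out once everything is standardized.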
 
Re: Correlation?

I like Serena said:
If we assume a bivariate normal distribution, we "expect" the relation:
$$y(x) = \text{sgn}(\rho) \frac {\sigma_Y}{\sigma_X} (x - \mu_X) + \mu_Y$$

With X at the 30th percentile, that means $z_X = \frac{x - \mu_X}{\sigma_X} = \text{invNorm}(0.30) = -0.524$.

In other words, the z-score for Y is
$$z_Y = \frac{y - \mu_Y}{\sigma_Y} = \text{sgn}(\rho) z_X = -0.524$$

I don't know how they got to -0.364.

That can't be right.

You can without loss of generality assume $\mu_X = \mu_Y = 0$, so we have a model:

$$y=\alpha x$$

then $\displaystyle \sigma_Y=\alpha\; \sigma_X$, and $\rho=E(XY)/(\sigma_X \sigma_Y)=\alpha\; \sigma_X/\sigma_Y$

Hence: $$\alpha=\rho \frac{\sigma_Y}{\sigma_X}$$...
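
A sketch of the step above, under one added assumption: the model carries a zero-mean noise term $\varepsilon$ that is uncorrelated with $X$ (the post writes $y=\alpha x$ without it; the symbol $\varepsilon$ is introduced here only for illustration, and it is what keeps $|\rho| < 1$). With zero means:

$$Y = \alpha X + \varepsilon,\qquad E(X\varepsilon)=0 \;\Rightarrow\; E(XY) = \alpha\,\sigma_X^2$$

$$\rho = \frac{E(XY)}{\sigma_X \sigma_Y} = \alpha\,\frac{\sigma_X}{\sigma_Y} \;\Rightarrow\; \alpha = \rho\,\frac{\sigma_Y}{\sigma_X}$$

Dividing the fitted line $y = \alpha x$ by $\sigma_Y$ then gives, in standardized units, $z_Y = \rho\, z_X$, which for $\rho = 0.7$ and $z_X = -0.524$ is about $-0.367$.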

 
Re: Correlation?

zzephod said:
That can't be right.

You can without loss of generality assume $\mu_X = \mu_Y = 0$

I didn't.
The problem asks for a z-score, meaning $\mu_X$ and $\mu_Y$ get eliminated (see my derivation).

so we have a model:

$$y=\alpha x$$

then $\displaystyle \sigma_Y=\alpha\; \sigma_X$, and $\rho=E(XY)/(\sigma_X \sigma_Y)=\alpha\; \sigma_X/\sigma_Y$

Hence: $$\alpha=\rho \frac{\sigma_Y}{\sigma_X}$$...

Well... multiplying by 0.7 almost gives the requested result.
But that won't be right.
 
Re: Correlation?

I like Serena said:
... Well... multiplying by 0.7 almost gives the requested result.
But that won't be right.

It will be if you use "nearest value" in inverse normal lookup in a table.
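
The table-lookup remark can be checked with a small sketch. The two-decimal grid below is a hypothetical stand-in for a printed z-table, and `statistics.NormalDist` supplies the CDF values such a table would list.

```python
from statistics import NormalDist

rho = 0.7
p = 0.30
std_normal = NormalDist()

# Exact quantile, as a calculator's invNorm would give.
z_exact = std_normal.inv_cdf(p)

# Hypothetical z-table: entries at two decimals from -0.60 to -0.40.
# "Nearest value" lookup picks the entry whose CDF is closest to p.
table_z = [round(-0.60 + 0.01 * k, 2) for k in range(21)]
z_table = min(table_z, key=lambda z: abs(std_normal.cdf(z) - p))

print(f"exact: z_X = {z_exact:.4f}, rho * z_X = {rho * z_exact:.3f}")
print(f"table: z_X = {z_table:.2f}, rho * z_X = {rho * z_table:.3f}")
```

With the nearest table entry $z_X = -0.52$, the product is $0.7 \times (-0.52) = -0.364$, the book's answer; the exact quantile gives about $-0.367$ instead.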

Re: Correlation?

I like Serena said:
I didn't.
The problem asks for a z-score, meaning $\mu_X$ and $\mu_Y$ get eliminated (see my derivation).

Well, since you failed to set up a model with the correct correlation, it is not irrelevant to make an observation that simplifies setting the correlation without changing the answer.

Re: Correlation?

zzephod said:
Well, since you failed to set up a model with the correct correlation, it is not irrelevant to make an observation that simplifies setting the correlation without changing the answer.


The model is a positive sloped football that could be anywhere.
The problem puts the heart at x=4 with a variance of 4.
The y coordinate of the heart and the slope can still be freely chosen.
Then, with the given correlation, the "width" of the football becomes fixed.

Either way, when talking about the z-score of y, all these choices become moot, since they are standardized.
The relationship between $E(z_Y|z_X)$ and $z_X$ is simply $E(z_Y|z_X) = z_X$, whichever model you pick.
This is a "standardized" football that is aligned on the line y=x with a width such that the correlation is satisfied.
 
Re: Correlation?

I like Serena said:
The model is a positive sloped football that could be anywhere.
The problem puts the heart at x=4 with a variance of 4.
The y coordinate of the heart and the slope can still be freely chosen.
Then, with the given correlation, the "width" of the football becomes fixed.

Either way, when talking about the z-score of y, all these choices become moot, since they are standardized.
The relationship between $E(z_Y|z_X)$ and $z_X$ is simply $E(z_Y|z_X) = z_X$, whichever model you pick.
This is a "standardized" football that is aligned on the line y=x with a width such that the correlation is satisfied.

For bivariate normal random variables $X,\ Y$ with means taken to be zero:

$$E(Y|X)=\rho\; \frac{\sigma_Y}{\sigma_X}X$$

Since $z_X,\ z_Y$ have the same correlation coefficient as $X$ and $Y$, we have:

$$E(z_Y|z_X) = \rho\; z_X$$

See: http://athenasc.com/Bivariate-Normal.pdf.

... And simulation confirms this.
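
The post does not show its simulation, so the following is only one possible sketch of such a check, not the poster's code. It assumes standardized variables built as $z_Y = \rho z_X + \sqrt{1-\rho^2}\,\varepsilon$ with independent standard normal noise; the seed, sample size, and window width are arbitrary choices.

```python
import random
from statistics import NormalDist, fmean

random.seed(0)  # fixed seed only so the sketch is repeatable
rho = 0.7
n = 200_000

# Standardized bivariate normal pair with correlation rho (assumed construction).
z_x = [random.gauss(0.0, 1.0) for _ in range(n)]
z_y = [rho * a + (1 - rho**2) ** 0.5 * random.gauss(0.0, 1.0) for a in z_x]

# Least-squares slope of z_y on z_x: estimates how E(z_Y | z_X) grows with z_X.
mx, my = fmean(z_x), fmean(z_y)
slope = (sum((a - mx) * (b - my) for a, b in zip(z_x, z_y))
         / sum((a - mx) ** 2 for a in z_x))

# Average z_y over draws whose z_x lies in a narrow window at the 30th percentile.
target = NormalDist().inv_cdf(0.30)
near = [b for a, b in zip(z_x, z_y) if abs(a - target) < 0.05]
cond_mean = fmean(near)

print(f"slope ~ {slope:.3f} (rho = {rho})")
print(f"E(z_Y | z_X ~ {target:.3f}) ~ {cond_mean:.3f} (rho * z_X = {rho * target:.3f})")
```

Under these assumptions the fitted slope lands near $\rho = 0.7$ rather than 1, and the conditional average of $z_Y$ lands near $\rho\, z_X$, up to sampling noise.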

