- Thread starter thrillhouse86
- Start date

- #1

- 80

- 0

Can someone please explain to me why the p-value is obtained by taking the integral under the z curve from the z statistic you calculate to the end of the tail?

Thanks

- #2

- 2,161

- 79

What do you mean by "why"? Do you already know the methodology? By the way, I don't think you mean "Z curve." The Z statistic is a measure of the distance between the mean and some point to the right or left on the x axis under the PDF (bell curve). It is calibrated in terms of the Standard Normal Distribution (SND), in standard deviation (SD) units.

Do you know what a probability density or a probability density function (PDF) is?

Sorry for the questions. I just don't want to try to explain things you already know.

- #3

- 80

- 0

Yep, I know what a PDF is, and I am comfortable with the idea that if you integrate a PDF between two bounds, it gives you the probability of your random variable lying between those bounds.

As I understood it you get this thing called a z statistic from that formula

[tex]

z = \frac{\bar{X}-\mu}{\sigma}

[/tex]

and then you integrate from the value of z you work out to the tail (I guess here I am assuming a one-sided test).

I thought that what you were doing with the above z formula was rescaling a Gaussian distribution with mean mu and standard deviation sigma to the standard normal, for which we have tables of the integration bounds.

I guess, thinking about your questions about what I know, I don't really understand the connection between that z curve (if it is indeed called that) and the z statistic.

I also don't understand why the p-value, which as I understand it is the probability of getting the result you got IF the null hypothesis was true, is given by the area under a standard normal curve from the z statistic value to the tail.

Thanks

- #4

- 2,161

- 79

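The correct language is "rejecting or failing to reject" the null hypothesis. You probably know that you select some value [tex]\alpha[/tex] such that the cumulative probability [tex]1-\alpha[/tex] fixes the upper limit of integration for the one-tailed test. Any value of a test statistic that falls within this range is the basis for "failing to reject" the null hypothesis. Typically [tex]\alpha = 0.05[/tex] or [tex]0.025[/tex]. With [tex]\alpha = 0.025[/tex] in one tail, the Z score* for this limit is 1.96 SD, which corresponds to a cumulative probability of 0.975 for the integral of the PDF. Any value outside this range (Z > 1.96) is the basis for "rejecting the null hypothesis" at that level; the same cutoff gives alpha = 0.05 when both tails are counted.

You probably know all this. What is key is that this is a model, and it's reasonably useful for a lot of applications. The model can be adapted through transformations (semi-log, for instance) to skewed distributions. The Central Limit Theorem supports the concept that all kinds of populations which are amenable to good random sampling techniques will yield approximately normally distributed sample means. In any case, there are plenty of other distribution models for special situations.

In short, the "p" value, which you compare against [tex]\alpha[/tex], is the probability of data at least as extreme as what you observed under the null hypothesis, given the model. This phrasing is often called the "frequentist" interpretation of statistical inference.

*The Z score is not measured on a curve. It is measured on the x axis in SD units from the mean to a one-sided limit of integration. The probability is the area under the PDF between the limits of integration.

EDIT: In terms of standard functional notation, the Z score is x in SD units and F(x) is the cumulative probability at the limit of integration (x). At the critical value, alpha is the difference between this cumulative probability and one.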
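
The relationship between a Z score, F(x), and the tail area can be checked numerically. The following is only a minimal sketch using Python's standard library; the helper name `upper_tail_area` is made up for illustration and does not come from the thread.

```python
from statistics import NormalDist

# Standard Normal Distribution (SND): mean 0, SD 1.
snd = NormalDist()

def upper_tail_area(z):
    """Area under the SND PDF from z out to the end of the right tail.

    Hypothetical helper name; it is simply 1 - F(z), where F is the
    cumulative distribution function.
    """
    return 1.0 - snd.cdf(z)

# Compare two commonly quoted Z scores.
for z in (1.645, 1.96):
    print(f"z = {z}: F(z) = {snd.cdf(z):.4f}, tail area = {upper_tail_area(z):.4f}")
```

Running it shows which Z score goes with which cumulative probability, which is an easy thing to mix up when switching between one-tailed and two-tailed tables.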

- #5

- 12

- 0

The standard Normal curve is symmetric and the area underneath

I believe you also asked why we look at only the standard Normal curve. Statistically, the Normal distribution is a location-scale family. For

Lastly, are you sure that [tex]\sigma[/tex] is known? This is a very strong assumption. If you are calculating p-values for a hypothesis test of a single mean, or calculating a confidence interval for a population mean [tex]\mu[/tex], then I recommend using a t-distribution if you have a small sample size. I hope this helps!

- #6

- 12

- 0

What is key is that this is a model, and it's reasonably useful for a lot of applications. The model can be adapted through transformations (semi-log, for instance) to skewed distributions. The Central Limit Theorem supports the concept that all kinds of populations which are amenable to good random sampling techniques will yield approximately normally distributed sample means. In any case, there are plenty of other distribution models for special situations.

Formally, the Central Limit Theorem states for

Now, this discussion about using the CLT seems only relevant if the population standard deviation is known, and you are conducting either a hypothesis test or constructing a confidence interval for a population mean. For hypothesis tests, we assume that the true mean of the population is the value claimed under the null hypothesis. This is done so we may calculate a p-value. Small p-values indicate that we are far from this conjectured value (regardless of a two-sided or one-sided alternative hypothesis) and provide evidence that the true value of [tex]\mu[/tex] is

- #7

- 80

- 0

I have a sample (let's say a collection of IQ tests in a suburb) and I want to test whether the mean IQ of these kids is higher than average. So:

H_0: the mean IQ of these kids is the same as everywhere else.

H_1: the mean IQ of these kids is different from everywhere else.

And I set the significance level to 0.05.

So I get my sample mean [tex] \bar{X} [/tex], and then, since I know the population mean and variance, I can compute a z statistic using:

[tex] z = \frac{\bar{X} - \mu}{\sigma} [/tex]

Then I find the area under the standard normal PDF (which I have effectively transformed to), using the tables that exist, from the positive z value to the positive tail, and then do the same from the negative of the z value to the negative tail (obviously this is a two-tailed test). This gives me the p-value.

Now, I know the p-value gives you the probability that the mean I found from my sample would occur if H_0 is true. I still don't really see why this is given by the area under the normal PDF described above.

I know this is really elementary, but I just can't seem to understand it conceptually for the life of me.

- #8

- 2,161

- 79

Thanks guys, the statistics picture is starting to become clearer. Just at a very core conceptual level, can someone explain this to me:

Now, I know the p-value gives you the probability that the mean I found from my sample would occur if H_0 is true. I still don't really see why this is given by the area under the normal PDF described above.

I know this is really elementary, but I just can't seem to understand it conceptually for the life of me.

OK. I'll try. Let's use the one-tailed test for illustration. You know alpha is the cutoff you choose for the probability of the data under the null hypothesis, given the model. For a one-tailed test, the limit of integration for the SND PDF is the point where the integral reaches [tex]1-\alpha[/tex]. If alpha is 0.05, then the integral f(x) is 0.95.

1) If x corresponds to a value of f(x) less than 0.95, you fail to reject the null hypothesis.

2) If x corresponds to a value of f(x) more than 0.95, you reject the null hypothesis.

Another way to think of it is that if the value of 1-f(x) is 0.05 or less, you reject the null hypothesis. That is, a difference between the observed and expected data at least this large will occur "randomly", under the null hypothesis and the normal assumption, only 5% or less of the time. Therefore, you reject the null hypothesis that an intervention had no effect, with alpha=0.05, in a controlled randomized trial. (Look up alpha error.)

The variable x is expressed in SD units from the mean. For f(x)=0.95 x=1.96
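
The "only 5% or less of the time" statement can be illustrated with a small simulation. This is a rough sketch, not part of the original discussion: the population values (mean 100, SD 15), the sample size, the number of trials, and the function name `rejection_rate` are all assumptions chosen for illustration, and the one-tailed cutoff is computed from alpha rather than typed in.

```python
import random
from statistics import NormalDist

def rejection_rate(mu0=100.0, sigma=15.0, n=25, trials=20000, alpha=0.05, seed=0):
    """Fraction of samples drawn with H_0 TRUE whose z statistic lands
    beyond the one-tailed cutoff. All default values are hypothetical."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)   # one-tailed cutoff for this alpha
    std_err = sigma / n ** 0.5                 # SD of the sample mean
    rejections = 0
    for _ in range(trials):
        # Draw one sample of size n from the null population and average it.
        xbar = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        if (xbar - mu0) / std_err > z_crit:
            rejections += 1
    return rejections / trials

rate = rejection_rate()
print(f"share of null samples beyond the cutoff: {rate:.3f}")
```

The printed share should sit close to alpha, which is the sense in which the tail area is a probability: it is the long-run fraction of samples that would look this extreme when the null hypothesis holds.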

- #9

- 12

- 0

For a one-sided test with an alternative hypothesis that [tex]\mu[/tex] is greater than some hypothesized value, the critical value, x in this case, is 1.645.
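
As a quick check of the two cutoffs being discussed, here is a minimal sketch using Python's standard library. The function name `critical_z` and its `two_sided` flag are hypothetical, introduced only for this illustration.

```python
from statistics import NormalDist

def critical_z(alpha, two_sided=False):
    """Critical value on the standard Normal for significance level alpha.

    A two-sided test splits alpha evenly between the two tails, so each
    tail holds alpha / 2. (Hypothetical helper, not from the thread.)
    """
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)

print(f"one-sided, alpha = 0.05: {critical_z(0.05):.3f}")
print(f"two-sided, alpha = 0.05: {critical_z(0.05, two_sided=True):.3f}")
```

The only difference between the two cases is how much probability is left in the upper tail, 0.05 versus 0.025.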

- #10

- 2,161

- 79

Right. I'm used to always doing two sided tests.

- #11

- 12

- 0

Now, p-values relate to

Here is the formal interpretation of a p-value in your IQ example context:

For a true mean IQ score of 100, the probability of observing a Chicago sample mean score

All this said, it should now be clear why the area in a standard Normal curve is of interest. The 90 and 110 IQ score sample means are standardized into z-scores because you need them to calculate proportions of "extreme" sample means; this is equivalent to looking at extreme tail areas in your standard Normal distribution.
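
Written out as a sketch, the tail area for a one-sided (greater-than) alternative is shown below. The symbols [tex]z_{obs}[/tex] for the observed statistic, [tex]\phi[/tex] for the standard Normal density, and [tex]\Phi[/tex] for its cumulative distribution function are notation assumed here; they are not used elsewhere in the thread.

[tex]

p = P(Z \ge z_{obs} \mid H_0) = \int_{z_{obs}}^{\infty} \phi(t)\,dt = 1 - \Phi(z_{obs}), \qquad \phi(t) = \frac{1}{\sqrt{2\pi}} e^{-t^{2}/2}

[/tex]

For a two-sided alternative the same area is counted in both tails, giving [tex]p = 2\left(1 - \Phi(|z_{obs}|)\right)[/tex].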

I hope this addresses your concern.

Note: Thrillhouse86, your test statistic regarding the sample mean from a population with known variance is wrong. It should be

[tex]

z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}

[/tex]
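
To make the corrected statistic concrete, here is a small worked sketch. The numbers are made up for illustration only (population mean 100, population SD 15, a sample of 36 kids with sample mean 105), and `z_test` is a hypothetical helper name, not a library function.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, two_sided=True):
    """Return (z, p) for a test of one mean with KNOWN population SD."""
    std_err = sigma / sqrt(n)        # standard error of the sample mean
    z = (xbar - mu0) / std_err
    snd = NormalDist()               # standard Normal: mean 0, SD 1
    if two_sided:
        # Area beyond |z| in both tails.
        p = 2.0 * (1.0 - snd.cdf(abs(z)))
    else:
        # Area beyond z in the upper tail only.
        p = 1.0 - snd.cdf(z)
    return z, p

# Hypothetical IQ numbers, not taken from the thread.
z, p = z_test(xbar=105.0, mu0=100.0, sigma=15.0, n=36)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

Dividing by sigma alone instead of the standard error would give a much smaller z for the same data, so the choice of denominator changes the conclusion.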

- #12

- 80

- 0

With my incorrect z score: am I incorrect because the quantity you divide [tex] \bar{X} - \mu [/tex] by is not the standard deviation of the population, but the standard error of my sample mean?

- #13

- 12

- 0

Yes, you need to divide by the standard error.
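
One way to see why the standard error is the right denominator is to simulate many sample means and look at their spread. This is only a sketch under assumed values (mean 100, SD 15, samples of size 25); the function name `sd_of_sample_means` is hypothetical.

```python
import random
from math import sqrt
from statistics import stdev

def sd_of_sample_means(mu=100.0, sigma=15.0, n=25, trials=10000, seed=1):
    """Draw many samples of size n and return the SD of their means.

    All parameter defaults are made-up illustration values.
    """
    rng = random.Random(seed)
    means = [
        sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        for _ in range(trials)
    ]
    return stdev(means)

observed = sd_of_sample_means()
expected = 15.0 / sqrt(25)   # sigma / sqrt(n)
print(f"SD of simulated sample means: {observed:.2f}")
print(f"sigma / sqrt(n):              {expected:.2f}")
```

The sample means vary far less than individual scores do, so measuring the distance of a sample mean from [tex]\mu[/tex] in units of [tex]\sigma[/tex] would understate how unusual it is.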
