P value from z statistic 
#1
Apr29-10, 08:56 AM

P: 80

Hey All,
Can someone please explain to me why the p value is obtained by taking the integral under the z curve from the z statistic you calculate to the end of the tail? Thanks


#2
Apr29-10, 04:38 PM

P: 2,504

What do you mean by "why"? Do you already know the methodology? By the way, I don't think you mean "Z curve." The Z statistic measures the distance along the x axis between the mean and some point to its right or left under the PDF (the bell curve). It is calibrated against the Standard Normal Distribution (SND), in Standard Deviation (SD) units.
Do you know what a probability density or a probability density function (PDF) is? Sorry for the questions. I just don't want to try to explain things you already know. 


#3
Apr29-10, 10:46 PM

P: 80

Hey SW VandeCarr,
Yep, I know what a pdf is, and I am comfortable with the idea that integrating a pdf between two bounds gives you the probability of your random variable falling between those bounds. As I understood it, you get this thing called a z statistic from the formula [tex] z = \frac{\bar{X} - \mu}{\sigma} [/tex] and then you integrate from that value of z out to the tail (I guess I am assuming a one-sided test here). I thought that what you were doing with the above z formula was scaling your probability distribution to a Gaussian function with mean mu and variance sigma, for which we have tables of the integration bounds. Thinking about your questions about what I know, I guess I don't really understand the connection between that z curve (if it is indeed called that) and the z statistic. I also don't understand why the p-value, which as I understand it is the probability of getting the result you got IF the null hypothesis were true, is given by the area under a standard normal curve from the z statistic value to the tail. Thanks


#4
Apr29-10, 11:53 PM

P: 2,504

You probably know all this. What is key is that this is a model, and it's reasonably useful for a lot of applications. The model can be adapted to skewed distributions through transformations (semi-log, for instance). The Central Limit Theorem supports the idea that all kinds of populations which are amenable to good random sampling techniques will yield approximately normally distributed sample means. In any case, there are plenty of other distribution models for special situations. In short, the "p" value (the quantity compared against [tex]\alpha[/tex]) is the probability of the data, or of data more extreme, under the null hypothesis given the model. This phrasing is often called the "frequentist" interpretation of statistical inference.
The Z score is not measured on a curve. It is measured on the x axis, in SD units, from the mean to a one-sided limit of integration. The probability is the area under the PDF between the limits of integration.
EDIT: In terms of standard functional notation, the Z score is x in SD units and F(x) is the cumulative probability up to the limit of integration (x). The upper-tail area, 1 - F(x), is the p value that gets compared with alpha.
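To make "the area from z out to the tail" concrete, here is a minimal Python sketch using only the standard library. It integrates the standard normal density numerically from an assumed z value to a far upper limit and compares the result with the closed form based on the complementary error function. The function names, the cutoff of 10 SD and the example z = 1.645 are illustrative assumptions, not anything taken from the thread.

```python
import math

def std_normal_pdf(t: float) -> float:
    """Density of the standard normal distribution N(0, 1) at t."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def upper_tail_by_integration(z: float, upper: float = 10.0,
                              steps: int = 100_000) -> float:
    """Trapezoid-rule approximation of the area under the pdf from z to `upper`.

    `upper` stands in for infinity (assumption: the area beyond 10 SD is negligible).
    """
    h = (upper - z) / steps
    total = 0.5 * (std_normal_pdf(z) + std_normal_pdf(upper))
    for i in range(1, steps):
        total += std_normal_pdf(z + i * h)
    return total * h

def upper_tail_exact(z: float) -> float:
    """P(Z >= z) for Z ~ N(0, 1), written with the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

z_observed = 1.645  # hypothetical z statistic
print(upper_tail_by_integration(z_observed))
print(upper_tail_exact(z_observed))
```

Both numbers come out close to 0.05, which is the one-sided p value a table would give for that z.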


#5
May2-10, 12:39 AM

P: 12

I believe you also asked why we look at only the standard Normal curve. Statistically, the Normal distribution is a location-scale family. For any Normally distributed random variable, if you subtract a constant, like [tex]\mu[/tex] (the population mean), and divide by a constant, [tex]\sigma[/tex] (the known population standard deviation), you still have a Normally distributed random variable! Since the integral of the Normal density has no closed form, a table of areas under the curve to the left or right of z is given for the standard Normal distribution (N(0,1)), because you can transform any Normal random variable into a standard Normal random variable. Lastly, are you sure that [tex]\sigma[/tex] is known? This is a very strong assumption. If you are calculating p-values for a hypothesis test of a single mean, or calculating a confidence interval for a population mean, [tex]\mu[/tex], then I recommend using a t-distribution if you have a small sample size. I hope this helps!
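The location-scale point can be checked numerically. The sketch below uses Python's `statistics.NormalDist` (Python 3.8+) to compute an upper-tail probability twice: once directly on a Normal distribution with an assumed mean and SD, and once on N(0,1) after standardizing. The values mu = 100, sigma = 15 and x = 120 are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0   # hypothetical population mean and SD
x = 120.0                 # hypothetical cutoff on the original scale

# Upper-tail probability computed directly on N(mu, sigma).
direct = 1.0 - NormalDist(mu, sigma).cdf(x)

# Same probability after subtracting mu and dividing by sigma.
z = (x - mu) / sigma
standardized = 1.0 - NormalDist().cdf(z)   # NormalDist() is N(0, 1)

print(z, direct, standardized)
```

The two tail areas agree, which is why a single table for N(0,1) is enough.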


#6
May2-10, 01:06 AM

P: 12

Now, this discussion about using the CLT seems relevant only if the population standard deviation is known and you are either conducting a hypothesis test or constructing a confidence interval for a population mean. For hypothesis tests, we assume that the true mean of the population is the value claimed under the null hypothesis. This is done so we may calculate a p-value. Small p-values indicate that we are far from this conjectured value (regardless of a two-sided or one-sided alternative hypothesis) and provide evidence that the true value of [tex]\mu[/tex] is not what we claim it to be under H_{0}. Frequentist statisticians believe that parameters are fixed, unknown quantities. This is why this is a "Frequentist" interpretation of a p-value. Bayesians believe that parameter values are not fixed but are random and follow some distribution. A further discussion of the Bayesian and Frequentist approaches is beyond the scope of a physics forum.


#7
May3-10, 02:38 AM

P: 80

Thanks guys, the statistics picture is starting to become clearer. Just at a very core conceptual level, can someone explain this to me:
I have a sample (let's say a collection of IQ tests in a suburb) and I want to test whether the mean IQ of these kids is higher than average. So H_0: the mean IQ of these kids is the same as everywhere else; H_1: the mean IQ of these kids is different from everywhere else. I set the significance level to 0.05. I get my sample mean [tex] \bar{X} [/tex], and since I know the population mean and variance I can compute a z statistic using [tex] z = \frac{\bar{X} - \mu}{\sigma} [/tex]. Then I find the area under the standard normal function (which I have effectively transformed to), using the tables that exist, from the positive z value to the positive tail of the pdf, and do the same from the negative of the z value to the negative tail (obviously this is a two-tailed test). This gives me the p-value. Now, I know the p-value gives you the probability that the mean I found from my sample would occur if H_0 is true. I still don't really see why this is given by the area under the normal pdf described above. I know this is really elementary, but I just can't seem to understand it conceptually for the life of me.


#8
May3-10, 04:01 PM

P: 2,504

1) If x corresponds to a value less than F(x) = 0.95, you fail to reject the null hypothesis. 2) If x corresponds to a value more than F(x) = 0.95, you reject the null hypothesis. Another way to think of it is that if the value of 1 - F(x) is 0.05 or less, you reject the null hypothesis. That is, a difference this large between the observed data and the data expected under the null hypothesis will occur "randomly" under the normal assumption only 5% or less of the time. Therefore, in a controlled randomized trial, you reject the null hypothesis that an intervention had no effect at alpha = 0.05. (Look up alpha error.) The variable x is expressed in SD units from the mean. For F(x) = 0.95, x = 1.645; the familiar x = 1.96 corresponds to F(x) = 0.975, i.e. a two-sided test at alpha = 0.05.
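The cutoffs behind this kind of rule can be looked up with the inverse CDF instead of a table. A small sketch using Python's `statistics.NormalDist`; the helper name `reject_null` and the example z values 1.5 and 1.7 are assumptions made for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1

# z values at which the cumulative probability F(z) reaches 0.95 and 0.975.
one_sided_cutoff = std_normal.inv_cdf(0.95)    # about 1.645
two_sided_cutoff = std_normal.inv_cdf(0.975)   # about 1.960

def reject_null(z: float, alpha: float = 0.05) -> bool:
    """One-sided (upper-tail) rule: reject when 1 - F(z) <= alpha."""
    return 1.0 - std_normal.cdf(z) <= alpha

print(round(one_sided_cutoff, 3), round(two_sided_cutoff, 3))
print(reject_null(1.5), reject_null(1.7))   # False, True
```

So 1.645 is the one-sided cutoff at alpha = 0.05, while 1.96 leaves 0.025 in each tail.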


#9
May3-10, 07:20 PM

P: 12




#10
May3-10, 09:04 PM

P: 2,504




#11
May3-10, 10:08 PM

P: 12

To truly understand why you use a standard Normal distribution to calculate p-values relating to [tex]\bar{X}[/tex], you first must understand the concept of a sampling distribution. The sampling distribution of [tex]\bar{X}[/tex] is the distribution of values taken by the sample mean in all possible samples of the same size from the same population. For a sample of 100 students from Chicago, the sampling distribution of sample mean IQ scores would be the values of sample mean IQ scores taken in all unique samples of size 100 students from the population of students in Chicago. The number of unique samples of size 100 is really large, and you would need access to all student IQ scores in the Chicago student population. For these reasons, we approximate this distribution with a Normal curve because mathematical theory says we can for a large sample size and known variance (CLT).
Now, p-values relate to ranges of values in a sampling distribution. To calculate a p-value, a sampling distribution must first be defined. For [tex]\bar{X}[/tex], this is accomplished by assuming that the true mean of the Normal sampling distribution is the null hypothesized value (note that the standard deviation is known, so the sampling distribution is fully defined). Maybe you would hypothesize that Chicago's mean IQ score is 100. Next, you collect data to test this assumption. In your case, you might observe a sample mean IQ score of 90 for 100 Chicago students. The test statistic, correctly stated below, is then calculated. A z value that is far from 0 corresponds to a sample mean IQ score far from the hypothesized population mean of 100.
Ranges of IQ scores are then considered in order to calculate a p-value (an integral, or area under the curve). With a two-sided alternative hypothesis, if you observed 90, the p-value is the probability of observing a sample mean IQ score less than 90 or a sample mean IQ score more than 110. Statisticians justify looking at this range with the following logic: you observe a sample mean. What is the chance of seeing an even more extreme sample mean if your sampling distribution truly is centered at the hypothesized value? Here is the formal interpretation of a p-value in your IQ example context: for a true mean IQ score of 100, the probability of observing a Chicago sample mean score as extreme or more extreme than 90 is 0.035.
All this said, it should now be clear why the area in a standard Normal curve is of interest. The 90 and 110 IQ score sample means are standardized into z-scores because you need them to calculate proportions of "extreme" sample means; this is equivalent to looking at extreme tail areas in your standard Normal distribution. I hope this addresses your concern.
Note: Thrillhouse86, your test statistic regarding the sample mean from a population with known variance is wrong.
It should be [tex] z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} [/tex]
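As a rough sketch of how that corrected statistic turns into a p-value, the Python function below divides by sigma/sqrt(n) and then takes the appropriate tail area of N(0,1). The function name `z_test`, its `alternative` argument and all the numbers in the example (sample mean 97, sigma 15) are hypothetical; the thread never states sigma, so this is not an attempt to reproduce the 0.035 quoted above.

```python
import math
from statistics import NormalDist

def z_test(sample_mean: float, mu0: float, sigma: float, n: int,
           alternative: str = "two-sided") -> tuple[float, float]:
    """Return (z, p_value) for a one-sample z test with known sigma."""
    standard_error = sigma / math.sqrt(n)      # SD of the sample mean
    z = (sample_mean - mu0) / standard_error
    std_normal = NormalDist()
    if alternative == "greater":
        p_value = 1.0 - std_normal.cdf(z)                 # upper tail
    elif alternative == "less":
        p_value = std_normal.cdf(z)                       # lower tail
    elif alternative == "two-sided":
        p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))    # both tails
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, p_value

# Hypothetical numbers: H0 mean 100, known sigma 15, n = 100, observed mean 97.
z, p = z_test(sample_mean=97.0, mu0=100.0, sigma=15.0, n=100)
print(z, p)   # z = -2.0, p is about 0.0455
```

With these assumed inputs the two-sided p-value falls just under 0.05, so the null hypothesis would be rejected at that level.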


#12
May3-10, 11:48 PM

P: 80

Thank you both d3t3rt & SW VandeCarr.
With my incorrect z score: am I incorrect because the quantity you divide [tex] \bar{X} - \mu [/tex] by is not the standard deviation of the population, but the standard error of my sample mean?
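A quick simulation is consistent with that reading: the spread of sample means is close to sigma/sqrt(n), not sigma. The sketch below is only an illustration under stated assumptions. The population is exponential with mean 1 and SD 1 (deliberately not Normal), and the function name, seed, sample size and repetition count are all hypothetical choices.

```python
import math
import random
import statistics

def sd_of_sample_means(n: int, reps: int, seed: int = 0) -> float:
    """Draw `reps` samples of size `n` and return the SD of their means."""
    rng = random.Random(seed)   # seeded so the run is repeatable
    means = []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]  # population SD = 1
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)

n = 100
population_sd = 1.0
observed = sd_of_sample_means(n=n, reps=5000)
print(observed)                         # empirical SD of the sample means
print(population_sd / math.sqrt(n))     # theoretical standard error: 0.1
```

The empirical value lands near 0.1, a tenth of the population SD, which is why the denominator of the z statistic for a sample mean is the standard error.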


#13
May4-10, 02:35 AM

P: 12



