T-test: normal probability plot

  1. Nov 9, 2014 #1
    I'm studying statistics from the book "design of experiments" by Montgomery. In the discussion of the t-test, it states that it is necessary to check that the samples are described by a normal distribution by means of a normal probability plot. I have noticed that the y-scale is not familiar to me: it is neither linear nor logarithmic. The book says:

    the cumulative frequency scale has been arranged so that if the hypothesized distribution adequately describes the data, the plotted points will fall approximately along a straight line; if the plotted points deviate significantly from a straight line, the hypothesized model is not appropriate. Usually, the determination of whether or not the data plot as a straight line is subjective.

    How is the y-scale chosen?

  2. jcsd
  3. Nov 12, 2014 #2


    Science Advisor

    Hey serbring.

    You should picture a graph with your x and y data points where an average line that minimizes the sum of squared residuals is plotted. Some points will be above and others below.

    If the sum of squared residuals is too large within some particular confidence measure, then what that means is that the correlation is too low and you can't use a simple linear fit to describe the variation present in the model itself.

    When you fit a simple linear regression and test correlation, the correlation measure is actually the linear coefficient where y = cx + b and c is the correlation value. If you don't have a linear model then basically either your c is insignificant or you have to use a more complicated model to capture the variation of the data.

    Testing whether a sample fits a distribution is usually done with goodness of fit or specific tests that look at specific distributions in one form or another.

    Usually the scale depends on how you scale the variables themselves and without context it is hard to really evaluate.

    In a simple linear model, the usual assumption is that if you have two sets of data Y and X (both random variables) where Y lies on the real line, then Y = a + bX + e where e is Normally distributed with 0 mean and some constant variance. This is the simplest regression model and is called a simple linear regression.
  4. Nov 12, 2014 #3


    Homework Helper

    A normal probability plot is not a regression plot (by the way: in linear correlation the correlation IS NOT, in general, the slope in the equation y = cx + b).
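
    To illustrate the point about slope versus correlation, here is a minimal sketch in Python. The function name and the data are made up for illustration. It computes the least-squares slope and the Pearson correlation from the same sums of squares, so the two can be compared directly.

```python
import math
from statistics import mean


def slope_and_correlation(x, y):
    """Return (least-squares slope, Pearson correlation) for paired data."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx                  # c in y = cx + b
    r = sxy / math.sqrt(sxx * syy)     # correlation, always in [-1, 1]
    return slope, r


# Hypothetical data lying exactly on the line y = 10x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 20.0, 30.0, 40.0, 50.0]
slope, r = slope_and_correlation(x, y)
print(slope, r)
```

    Here the slope is 10 while the correlation is 1. The two are related by slope = r * (s_y / s_x), so they coincide only when x and y have the same standard deviation.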

    I don't know what the graph you refer to looks like. A common way to create a normal probability plot is to arrange the Yi in order (smallest to largest) and plot them on the horizontal axis. The vertical axis is often taken to be some representation of the percentiles of the standard normal distribution. If the actual percentiles are plotted then ordinary scales can be used; some software packages use a different representation of the percentiles, and those require different scales. As stated, without seeing the plot you reference it is impossible to say specifically what is going on in your book. If the points lie along a straight line, you have evidence that the "model" (the hypothesized normal distribution for your data) is a good fit (no regression involved). Note that it is very common for these plots to show a strong linear pattern in the center of the graph but have the points stray from the line in the extremes: that simply reflects the fact that data often "appear normal" in the middle of the distribution but deviate from normality in the tails.
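
    The construction described above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not necessarily what the book's plot does: the function name and sample values are hypothetical, and the cumulative frequency (i - 0.5)/n is one common plotting-position convention among several. The sketch also shows one way an unfamiliar probability scale can arise: if the axis is linear in the standard normal quantile z but labelled with the corresponding cumulative percentages, the tick labels end up unevenly spaced.

```python
from statistics import NormalDist


def normal_probability_points(sample):
    """Return (value, cumulative frequency, normal quantile) per observation.

    Assumes the plotting position (i - 0.5) / n for the i-th smallest value.
    """
    ordered = sorted(sample)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, value in enumerate(ordered, start=1):
        p = (i - 0.5) / n            # cumulative frequency of the i-th value
        z = std_normal.inv_cdf(p)    # standard normal quantile for p
        points.append((value, p, z))
    return points


# Hypothetical sample; plotting value against z should look roughly
# linear if the data are approximately normal.
sample = [16.85, 16.40, 17.21, 16.35, 16.52]
points = normal_probability_points(sample)
for value, p, z in points:
    print(f"{value:6.2f}  p={p:.2f}  z={z:+.3f}")

# Where familiar percentage labels fall on an axis that is linear in z.
ticks = {p: NormalDist().inv_cdf(p) for p in (0.01, 0.10, 0.50, 0.90, 0.99)}
print(ticks)
```

    On such an axis the step from 50% to 90% is longer than the step from 90% to 99%, even though the first covers 40 percentage points and the second only 9. The scale is therefore neither linear nor logarithmic in the percentage itself.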

    A short but readable discussion of normal probability plots can be found here.