# T-test: normal probability plot

I'm studying statistics from the book "Design of Experiments" by Montgomery. About the t-test, it states that it is necessary to check that the samples are described by a normal distribution by means of a normal probability plot. I have noticed that the y-scale is not familiar to me: it's neither linear nor logarithmic. The book says:

the cumulative frequency scale has been arranged so that if the hypothesized distribution adequately describes the data, the plotted points will fall approximately along a straight line; if the plotted points deviate significantly from a straight line, the hypothesized model is not appropriate. Usually, the determination of whether or not the data plot as a straight line is subjective.

How is the y-scale chosen?

chiro
Hey serbring.

You should picture a graph with your x and y data points where an average line that minimizes the sum of squared residuals is plotted. Some points will be above and others below.

If the sum of squared residuals is too large relative to whatever threshold you have chosen, then the correlation is too low and a simple linear fit can't describe the variation present in the data.

When you fit a simple linear regression y = cx + b and test correlation, the slope c is closely tied to the correlation r: c = r * s_y / s_x, where s_x and s_y are the sample standard deviations of x and y. So c is the correlation value itself only when both variables have been standardized. If you don't have a linear model, then either your c is insignificant or you have to use a more complicated model to capture the variation of the data.
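To see how the slope and the correlation relate, here is a small sketch in plain Python. The data values are made up for illustration, and `slope_and_correlation` is a hypothetical helper, not something from the book. It checks that the least-squares slope equals r * s_y / s_x, so the two numbers agree only when x and y have the same spread.

```python
from statistics import stdev

def slope_and_correlation(x, y):
    """Return (least-squares slope, Pearson correlation) for paired data."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # Sums of squares and cross-products about the means
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx                  # minimizes the sum of squared residuals
    r = sxy / (sxx * syy) ** 0.5       # Pearson correlation coefficient
    return slope, r

# Made-up example data (an assumption, not taken from the thread)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, r = slope_and_correlation(x, y)
rescaled_r = r * stdev(y) / stdev(x)   # should match the slope

print("slope:", slope)
print("correlation:", r)
print("r * s_y / s_x:", rescaled_r)
```

Here the slope is close to 2 while the correlation is close to 1, which shows that they measure different things: the slope carries the units of y per unit of x, the correlation is unitless.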

Testing whether a sample fits a distribution is usually done with a goodness-of-fit test or with a test designed for one specific distribution.

Usually the scale depends on how you scale the variables themselves and without context it is hard to really evaluate.
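For the normal probability plot in particular, the usual construction is that the cumulative frequency p is placed at a height proportional to the inverse normal CDF of p, which is why the axis looks neither linear nor logarithmic. Below is a minimal sketch of that idea using only the Python standard library. The plotting position (j - 0.5) / n is one common convention and is assumed here; the sample values and the function name `normal_plot_coordinates` are made up for illustration.

```python
from statistics import NormalDist

def normal_plot_coordinates(sample):
    """Return (value, cumulative frequency, z-score) for each ordered observation."""
    n = len(sample)
    ordered = sorted(sample)
    std_normal = NormalDist()          # mean 0, standard deviation 1
    points = []
    for j, value in enumerate(ordered, start=1):
        p = (j - 0.5) / n              # assumed plotting position for the j-th value
        z = std_normal.inv_cdf(p)      # where p sits on the probability axis
        points.append((value, p, z))
    return points

# Made-up sample (an assumption, not data from the book)
sample = [16.85, 17.04, 16.52, 17.21, 16.96]
points = normal_plot_coordinates(sample)

for value, p, z in points:
    print(f"value={value:6.2f}  cumulative frequency={p:.2f}  z={z:+.3f}")
```

Plotting the ordered values against z on ordinary linear axes, and labelling the z axis with the corresponding p values, gives the same picture as normal probability paper. If the data are roughly normal, the points fall near a straight line.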

In a simple linear model, the usual assumption is that if you have two sets of data Y and X (both random variables) where Y lies on the real line, then Y = a + bX + e where e is Normally distributed with 0 mean and some constant variance. This is the simplest regression model and is called a simple linear regression.
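As a rough illustration of this model, the sketch below simulates Y = a + bX + e with normally distributed errors and then recovers a and b by least squares. The true parameter values, the sample size, and the helper name `fit_line` are arbitrary choices made for the example.

```python
import random

def fit_line(x, y):
    """Least-squares estimates (a_hat, b_hat) for the model y = a + b*x + e."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    a_hat = my - b_hat * mx
    return a_hat, b_hat

# Assumed "true" parameters for the simulation
a_true, b_true, sigma = 2.0, 0.5, 1.0

rng = random.Random(0)                         # fixed seed for repeatability
x = [10 * i / 199 for i in range(200)]         # 200 evenly spaced points on [0, 10]
y = [a_true + b_true * xi + rng.gauss(0.0, sigma) for xi in x]

a_hat, b_hat = fit_line(x, y)
residuals = [yi - (a_hat + b_hat * xi) for xi, yi in zip(x, y)]

print("estimated intercept:", a_hat)
print("estimated slope:", b_hat)
print("sum of residuals:", sum(residuals))
```

The residuals are the natural thing to put on a normal probability plot when checking the assumption that e is normally distributed.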