T-test: normal probability plot

SUMMARY

The discussion focuses on the use of normal probability plots in assessing the fit of data to a normal distribution, as outlined in Montgomery's "Design of Experiments." It emphasizes that if the plotted points on the normal probability plot deviate significantly from a straight line, the hypothesized normal distribution is not appropriate. The conversation also clarifies that the y-scale on these plots can vary depending on how the variables are scaled, and that a simple linear regression model assumes normally distributed residuals. The importance of visual inspection and the subjective nature of determining linearity in these plots is highlighted.

PREREQUISITES
  • Understanding of normal probability plots and their purpose in statistics
  • Familiarity with simple linear regression models and correlation coefficients
  • Knowledge of cumulative frequency and its representation in statistical graphs
  • Basic concepts of goodness of fit tests for statistical distributions
NEXT STEPS
  • Explore the implementation of normal probability plots using statistical software like R or Python's SciPy library
  • Learn about goodness of fit tests, specifically the Kolmogorov-Smirnov test and Anderson-Darling test
  • Investigate the implications of non-normality in data and alternative statistical models
  • Study the effects of scaling variables in regression analysis and its impact on model fit
USEFUL FOR

Statisticians, data analysts, and researchers who are involved in data analysis and model fitting, particularly those working with normal distributions and regression analysis.

serbring
I'm studying statistics from Montgomery's book "Design of Experiments". In the discussion of the t-test, it states that it is necessary to check, by means of a normal probability plot, that the samples are described by a normal distribution. I have noticed that the y-scale of this plot is not familiar to me: it is neither linear nor logarithmic. The book says:

the cumulative frequency scale has been arranged so that if the hypothesized distribution adequately describes the data, the plotted points will fall approximately along a straight line; if the plotted points deviate significantly from a straight line, the hypothesized model is not appropriate. Usually, the determination of whether or not the data plot as a straight line is subjective.


How is the y-scale chosen?



 
Hey serbring.

You should picture a graph with your x and y data points where an average line that minimizes the sum of squared residuals is plotted. Some points will be above and others below.

If the sum of squared residuals is too large relative to some chosen threshold, the correlation is too low and you can't use a simple linear fit to describe the variation present in the data.

When you fit a simple linear regression and test correlation, the correlation measure is actually the linear coefficient where y = cx + b and c is the correlation value. If you don't have a linear model then basically either your c is insignificant or you have to use a more complicated model to capture the variation of the data.

Testing whether a sample fits a distribution is usually done with a general goodness-of-fit test or with a test designed for a specific distribution.
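As a rough illustration of a check of this kind, the sketch below computes a Kolmogorov-Smirnov-style distance between a sample's empirical CDF and a normal distribution fitted to that sample. It uses only Python's standard library; the function name `ks_distance_to_normal` and both samples are made up for illustration. Because the mean and standard deviation are estimated from the same data, the usual Kolmogorov-Smirnov critical values do not apply directly (that situation is handled by the Lilliefors variant), so the sketch reports only the distance and no p-value.

```python
from math import log
from statistics import NormalDist, mean, stdev


def ks_distance_to_normal(data):
    """Largest gap between the empirical CDF and a normal CDF fitted to the data."""
    ordered = sorted(data)
    n = len(ordered)
    # Assumption: the reference normal uses the sample mean and sample standard deviation.
    fitted = NormalDist(mean(ordered), stdev(ordered))
    distance = 0.0
    for i, x in enumerate(ordered, start=1):
        f = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x, so check both sides of the jump.
        distance = max(distance, i / n - f, f - (i - 1) / n)
    return distance


n = 50
positions = [(i - 0.5) / n for i in range(1, n + 1)]
# Hypothetical samples: evenly spaced quantiles of a normal and of an exponential distribution.
normal_like = [NormalDist().inv_cdf(p) for p in positions]
skewed = [-log(1 - p) for p in positions]

d_normal = ks_distance_to_normal(normal_like)
d_skewed = ks_distance_to_normal(skewed)
print(f"normal-like sample: D = {d_normal:.3f}")
print(f"skewed sample:      D = {d_skewed:.3f}")
```

A smaller distance means the fitted normal tracks the data more closely; how small is "small enough" depends on the test and the sample size.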

Usually the scale depends on how you scale the variables themselves and without context it is hard to really evaluate.

In a simple linear model, the usual assumption is that if you have two sets of data Y and X (both random variables) where Y lies on the real line, then Y = a + bX + e where e is Normally distributed with 0 mean and some constant variance. This is the simplest regression model and is called a simple linear regression.
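A minimal sketch of this model in Python (standard library only) may make the pieces concrete. The parameter values, the seed, and the helper name `fit_line` are assumptions chosen for illustration: the code simulates Y = a + bX + e with normally distributed e, fits the line that minimizes the sum of squared residuals, and computes the residuals.

```python
import random


def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


rng = random.Random(0)  # fixed seed so the run is repeatable
a_true, b_true, sigma = 2.0, 0.5, 1.0  # hypothetical "true" parameters
xs = [i / 10 for i in range(200)]
# e is drawn from a normal distribution with mean 0 and constant standard deviation sigma.
ys = [a_true + b_true * x + rng.gauss(0, sigma) for x in xs]

a_hat, b_hat = fit_line(xs, ys)
residuals = [y - (a_hat + b_hat * x) for x, y in zip(xs, ys)]

print(f"estimated intercept: {a_hat:.3f}, estimated slope: {b_hat:.3f}")
print(f"sum of residuals: {sum(residuals):.2e}")
```

The residuals are the estimates of e, so they are what one would inspect when checking the normality assumption of the model.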
 
A normal probability plot is not a regression plot. (By the way: in linear correlation, the correlation IS NOT, in general, the slope in the equation y = cx + b.)
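A small numerical check of the point about correlation and slope, written as a sketch with made-up data and a hypothetical helper `slope_and_correlation`: for points lying exactly on y = 10x + 3 the least-squares slope is 10, while the correlation coefficient cannot exceed 1.

```python
from math import sqrt


def slope_and_correlation(xs, ys):
    """Least-squares slope of y on x, and the Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r = sxy / sqrt(sxx * syy)
    return slope, r


xs = [1, 2, 3, 4, 5]
ys = [10 * x + 3 for x in xs]  # made-up data lying exactly on a line
slope, r = slope_and_correlation(xs, ys)
print(f"slope = {slope:.3f}, correlation = {r:.3f}")

# The two are related by slope = r * (sd of y) / (sd of x),
# so they coincide only when x and y have the same spread.
sd_ratio = sqrt(sum((y - sum(ys) / 5) ** 2 for y in ys) / sum((x - sum(xs) / 5) ** 2 for x in xs))
print(f"r * sd_y / sd_x = {r * sd_ratio:.3f}")
```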

I don't know what the graph you refer to looks like. A common way to create a normal probability plot is to arrange the Yi in order (smallest to largest) and plot them on the horizontal axis. The vertical axis is often taken to be some representation of the percentiles of the standard normal distribution. If the actual percentiles are plotted, ordinary scales can be used; some software packages use a different representation of the percentiles, and those require different scales. As stated, without seeing the plot you reference it is impossible to say specifically what is going on in your book.

If the points lie along a straight line, you have evidence that the "model" (the hypothesized normal distribution for your data) is a good fit (no regression involved). Note that it is very common for these plots to show a strong linear pattern in the center of the graph but have the points stray from the line at the extremes: that simply reflects the fact that data often "appear normal" in the middle of the distribution but deviate from normality in the tails.
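The construction just described can be sketched in a few lines of Python using only the standard library. The plotting position (i - 0.5)/n is one common convention and is an assumption here, as are the function names and the sample values; the book's plot may use a different rule. The second function shows one way a "probability" axis can be laid out: each percent label is placed at the corresponding standard normal quantile, which is why such a scale looks neither linear nor logarithmic.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def normal_probability_points(data):
    """Return (ordered value, cumulative frequency, normal quantile) for each observation."""
    ordered = sorted(data)
    n = len(ordered)
    points = []
    for i, y in enumerate(ordered, start=1):
        p = (i - 0.5) / n          # assumed plotting position for the i-th smallest value
        z = STD_NORMAL.inv_cdf(p)  # standard normal quantile belonging to p
        points.append((y, p, z))
    return points


def probability_axis_position(percent):
    """Position of a percent label on an axis that is linear in the normal quantile."""
    return STD_NORMAL.inv_cdf(percent / 100)


sample = [9.8, 10.4, 10.1, 9.5, 10.9, 10.0, 10.6, 9.9, 10.2, 10.3]  # made-up data
for y, p, z in normal_probability_points(sample):
    print(f"y = {y:5.2f}   cumulative frequency = {100 * p:4.1f}%   z = {z:+.3f}")

for percent in (1, 5, 10, 25, 50, 75, 90, 95, 99):
    print(f"{percent:2d}% label at height {probability_axis_position(percent):+.3f}")
```

Plotting y against z on ordinary linear axes gives the same picture as plotting y against the cumulative frequency on the specially spaced scale: the 50% label sits at 0, while the labels crowd together near the middle and spread out toward 1% and 99%.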

A short but readable discussion of normal probability plots can be found here.
http://www.statit.com/support/quality_practice_tips/testingfornearnormality.shtml
 
