Checking if the residues are normal ad nauseam?

  • Context: Graduate
  • Thread starter: nomadreid
  • Tags: Normal

Discussion Overview

The discussion revolves around the process of checking the normality of residuals when fitting data to a curve. Participants explore the implications of normal distribution in statistical analysis, the necessity of such checks, and the potential for developing new statistical tests.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant questions the necessity of checking if residuals are normally distributed, suggesting that having a mean of 0 and standard deviation of 1 may suffice.
  • Another participant emphasizes the lack of a definitive statistical test for normality, noting that all standard tests only provide probabilities based on the assumption of normal distribution.
  • A participant expresses confusion about the process of checking residuals against normality and suggests that statistical tests like Pearson's correlation coefficient could be relevant.
  • There is mention of the subjective nature of applying statistics to real-life data and the distinction between estimation and hypothesis testing in statistical analysis.
  • Concerns are raised about the potential for misleading results if residuals are not normally distributed, referencing examples like Anscombe's quartet.

Areas of Agreement / Disagreement

Participants express differing views on the necessity and methodology of checking residuals for normality. There is no consensus on whether such checks are essential or how they should be conducted.

Contextual Notes

Participants highlight limitations in existing methods for testing normality and the complexity of defining the "power" of statistical tests. The discussion also touches on the subjective nature of statistical analysis in practical applications.

nomadreid
If I am checking whether my data fits a curve C1, I have to check to see whether the residues R1n are normally distributed, which is checking R1n against a normal curve C2, giving me residues R2m; which must be normally distributed, that is, must be checked against a normal curve C3, giving me residues R3p, and so on ad nauseam. Where does this end?
 
The normal distribution of your residues is not necessary: if their mean is 0 and their standard deviation is 1, you are done. Any deviation from a normal distribution there would indicate some weird (non-Gaussian) uncertainties for the individual data points.
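As a concrete illustration of the point above, here is a minimal sketch in Python (the data are made up for illustration, not from the thread): fit a line by ordinary least squares and inspect the residuals' mean and standardized spread.

```python
# Minimal sketch (hypothetical data): fit y = a + b*x by least squares,
# then inspect the residuals' mean and their standardized values.
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]  # roughly y = 2x

n = len(x)
mx = sum(x) / n
my = sum(y) / n
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

mean_r = sum(residuals) / n
# Residual standard error, with n - 2 degrees of freedom for a line fit.
sd_r = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
standardized = [r / sd_r for r in residuals]

print(mean_r)  # ≈ 0, up to floating-point noise
```

Note that with an intercept in the model the residual mean is zero by construction, so the informative check is whether the standardized residuals look roughly like draws from N(0, 1).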
 
nomadreid said:
If I am checking whether my data fits a curve C1, I have to check to see whether the residues R1n are normally distributed, which is checking R1n against a normal curve C2, giving me residues R2m; which must be normally distributed, that is, must be checked against a normal curve C3, giving me residues R3p, and so on ad nauseam. Where does this end?

In the first place, what do you mean when you say you are "checking"? You aren't describing a definite statistical test. I can appreciate your general train of thought. If there were some method of determining whether a given sample definitely did or did not come from a normal distribution, then a similar method could be applied to the residues of plotting the histogram of the data vs the normal probability density. Then a similar method could also be applied to residues of the residues, etc. However, there is no such foolproof method. All that standard statistical hypothesis tests for normality compute is the probability of certain aspects of the observed data given that we assume it came from a normal distribution. If you don't assume it came from a given distribution, you can't compute anything. (If this is upsetting, see Bayesian statistics.)

It is possible that you could invent a statistical hypothesis test based on residues-of-residues. To compare the utility of that test to the customary tests, people would look at the "power" of your test. The "power" of a test is complicated to define: it isn't a single number, but a curve or surface that depends on how you parameterize the shapes of the non-normal distributions you consider.
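A rough sketch of the point about normality tests (the statistic below is the standard Jarque–Bera form, not any test proposed in the thread, and the samples and seed are made up): the statistic combines sample skewness and excess kurtosis, and its reference distribution (chi-squared with 2 degrees of freedom) is derived *under the assumption* that the sample is normal.

```python
# Sketch: a Jarque-Bera-style normality statistic on two synthetic samples.
# Sample sizes and seed are arbitrary choices for illustration.
import random

def jarque_bera(sample):
    # Sample skewness and kurtosis from central moments.
    n = len(sample)
    m = sum(sample) / n
    m2 = sum((v - m) ** 2 for v in sample) / n
    m3 = sum((v - m) ** 3 for v in sample) / n
    m4 = sum((v - m) ** 4 for v in sample) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

rng = random.Random(0)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(500)]
skewed_sample = [rng.expovariate(1.0) for _ in range(500)]  # clearly non-normal

jb_norm = jarque_bera(normal_sample)
jb_skew = jarque_bera(skewed_sample)
# Under normality, the statistic is roughly chi-squared(2); large values count
# as evidence against normality -- but "large" is only calibrated assuming H0.
print(jb_norm, jb_skew)
```

The heavily skewed sample produces a far larger statistic than the normal one, but the translation of that number into a probability still rests on the normality assumption, which is exactly the point made above.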
 
Thanks for the answers, mfb and Stephen Tashi. (Sorry for the delayed response.) Apparently statisticians rely quite a bit on "hm, looks OK". (I'm not at all a statistician, which you can certainly tell from my beginner's questions; I'm more used to those strange places in mathematics where correlation is a yes/no affair unless you are doing perturbation theory. On the other hand, prior assumptions are the heart and soul of mathematics: "Er, well, let's call (N, <) consistent, and have done with it."):smile:
More seriously: the statistical test I had in mind for the beginning set of points was Pearson's correlation coefficient or something similar, where the residues should (I think) be more or less normally distributed, because otherwise (it appears at first glance at the formula) one could construct some wild mismatch between data and a line yet come up with a high r². It might not even be too difficult to construct such an example with a 0 mean and sd = 1. But as was pointed out, such a counter-example would probably look weird (something like Anscombe's quartet). Or, to a blind computer, there would be other tests (which I haven't got to yet in my self-study of statistics) to check whether it was weird. But then I was not sure about a test for the following steps to check data (residues) against normality; your answers indicate that there is none. Interesting.
 
nomadreid said:
But then I was not sure about a test for the following steps to check data (residues) against normality; your answers indicate that there is none. Interesting.

Curve fitting falls under the statistical topic of "estimation". This is a distinct topic from "hypothesis testing", which involves procedures that specify yes-or-no decisions. So if your goal is to find the best possible fit of a curve to an empirical distribution, you should approach it as a problem of estimation.

In the standard sort of statistics ("frequentist" statistics), people do sometimes employ several hypothesis tests to analyze data. (Wikipedia has an article about this under the topic of "Multiple Comparisons", which I haven't read carefully.)

Applying statistics to real life data is a subjective matter. The nature of hypothesis testing is that it is a procedure for producing a decision, not a proof that the decision is correct. In most cases, all that can be quantified is the probability of making the wrong decision given that the "null hypothesis" is assumed to be correct. (From the point of view of a proof, if one assumes the null hypothesis is true then there is nothing to decide about whether it is true or not.)
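That last quantity can be illustrated by simulation (a sketch with made-up settings, not anything from the thread): when the null hypothesis really holds, a test run at the 5% level should wrongly reject about 5% of the time, and that rate is the only thing being quantified.

```python
# Sketch: Type I error rate of a simple z-test when the null is true.
# Hypothetical setup: H0 says the data are N(0, 1) with known sigma = 1;
# we test mean = 0 at the 5% level and count how often we wrongly reject.
import math
import random

rng = random.Random(42)
n, trials, rejections = 30, 2000, 0
for _ in range(trials):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = (sum(sample) / n) * math.sqrt(n)  # sqrt(n) * sample mean / sigma
    if abs(z) > 1.96:                     # two-sided 5% critical value
        rejections += 1

rate = rejections / trials
print(rate)  # should hover near 0.05
```

The simulation only tells us how often the procedure errs *given* the null; it says nothing about whether the null is true for any particular dataset, which is the subjective part.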
 
