Checking if the residues are normal ad nauseam?

  • Thread starter nomadreid
  • #1
nomadreid
If I am checking whether my data fits a curve C1, I have to check whether the residues R1n are normally distributed, which means checking R1n against a normal curve C2, giving me residues R2m; these must be normally distributed, that is, must be checked against a normal curve C3, giving me residues R3p, and so on ad nauseam. Where does this end?
 
  • #2
The normal distribution of your residues is not necessary - if their mean is 0 and their standard deviation is 1, you are done. Any deviation from a normal distribution there would indicate some weird (non-Gaussian) uncertainties for the individual data points.
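A minimal sketch of that check, under our own reading that "mean 0 and standard deviation 1" refers to residuals divided by each data point's known uncertainty σᵢ (often called pulls). The function name `pull_summary` and the example numbers are hypothetical.

```python
from statistics import fmean, stdev

def pull_summary(y, y_fit, sigma):
    """Mean and standard deviation of the normalized residuals (y - y_fit) / sigma."""
    pulls = [(yi - fi) / si for yi, fi, si in zip(y, y_fit, sigma)]
    return fmean(pulls), stdev(pulls)

# Hypothetical measurements, fitted values and per-point uncertainties.
y = [1.1, 1.9, 3.2, 3.9, 5.1]
y_fit = [1.0, 2.0, 3.0, 4.0, 5.0]
sigma = [0.1, 0.1, 0.2, 0.1, 0.1]
mean_pull, sd_pull = pull_summary(y, y_fit, sigma)
print(f"mean pull = {mean_pull:.2f}, sd of pulls = {sd_pull:.2f}")
```

Roughly: a mean far from 0 points to a systematic offset between model and data, a standard deviation well above 1 suggests underestimated uncertainties or a poor model, and one well below 1 suggests overestimated uncertainties.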
 
  • #3
nomadreid said:
If I am checking whether my data fits a curve C1, I have to check whether the residues R1n are normally distributed, which means checking R1n against a normal curve C2, giving me residues R2m; these must be normally distributed, that is, must be checked against a normal curve C3, giving me residues R3p, and so on ad nauseam. Where does this end?

In the first place, what do you mean when you say you are "checking"? You aren't describing a definite statistical test. I can appreciate your general train of thought. If there were some method of determining whether a given sample definitely did or did not come from a normal distribution, then a similar method could be applied to the residues from comparing the histogram of the data with the normal probability density. Then a similar method could also be applied to the residues of the residues, etc. However, there is no such foolproof method. All that standard statistical hypothesis tests for normality compute is the probability of certain aspects of the observed data, given that we assume it came from a normal distribution. If you don't assume it came from a given distribution, you can't compute anything. (If this is upsetting, see Bayesian statistics.)
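To make "the probability of certain aspects of the observed data, given that we assume it came from a normal distribution" concrete, here is a sketch of a Monte Carlo Kolmogorov-Smirnov test against a fully specified N(0, 1) (for example, normalized residuals). The p-value is obtained by simulating from the assumed distribution, so without that assumption there is nothing to simulate, and the output is a single probability rather than a new set of residues to check. The helper names, sample sizes and seeds are our own choices; if the mean and standard deviation were estimated from the same data, this null distribution would no longer be correct.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)

def ks_statistic(sample, dist=STD_NORMAL):
    """Kolmogorov-Smirnov distance between the empirical CDF of `sample` and `dist`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = dist.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check the gap on both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def monte_carlo_p_value(sample, n_sim=2000, seed=0):
    """P(D >= observed D), computed by *assuming* the data are i.i.d. N(0, 1)."""
    rng = random.Random(seed)
    n = len(sample)
    d_obs = ks_statistic(sample)
    exceed = sum(
        ks_statistic([rng.gauss(0.0, 1.0) for _ in range(n)]) >= d_obs
        for _ in range(n_sim)
    )
    # The +1 terms count the observed sample itself, so the estimate is never exactly 0.
    return (exceed + 1) / (n_sim + 1)

data_rng = random.Random(42)
looks_normal = [data_rng.gauss(0.0, 1.0) for _ in range(40)]
shifted = [data_rng.gauss(1.5, 1.0) for _ in range(40)]  # hypothetical non-null data
p_normal = monte_carlo_p_value(looks_normal)
p_shifted = monte_carlo_p_value(shifted)
print(f"p (N(0,1) data) = {p_normal:.3f}, p (shifted data) = {p_shifted:.4f}")
```

A small p-value only says the observed distance would be rare if the null were true; it is a basis for a decision, not a proof.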

It is possible that you could invent a statistical hypothesis test based on residues-of-residues. To compare the utility of that test to the customary tests, people would look at the "power" of your test. The "power" of a test is complicated to define. It isn't a single number. It is a curve or surface that depends on how you parameterize the shape of the non-normal distributions that you consider.
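A rough illustration of a power curve, under assumptions that are entirely ours: the test rejects normality when the absolute sample excess kurtosis exceeds a critical value simulated under normality, and the non-normal alternatives are parameterized by the degrees of freedom of a Student-t distribution. Choosing a different family (for example, skewed alternatives, against which a kurtosis test is weak) would give a different curve, which is the point made above.

```python
import random

def excess_kurtosis(xs):
    """Sample excess kurtosis m4/m2^2 - 3 (about 0 for normal data)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0

def student_t(rng, df):
    """One Student-t variate: Z / sqrt(chi2_df / df), with chi2_df drawn as Gamma(df/2, scale 2)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return z / (chi2 / df) ** 0.5

def power_curve(dfs, n=50, alpha=0.05, n_null=2000, n_rep=400, seed=1):
    """Estimated rejection rate of an |excess kurtosis| normality test against t(df) data."""
    rng = random.Random(seed)
    # The statistic is location/scale invariant, so N(0, 1) simulations give the
    # critical value for every normal distribution.
    null_stats = sorted(
        abs(excess_kurtosis([rng.gauss(0.0, 1.0) for _ in range(n)]))
        for _ in range(n_null)
    )
    critical = null_stats[int((1 - alpha) * n_null)]
    curve = {}
    for df in dfs:
        rejections = sum(
            abs(excess_kurtosis([student_t(rng, df) for _ in range(n)])) > critical
            for _ in range(n_rep)
        )
        curve[df] = rejections / n_rep
    return curve

curve = power_curve([2, 5, 50])
for df, power in curve.items():
    print(f"t({df}) alternative: estimated power = {power:.2f}")
```

As the degrees of freedom grow, the t distribution approaches the normal and the estimated power should fall toward the significance level.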
 
  • #4
Thanks for the answers, mfb and Stephen Tashi. (Sorry for the delayed response.) Apparently statisticians rely quite a bit on "hm, looks OK". (I'm not at all a statistician, which you can certainly tell from my beginner's questions; I'm more used to those strange places in mathematics where correlation is a yes/no affair unless you are doing perturbation theory. On the other hand, prior assumptions are the heart and soul of mathematics: "Er, well, let's call (N, <) consistent, and have done with it."):smile:
More seriously: the statistical test I had in mind for the beginning set of points was Pearson's correlation coefficient or something similar, where the residues should (I think) be more or less normally distributed, because otherwise (it appears at first glance at the formula) one could construct some wild mismatch between data and a line yet come up with a high r². It might not even be too difficult to construct such a case with a 0 mean and sd = 1. But as was pointed out, such a counter-example would probably look weird (something like Anscombe's quartet). Or, for a blind computer, there would be other tests (which I haven't got to yet in my self-study of statistics) to check whether it was weird. But then I was not sure about a test for the following steps to check data (residues) against normality; your answers indicate that there is none. Interesting.
 
  • #5
nomadreid said:
But then I was not sure about a test for the following steps to check data (residues) against normality; your answers indicate that there is none. Interesting.

Curve fitting falls under the statistical topic of "estimation". This is a distinct topic from "hypothesis testing", which involves procedures that specify yes-or-no decisions. So if your goal is to find the best possible fit of a curve to an empirical distribution, you should approach it as a problem of estimation.
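As an illustration of the estimation view, fitting a normal curve to a sample by maximum likelihood yields parameter values with approximate standard errors instead of a yes/no verdict. This is a sketch; the function name `fit_normal_mle` and the example data are hypothetical, and the standard error of σ is the usual large-sample approximation.

```python
import math
from statistics import fmean, pstdev

def fit_normal_mle(data):
    """Maximum-likelihood fit of N(mu, sigma^2) with approximate standard errors."""
    n = len(data)
    mu = fmean(data)
    sigma = pstdev(data, mu)  # the MLE divides by n, not n - 1
    return {
        "mu": mu,
        "sigma": sigma,
        "se_mu": sigma / math.sqrt(n),         # standard error of the mean, sigma estimated
        "se_sigma": sigma / math.sqrt(2 * n),  # large-sample approximation
    }

fit = fit_normal_mle([4.8, 5.1, 5.3, 4.6, 5.0, 5.4, 4.9])  # hypothetical measurements
print({k: round(v, 3) for k, v in fit.items()})
```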

In the standard sort of statistics ("frequentist" statistics), people do sometimes employ several hypothesis tests to analyze data. (Wikipedia has an article about this under the topic of "Multiple Comparisons", which I haven't read carefully.)
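One standard way to handle several tests at once (for instance, one normality test per stage of the chain in the original question) is the Holm-Bonferroni step-down procedure, which keeps the family-wise error rate at or below α. It is sketched here as one common technique, not necessarily the one the Wikipedia article emphasizes; the function names are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm's step-down procedure: which hypotheses to reject while keeping the
    family-wise error rate at or below alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one hypothesis is retained, all larger p-values are retained too
    return reject

def familywise_error(alpha, k):
    """Chance of at least one false rejection among k independent level-alpha tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** k

print(holm_bonferroni([0.01, 0.04, 0.03]))  # [True, False, False]
print(round(familywise_error(0.05, 10), 3))  # 0.401
```

Without an adjustment, ten independent tests at α = 0.05 give about a 40% chance of at least one spurious rejection.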

Applying statistics to real-life data is a subjective matter. The nature of hypothesis testing is that it is a procedure for producing a decision, not a proof that the decision is correct. In most cases, all that can be quantified is the probability of making the wrong decision given that the "null hypothesis" is assumed to be correct. (From the point of view of a proof, if one assumes the null hypothesis is true then there is nothing to decide about whether it is true or not.)
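The "probability of making the wrong decision given that the null hypothesis is correct" can be checked by simulation: generate data for which the null is true by construction and count how often a level-α test rejects it. This sketch assumes a two-sided z-test for a zero mean with known standard deviation 1; the function name and settings are hypothetical.

```python
import random
from statistics import NormalDist, fmean

def type_one_error_rate(n=20, alpha=0.05, n_rep=5000, seed=2):
    """Fraction of simulated data sets, generated with the null hypothesis true
    (mean 0, known sd 1), on which a two-sided z-test rejects."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_rep):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) * n ** 0.5  # (mean - 0) / (1 / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_rep

rate = type_one_error_rate()
print(f"estimated probability of a wrong rejection: {rate:.3f}")
```

The estimate should land close to α = 0.05, which is the only error probability the procedure itself guarantees.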
 

1. What does it mean to check if the residues are normal ad nauseam?

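The phrase describes an apparent infinite regress rather than a particular statistical test: checking whether the residuals of a fit are normally distributed seems to require fitting a normal curve to them, which produces new residuals that would themselves need checking, and so on. In practice the regress does not arise, because a normality test does not produce a second set of residuals. It reports how probable the observed departures from normality would be if the residuals really were normal. Checking residuals matters because many statistical methods and models assume that the errors, and hence the residuals, are normally distributed.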

2. How are residuals checked for normality?

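A common first step is graphical: plot a histogram of the data (or residuals), or a normal quantile-quantile (Q-Q) plot, on which normally distributed values fall close to a straight line. Formal tests such as the Shapiro-Wilk test or the Kolmogorov-Smirnov test can then be applied. Note that the standard Kolmogorov-Smirnov test assumes the mean and standard deviation of the reference normal distribution are fixed in advance; when they are estimated from the same data, the Lilliefors variant is the appropriate version.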

3. Why is it important to check for normality?

It is important to check for normality because many statistical methods, such as t-tests and ANOVA, assume that the data (more precisely, the errors) are normally distributed. If that assumption fails, particularly with small samples, the p-values and confidence intervals these methods report may be inaccurate.

4. What if the data is not normally distributed?

If the data are not normally distributed, there are several options. One is to transform the data (for example, taking logarithms of right-skewed, positive data) so that they are closer to normal. Another is to use non-parametric tests, such as the Mann-Whitney U test or the Wilcoxon signed-rank test, which do not assume normality. It is also worth reconsidering the research question and whether methods that assume normality are appropriate at all.
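A small sketch of the transformation option, assuming positive, right-skewed data. The data here are simulated from a lognormal distribution, so the log transform works by construction; real data may need a different transform or none at all. The helper `sample_skewness` is a hypothetical name.

```python
import math
import random

def sample_skewness(xs):
    """Sample skewness m3 / m2^1.5 (about 0 for symmetric data)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

rng = random.Random(3)
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(500)]  # hypothetical right-skewed data
logged = [math.log(x) for x in raw]
print(f"skewness before: {sample_skewness(raw):.2f}, after log: {sample_skewness(logged):.2f}")
```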

5. Is it necessary to check for normality in all types of research?

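No, it is not necessary to check for normality in all types of research. Some procedures do not require it: least-squares regression, for example, gives unbiased estimates of the coefficients whenever the errors have mean zero given x, whatever their distribution. However, the usual t- and F-based p-values and confidence intervals for regression coefficients, and the standard significance test for Pearson's correlation coefficient, do assume normally distributed data or errors (or rely on large samples). If the research question involves comparing groups or using models whose inference assumes normality, it is important to check for normality.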
