
Checking if the residues are normal ad nauseam?

  1. Feb 23, 2013 #1
    If I am checking whether my data fit a curve C1, I have to check whether the residues R1n are normally distributed. That means checking R1n against a normal curve C2, which gives me residues R2m. These in turn must be normally distributed, so they must be checked against a normal curve C3, giving residues R3p, and so on ad nauseam. Where does this end?
  3. Feb 24, 2013 #2



    Staff: Mentor

    Your residues do not need to be normally distributed: if their mean is 0 and their standard deviation is 1 (with each residue expressed in units of its data point's uncertainty), you are done. Any deviation from a normal distribution there would indicate some weird (non-Gaussian) uncertainties for the individual data points.
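
A minimal sketch of this check, assuming each residue is divided by the uncertainty of its data point so that "standard deviation 1" is meaningful. The data values, the uncertainties, and the function name `standardized_residuals` are hypothetical illustrations, not taken from the thread.

```python
import statistics

def standardized_residuals(y_obs, y_fit, sigma):
    """Return (observed - fitted) / uncertainty for each data point."""
    return [(y - f) / s for y, f, s in zip(y_obs, y_fit, sigma)]

# Hypothetical measurements, fitted curve values, and per-point uncertainties.
y_obs = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
y_fit = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
sigma = [0.15] * 6

pulls = standardized_residuals(y_obs, y_fit, sigma)
mean_pull = statistics.mean(pulls)   # should be close to 0
sd_pull = statistics.stdev(pulls)    # should be close to 1
print(f"mean = {mean_pull:.2f}, sd = {sd_pull:.2f}")
```

With only six points, "close to" is necessarily loose; the sketch shows the bookkeeping, not a decision rule.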
  4. Feb 24, 2013 #3

    Stephen Tashi

    Science Advisor

    In the first place, what do you mean when you say you are "checking"? You aren't describing a definite statistical test. I can appreciate your general train of thought. If there were some method of determining whether a given sample definitely did or did not come from a normal distribution, then a similar method could be applied to the residues obtained by plotting the histogram of the data against the normal probability density. Then a similar method could also be applied to the residues of the residues, etc. However, there is no such foolproof method. All that standard statistical hypothesis tests for normality compute is the probability of certain aspects of the observed data, given that we assume it came from a normal distribution. If you don't assume it came from a given distribution, you can't compute anything. (If this is upsetting, see Bayesian statistics.)
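
To make the point about "probability given that we assume normality" concrete, here is a rough sketch of a Monte Carlo normality test. The choice of sample skewness as the "aspect of the data", the function names, and the sample data are all assumptions made for illustration; this is not one of the customary tests. Note that every simulated sample is drawn from a normal distribution, so the resulting number says nothing about how probable normality itself is.

```python
import random
import statistics

def skewness(xs):
    """Sample skewness: mean of the cubed standardized values."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

def normality_p_value(sample, n_sim=2000, seed=0):
    """Fraction of simulated normal samples whose |skewness| is at least
    as large as the observed one. Skewness does not depend on location or
    scale, so simulating from a standard normal is enough."""
    rng = random.Random(seed)
    observed = abs(skewness(sample))
    n = len(sample)
    hits = 0
    for _ in range(n_sim):
        simulated = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(skewness(simulated)) >= observed:
            hits += 1
    return hits / n_sim

# Hypothetical data: one clearly skewed sample, one perfectly symmetric one.
data_rng = random.Random(1)
skewed = [data_rng.expovariate(1.0) for _ in range(200)]
symmetric = [float(k) for k in range(-10, 11)]

p_skewed = normality_p_value(skewed)
p_symmetric = normality_p_value(symmetric)
print(p_skewed, p_symmetric)
```

The symmetric sample is uniform rather than normal, yet this test cannot flag it, because it only looks at one aspect of the data.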

    It is possible that you could invent a statistical hypothesis test based on residues-of-residues. To compare the utility of that test to the customary tests, people would look at the "power" of your test. The "power" of a test is complicated to define. It isn't a single number. It is a curve or surface that depends on how you parameterize the shape of the non-normal distributions that you consider.
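
As a hedged illustration of power being a curve rather than a number, the sketch below estimates by simulation how often a skewness-based test rejects normality when the data actually come from a gamma distribution. The gamma family, its shape parameter as the way of parameterizing non-normality, the sample size, and all function names are assumptions chosen for the example.

```python
import random
import statistics

def abs_skewness(xs):
    """Absolute sample skewness, used here as the test statistic."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return abs(sum(((x - m) / s) ** 3 for x in xs) / len(xs))

def critical_value(n, alpha, n_sim, rng):
    """Simulated (1 - alpha) quantile of the statistic under normality."""
    stats = sorted(
        abs_skewness([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(n_sim)
    )
    return stats[int((1 - alpha) * n_sim)]

def power(shape, n, crit, n_sim, rng):
    """Fraction of gamma(shape) samples for which the test rejects."""
    rejections = sum(
        abs_skewness([rng.gammavariate(shape, 1.0) for _ in range(n)]) > crit
        for _ in range(n_sim)
    )
    return rejections / n_sim

rng = random.Random(0)
n = 50
crit = critical_value(n, alpha=0.05, n_sim=2000, rng=rng)

# Larger shape values make the gamma distribution look more normal,
# so the rejection rate should fall as the shape grows.
power_curve = {k: power(k, n, crit, n_sim=500, rng=rng) for k in (1, 4, 16, 64)}
for k, p in power_curve.items():
    print(f"shape = {k:3d}  estimated power = {p:.2f}")
```

A different family of alternatives (for example, symmetric heavy-tailed distributions) would give this same test a completely different curve, which is why power cannot be summarized by one number.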
  5. Feb 26, 2013 #4
    Thanks for the answers, mfb and Stephen Tashi. (Sorry for the delayed response.) Apparently statisticians rely quite a bit on "hm, looks OK". (I'm not at all a statistician, which you can certainly tell from my beginner's questions; I'm more used to those strange places in mathematics where correlation is a yes/no affair unless you are doing perturbation theory. On the other hand, prior assumptions are the heart and soul of mathematics: "Er, well, let's call (N, <) consistent, and have done with it."):smile:
    More seriously: the statistical test I had in mind for the initial set of points was Pearson's correlation coefficient or something similar, where the residues should (I think) be more or less normally distributed, because otherwise (it appears at first glance at the formula) one could construct some wild mismatch between data and a line yet come up with a high r^2. It might not even be too difficult to construct such a case with mean 0 and sd = 1. But as was pointed out, such a counter-example would probably look weird. (Something like Anscombe's quartet.) Or, for a blind computer, there would be other tests (which I haven't got to yet in my self-study of statistics) to check whether it was weird. But then I was not sure about a test for the following steps, to check data (residues) against normality; your answers indicate that there is none. Interesting.
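
For reference, a short sketch that computes Pearson's r for the four data sets of Anscombe's quartet. The data are the published quartet values; the helper name `pearson_r` is just a label chosen here. All four sets give r close to 0.816, although only the first looks like a noisy straight line when plotted.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Anscombe's quartet: sets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

correlations = {name: pearson_r(xs, ys) for name, (xs, ys) in quartet.items()}
for name, r in correlations.items():
    print(f"set {name}: r = {r:.3f}, r^2 = {r * r:.3f}")
```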
    Last edited: Feb 26, 2013
  6. Feb 26, 2013 #5

    Stephen Tashi

    Science Advisor

    Curve fitting falls under the statistical topic of "estimation". This is a distinct topic from "hypothesis testing", which involves procedures that specify yes-or-no decisions. So if your goal is to find the best possible fit of a curve to an empirical distribution, you should approach it as a problem of estimation.
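
A small sketch of what "estimation" means in the simplest case: ordinary least squares for a straight line, which returns parameter estimates and makes no yes-or-no decision. The data points and the function name `fit_line` are hypothetical, and a straight line fitted to (x, y) pairs stands in for whatever curve is actually being fitted.

```python
def fit_line(xs, ys):
    """Least-squares estimates (slope, intercept) for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying roughly along y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

The output is a pair of numbers describing the best-fitting line; any verdict about whether the fit is "good enough" is a separate step.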

    In the standard sort of statistics ("frequentist" statistics) people do sometimes employ several hypothesis tests to analyze data. (Wikipedia has an article about this under the topic of "Multiple Comparisons", which I haven't read carefully.)

    Applying statistics to real life data is a subjective matter. The nature of hypothesis testing is that it is a procedure for producing a decision, not a proof that the decision is correct. In most cases, all that can be quantified is the probability of making the wrong decision given that the "null hypothesis" is assumed to be correct. (From the point of view of a proof, if one assumes the null hypothesis is true then there is nothing to decide about whether it is true or not.)