Estimating the Variance of a Normal Distribution.

1. Jul 29, 2012

steviekm3

Suppose we have a normal distribution and a sample of n values from the normal distribution.

To estimate the variance we can use the standard sample variance formula (the sum of squared deviations from the sample mean, divided by either n (biased estimator) or n-1 (unbiased estimator)).

There is another property of the normal distribution that can possibly be used to estimate the variance, namely that the mean absolute deviation from the mean =
sqrt(2/pi) * standard deviation

What I was wondering is: is it possible to compute the sample mean absolute deviation from the sample mean and then divide it by sqrt(2/pi) to get an estimate of the standard deviation? If so, how does it compare with the usual formulas for estimating the standard deviation?
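The idea in code might look like this (a minimal C++ sketch of my own; `madEstimate` is just an illustrative name, not a library function):

```cpp
#include <cmath>
#include <vector>

// The proposed estimator: sample mean absolute deviation about the
// sample mean, divided by sqrt(2/pi).
double madEstimate(const std::vector<double>& xs)
{
    double mean = 0.0;
    for (double x : xs) mean += x;
    mean /= xs.size();

    double mad = 0.0;
    for (double x : xs) mad += std::fabs(x - mean);
    mad /= xs.size();

    const double pi = std::acos(-1.0);
    return mad / std::sqrt(2.0 / pi);   // equivalently, mad * sqrt(pi/2)
}
```

For the sample {1, 2, 3} this gives (2/3) * sqrt(pi/2), about 0.836, versus 1.0 from the usual n-1 formula.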

2. Jul 29, 2012

Number Nine

All you're doing is using the standard deviation of the sample to estimate the standard deviation of the population (sqrt(2/pi) * std deviation, divided by sqrt(2/pi), is just the standard deviation), and then, presumably, squaring it to get your estimate of the variance. So you're comparing the standard variance estimate, with n-1 in the denominator, against the computation of the sample variance, which uses n in the denominator. The difference is that your formula is a biased estimate, meaning it will systematically deviate from the true value in some direction, though the difference will be negligible for large samples.

I ran a simulation just to double check, and your estimate systematically underestimates the population variance.

3. Jul 29, 2012

steviekm3

I don't think my calculation uses the sample standard deviation formula.

For example, suppose these are the numbers in the sample:

1,
2,
3

I would find the mean of these, which is 2,

and then find the average of |1-2|, |2-2|, |3-2|, which is 2/3.

Then the estimate for the std deviation of the population would be (2/3)/sqrt(pi/2)

4. Jul 29, 2012

Number Nine

We already have a minimum-variance unbiased estimator for the variance of a normal population, so there's really no need to use anything else in most situations anyway.

5. Jul 29, 2012

steviekm3

I ran some simulations as well and I cannot see a radical underestimation of the population variance. The term "radical" is subjective anyhow; do you have a more quantifiable description?

6. Jul 29, 2012

Number Nine

The first (and, really, only) important "quantifiable" description is that your estimate of the population variance is further from the true variance than the usual estimator (sample variance with n-1 in the denominator). To give you an idea of the magnitude of the error, I drew one thousand samples of 100 from a standard normal distribution and computed the average estimate for both estimators. Yours estimates the population SD to be 0.636, whereas the standard estimator comes out at 0.999 (1 is the correct value). More importantly, your estimate doesn't seem to converge to the true value, which makes it biased (it systematically deviates from the true value).

We're actually being fairly "un-rigorous" here, since estimating the population SD is fairly complicated. We have a very good (the best possible) estimator for the variance of a normal population (the usual formula, with n-1 in the denominator), but the square-root of this value is not a great estimator of the SD (though, it's pretty good in some cases).

Last edited: Jul 29, 2012
7. Jul 29, 2012

chiro

Hey steviekm3 and welcome to the forums.

Are you aware of the estimators used for the variance (in particular the MLE), and of the properties of a good estimator (unbiasedness, consistency)? Are you also aware of the criterion for a best estimator (Fisher Information)?

All of these characteristics are used not only to derive an estimator, but to show that, under the Information criterion, an estimator is 'optimal'.

8. Jul 29, 2012

Stephen Tashi

An interesting article on the web discusses the relative merits of the sample mean deviation vs the sample standard deviation (as estimators for their respective population parameters): http://www.leeds.ac.uk/educol/documents/00003759.htm. It gives some arguments in favor of using the mean absolute deviation when the distribution is NOT perfectly Gaussian.

(If we are going to get into dueling simulations, it would be useful if each party states whether his simulation samples from a Gaussian or some other distribution. On a computer, a nominal Gaussian will actually be a discrete version of a truncated Gaussian.)

As Chiro has hinted, to compare formulas for estimators one needs to specify what is being compared. (Interestingly, there is no estimator for the variance of a Gaussian that is "best" by all the usual criteria for comparison. Virtualtux points this out in post #12 of the thread https://www.physicsforums.com/showthread.php?t=616643.)

So far, nobody in this thread has answered your question with respect to any of the well-known criteria, and I can't either. To make such a comparison, we also have to be specific about whether the goal is to estimate the variance, to estimate the standard deviation, or to estimate the distribution itself - i.e. to estimate it as a function by one of the criteria used to measure how well one function approximates another.

9. Jul 30, 2012

Number Nine

We've been discussing normal populations explicitly, so I didn't bother outlining the procedure. However, in the interest of transparency:

10000 samples of size 50 were drawn from a standard normal distribution and the population variance was estimated using the standard unbiased estimate, the OP's estimator, and the LSE and MLE (because the thread you linked to was interesting; clearly, I was wrong about the standard unbiased estimator being the best possible). The mean and variance of each estimate were as follows...

Unbiased
Mean: 0.9986
Variance: 0.0407

OP's Estimator
Mean: 0.4074
Variance: 0.0075 (!)

LSE
Mean: 0.9786
Variance: 0.0394

MLE
Mean: 0.9594
Variance: 0.0376

Mind you, this is using the square of the OP's estimate as an estimate of the variance (so that everything is estimating the same quantity). If we instead use the square root of each variance estimator as an estimate of the SD (which is not ideal either, but I think is what the OP was suggesting; part of the problem is that he's comparing his estimator of the SD to estimators of the variance), we get...

Unbiased
Mean: 0.9956
Variance: 0.0102

OP's Estimator
Mean: 0.6306
Variance: 0.0046 (!)

LSE
Mean: 0.9759
Variance: 0.0098

MLE
Mean: 0.9856
Variance: 0.0100
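A simulation along these lines can be sketched as follows (my own C++ sketch, not necessarily the code used for the tables above; note it applies the formula exactly as written in post #3, MAD divided by sqrt(pi/2), which post #14 later corrects):

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct SimResult { double unbiasedVar; double opVar; };

// Monte Carlo comparison of the standard unbiased variance estimator
// with the square of the MAD-based estimate from post #3.
SimResult simulate(std::size_t iterations, std::size_t n, unsigned seed)
{
    const double pi = std::acos(-1.0);
    std::mt19937 gen(seed);                      // fixed seed for reproducibility
    std::normal_distribution<double> normal(0.0, 1.0);

    double sumUnbiased = 0.0, sumOp = 0.0;
    for (std::size_t i = 0; i < iterations; ++i) {
        std::vector<double> xs(n);
        for (double& x : xs) x = normal(gen);

        double mean = 0.0;
        for (double x : xs) mean += x;
        mean /= n;

        double ss = 0.0, mad = 0.0;
        for (double x : xs) {
            ss  += (x - mean) * (x - mean);
            mad += std::fabs(x - mean);
        }
        mad /= n;

        sumUnbiased += ss / (n - 1.0);           // standard unbiased variance
        double sdOp = mad / std::sqrt(pi / 2.0); // post #3's formula
        sumOp += sdOp * sdOp;                    // squared, to estimate the variance
    }
    return { sumUnbiased / iterations, sumOp / iterations };
}
```

With 10000 samples of size 50 the first average should come out near 1 and the second well below it, matching the pattern in the tables above.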

10. Aug 1, 2012

steviekm3

Okay, I got some more time to work on this. What I found is that I believe a correction factor has to be added to the estimator. When I add in this correction factor, the estimator should be an unbiased estimator of the standard deviation.

Here is code that compares the standard estimator (take the sqrt of S^2) with this estimator. I then get average estimator values around 0.999. The standard estimator is not as close because it is biased, but I believe an adjustment factor (of a different form) can fix the standard estimator. I have not looked into how fast they converge, but I'll work on that next. Note the standard estimator for the variance is not biased; I don't think squaring this new estimator will produce an unbiased estimator, but I have to look more closely (Jensen's inequality).

Note: be careful with the coding, as at first I must have had something wrong in the formula - I got around 0.63 for the std dev, which was similar to what you got.

double totalAvg = 0.0;
double totalAvg2 = 0.0;
size_t totalIterations = 100000;
size_t sampleSize = 20;
for (size_t i = 0; i < totalIterations; ++i)
{
    std::vector<double> rns;
    for (size_t j = 0; j < sampleSize; ++j)
    {
        double rn = gsl_ran_gaussian(r, 1.0); // GSL draw from N(0,1)
        rns.push_back(rn);
    }
    double stdStdDev = CalculateStdDev(rns.begin(), rns.end()); // regular (n-1) std deviation estimator
    double mean = CalculateMean(rns.begin(), rns.end());
    double totalAbs = 0.0;
    for (size_t j = 0; j < sampleSize; ++j)
    {
        totalAbs += fabs(rns[j] - mean);
    }
    // divide as doubles: (sampleSize-1)/sampleSize in size_t arithmetic would be 0
    double correctionFactor = (sampleSize - 1.0) / sampleSize;
    const double pi = 3.14159265358979;
    double stdDevEstimator = 1.0 / sqrt(2.0 * correctionFactor / pi) * (totalAbs / sampleSize);
    //logStream << stdDevEstimator << COStream::endl;
    totalAvg += stdDevEstimator;
    totalAvg2 += stdStdDev;
}
logStream << "Avg MAD-based estimator: " << AsString(totalAvg / totalIterations, 6) << COStream::endl;
logStream << "Avg regular estimator:   " << AsString(totalAvg2 / totalIterations, 6) << COStream::endl;

11. Aug 1, 2012

steviekm3

For the particular application I'm working on, I'm looking for an unbiased estimator of the standard deviation. The reason is that I have sample points from which to infer the distribution. Once I have the distribution, I have to run a simulation on it, and the simulation generates random normal numbers. The function that generates the random normals takes a standard deviation, so I figure it is best to get an estimator for the standard deviation that I can feed into the generator. All this is more for interest's sake, as Number Nine points out that the regular formulas work great.

12. Aug 1, 2012

haruspex

Are you quite sure that's what you want? The square root of the unbiased estimator of the variance is not an unbiased estimator of the s.d.
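A quick way to see this numerically (my own sketch, assuming samples from N(0,1)):

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Average of S = sqrt(unbiased sample variance) over many samples
// from N(0,1). If sqrt of the unbiased variance estimator were itself
// unbiased for sigma, this average would come out at 1; by Jensen's
// inequality it comes out below 1.
double averageS(std::size_t iterations, std::size_t n, unsigned seed)
{
    std::mt19937 gen(seed);
    std::normal_distribution<double> normal(0.0, 1.0);

    double sumS = 0.0;
    for (std::size_t i = 0; i < iterations; ++i) {
        std::vector<double> xs(n);
        for (double& x : xs) x = normal(gen);

        double mean = 0.0;
        for (double x : xs) mean += x;
        mean /= n;

        double ss = 0.0;
        for (double x : xs) ss += (x - mean) * (x - mean);
        sumS += std::sqrt(ss / (n - 1.0));   // S for this sample
    }
    return sumS / iterations;
}
```

For samples of size 20 the average lands around 0.987 rather than 1, even though the variance estimator underneath is exactly unbiased.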

13. Aug 2, 2012

steviekm3

I only need the std deviation because the library function I'm using takes the standard deviation as an argument. I could adjust the regular estimator for the standard deviation using:

en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation

All of this is more for interest's sake because my n is pretty large (around 250), so I think with that big a sample size the bias becomes tiny.
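The correction on that Wikipedia page is the c4(n) factor; it can be sketched like this (my own code, using lgamma to keep the gamma-function ratio stable for large n):

```cpp
#include <cmath>
#include <cstddef>

// c4(n) from the Wikipedia page above: for a normal sample,
// E[S] = c4(n) * sigma, so S / c4(n) is an unbiased estimator of sigma.
double c4(std::size_t n)
{
    return std::sqrt(2.0 / (n - 1.0))
         * std::exp(std::lgamma(n / 2.0) - std::lgamma((n - 1.0) / 2.0));
}
```

For n = 250, c4 is about 0.999, which is why the bias is negligible here; for n = 2 it is sqrt(2/pi), the same constant as in the MAD identity.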

Last edited by a moderator: May 6, 2017
14. Aug 2, 2012

steviekm3

My apologies here; the formula in post #3 should have been:

(mean absolute deviation) / sqrt(2/pi)