caironotebook said:
Also, what is the purpose of squaring the result of x values minus the mean?
I look at it this way:
When you consider geometric shapes, scaling factors play a key role. For example, in "similar" triangles, the corresponding sides are in the same ratios, so, in theory, if you wanted to have a reference picture for a family of similar triangles, you could draw a picture of one and scale it up or down to get pictures of all the others.
When you consider the most important probability distribution, the normal distribution, you are really considering a "family" of distributions, each with a different mean and/or variance. However, there is a way to look at all these distributions as having "similar" shape. In fact, if you pick a coordinate system so the mean of the distribution is at the origin and you set 1 "unit" on the axis equal to the standard deviation, then all these distributions plot as the same curve. So if you need to compute something about "the" normal distribution, you only need one numerical table or computer subroutine that handles this single "standard" normal. The mean is like a translation parameter. The standard deviation is like a scaling factor.
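To make the "one table is enough" idea concrete, here is a minimal Python sketch. It assumes the standard library's `statistics.NormalDist`; the helper names `standardize` and `prob_below` and the two example distributions are made up for illustration. It translates by the mean, rescales by the standard deviation, and then looks the result up in the single standard normal.

```python
from statistics import NormalDist

def standardize(x, mean, sd):
    """Translate by the mean, then rescale by the standard deviation."""
    return (x - mean) / sd

# The single "reference picture": mean 0, standard deviation 1.
standard = NormalDist(0.0, 1.0)

def prob_below(dist, x):
    """P(X <= x) for a normal `dist`, computed using only the standard normal."""
    return standard.cdf(standardize(x, dist.mean, dist.stdev))

# Two arbitrary members of the normal family (illustrative parameters).
family = [NormalDist(10.0, 2.0), NormalDist(-3.0, 0.5)]

for dist in family:
    x = dist.mean + 1.5 * dist.stdev  # 1.5 standard deviations above the mean
    # Both columns should agree up to floating-point rounding.
    print(dist, prob_below(dist, x), dist.cdf(x))
```

Both distributions give the same probability here because the chosen point sits 1.5 standard deviations above the mean in each case, which is the sense in which they are the "same curve".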
For some non-normal distributions, the standard deviation is not the only scaling factor needed to "standardize" them, but the standard deviation is still useful in many other cases.
Your question is not about the population standard deviation. You are asking about the procedure for estimating the variance and standard deviation.
Most people accept the procedure for computing the mean of a sample as "natural" since it appears to be a version of the procedure for computing the mean of the population. Thinking this way can handicap you in understanding statistics. If you do think this way and believe what I said about the importance of the standard deviation (and hence of the variance), then it would be natural to think that taking the mean of the squared deviations from the sample average is "natural" since it imitates the calculation you would do for the population standard deviation and variance. So maybe you can think this way as a crutch!
The truth is that there is no mathematical principle that says "If you want to estimate the value of something computed for the population then do a similar computation on your sample." This sometimes works well and sometimes doesn't. The scenario you must understand is this: When you try to estimate a parameter of the population (e.g. the mean, variance, median, mode, etc.) you usually do a computation on the sample values. This computation is an "estimator" of the parameter. It is not necessarily equal to the population parameter. If we consider that a sample is a random vector of values, then an "estimator" is also a random variable. Thus "the sample mean" can be considered a random variable. The custom of omitting the phrase "estimator of" from things like "sample mean", "sample variance", etc. results in some confusion. In some contexts "sample mean" will mean a specific number like 23.5. In other contexts "sample mean" will be considered a random variable and properties of its distribution will be discussed.
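If the two meanings of "sample mean" seem abstract, a small simulation may help. This is only a sketch: the population (normal, mean 50, standard deviation 10), the sample size, and the name `sample_mean_estimator` are assumptions chosen for illustration. Each single call gives one specific number, while the collection of results over many samples shows the estimator behaving as a random variable with its own distribution.

```python
import random
from statistics import fmean, stdev

def sample_mean_estimator(sample):
    """The estimator: a rule that turns a sample into a number."""
    return fmean(sample)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population parameters and sample size.
POP_MEAN, POP_SD, N = 50.0, 10.0, 25

estimates = []
for _ in range(2000):
    # Draw a fresh random sample and apply the same rule to it.
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    estimates.append(sample_mean_estimator(sample))

# Each entry is "the sample mean" in the specific-number sense.
print(estimates[:3])

# Taken together they describe "the sample mean" as a random variable:
# it has a center and a spread of its own.
print(fmean(estimates), stdev(estimates))
```

For this setup the spread of the estimates should come out near 10 / sqrt(25) = 2, which is a property of the estimator's distribution, not of any one sample.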
The case of "sample variance" is interesting. Some books define it as computed by dividing by n and some define it as computed by dividing by n-1. It's a rather arbitrary definition, if you aren't claiming that the result is an estimator. If you want to get as specific as using the terminology "the usual unbiased estimator of the population variance", this refers to computing it by dividing by n-1.
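Here is a small sketch of the two definitions side by side. The eight data values and the helper name `mean_squared_deviation` are made up for illustration. As far as I know, Python's standard library offers both conventions: `statistics.pvariance` divides by n and `statistics.variance` divides by n-1.

```python
from statistics import fmean, pvariance, variance

def mean_squared_deviation(sample, divisor):
    """Sum of squared deviations from the sample average, over `divisor`."""
    m = fmean(sample)
    return sum((x - m) ** 2 for x in sample) / divisor

# Made-up sample; its average is 5 and the squared deviations sum to 32.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(sample)

divide_by_n = mean_squared_deviation(sample, n)              # 32 / 8
divide_by_n_minus_1 = mean_squared_deviation(sample, n - 1)  # 32 / 7

# Compare each hand computation with the matching library function.
print(divide_by_n, pvariance(sample))
print(divide_by_n_minus_1, variance(sample))
```

The same data give two different "sample variances" depending on the divisor, so it pays to check which convention a book or a library is using.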
An intuitive justification for the n-1 is that if you take a small sample where extremes are improbable then you will tend to underestimate the variability of the population since you are unlikely to pick up the extremes in your sample. So you divide by n-1 instead of n to bump up the value. (Of course this doesn't explain why you wouldn't use some other formula to bump it up. You'll have to study the real calculation to understand that.)
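A simulation is no substitute for the real calculation, but it does let you see the underestimate. This sketch assumes a normal population with variance 4 and samples of size 5; those numbers and the names below are arbitrary choices for illustration. It averages both versions of the estimator over many samples.

```python
import random
from statistics import fmean

rng = random.Random(1)  # fixed seed so the run is repeatable

POP_SD = 2.0            # hypothetical population standard deviation
POP_VAR = POP_SD ** 2   # so the population variance is 4.0
N = 5                   # deliberately small sample size
TRIALS = 20000

def sum_squared_deviations(sample):
    """Sum of squared deviations from the sample average."""
    m = fmean(sample)
    return sum((x - m) ** 2 for x in sample)

by_n, by_n_minus_1 = [], []
for _ in range(TRIALS):
    sample = [rng.gauss(0.0, POP_SD) for _ in range(N)]
    ss = sum_squared_deviations(sample)
    by_n.append(ss / N)
    by_n_minus_1.append(ss / (N - 1))

avg_by_n = fmean(by_n)
avg_by_n_minus_1 = fmean(by_n_minus_1)

# Long-run averages of the two estimators, next to the true variance.
print(avg_by_n, avg_by_n_minus_1, POP_VAR)
```

The divide-by-n average should land near (N-1)/N times the population variance, which is 3.2 here, while the divide-by-(n-1) average should land near 4. That matches the result of the real calculation: dividing by n is off, on average, by exactly the factor (n-1)/n.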
I agree with Alan2's point that statistics is not axiomatic. (I can't comment on his deleted post; I didn't see it.) I don't interpret this to mean that the mathematics of statistics isn't axiomatic. I interpret it to mean that applying statistics to real-world problems is a subjective matter. Further, the terminology used in statistics ("confidence intervals", "statistical significance") can be misleading to laymen. They usually interpret such terms to be the answers they want to know about a situation rather than what such calculations actually mean.