aaaa202 said:
And while on the topic: why use the square of each point's deviation from the mean to calculate the standard deviation? Why not just the numerical value? It just seems that the standard deviation could've been defined in a lot of other ways that would make just as much sense.
Certainly there are many ways to measure the deviation of a random variable from its mean. The standard deviation is ##\sqrt{E[(x-\mu)^2]}##. Your suggestion is ##E[|x-\mu|]##. These are both special cases of a more general form: for any real number ##p \geq 1##, we may define what mathematicians call the ##p##-norm:
$$\|x-\mu\|_p = \left(E[|x - \mu|^p]\right)^{1/p}$$
The standard deviation is the ##p=2## case, and your suggestion is the ##p=1## case. All of these are measures of the deviation of ##x## from its mean. Higher values of ##p## assign greater weight (or penalty) to large deviations and relatively less to small ones, so the measure becomes more sensitive to outliers as ##p## grows.
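To make the effect of ##p## concrete, here is a minimal numerical sketch (assuming Python with NumPy; the sample values are made up for illustration). The single outlier pulls the ##p##-norm deviation up more and more as ##p## increases:

```python
import numpy as np

def p_deviation(x, p):
    """Sample version of (E[|x - mu|^p])^(1/p)."""
    mu = x.mean()
    return (np.abs(x - mu) ** p).mean() ** (1.0 / p)

# Made-up sample: mostly small deviations plus one outlier at 10.
x = np.array([1.0, 2.0, 2.0, 3.0, 10.0])

for p in (1, 2, 4):
    print(f"p = {p}: deviation = {p_deviation(x, p):.3f}")
# Output grows with p (roughly 2.56, 3.26, 4.32 here):
# the outlier dominates more as p increases.
```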
The ##p=2## case has some nice mathematical properties.
First, we can easily expand the variance algebraically. Assuming ##x## is real-valued, so that ##|x-\mu|^2 = (x-\mu)^2##, we have
$$E[(x-\mu)^2] = E[x^2] - 2\mu E[x] + \mu^2 = E[x^2] - \mu^2,$$
where the last equality uses ##E[x] = \mu##. This makes it very easy to work analytically with the ##p=2## case.
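As a quick sanity check of this identity (a sketch assuming NumPy; the distribution parameters are arbitrary), the definition form ##E[(x-\mu)^2]## and the expanded form ##E[x^2]-\mu^2## agree on simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=100_000)  # arbitrary distribution

mu = x.mean()
var_definition = ((x - mu) ** 2).mean()   # E[(x - mu)^2]
var_expanded = (x ** 2).mean() - mu ** 2  # E[x^2] - mu^2

print(var_definition, var_expanded)  # equal up to floating-point rounding
```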
Second, it plays nicely with the notion of covariance. If we define the covariance of two random variables ##x## and ##y## to be ##\text{cov}(x,y) = E[(x-\mu_x)(y - \mu_y)]##, we get a measure of how correlated the two variables are. Then, we immediately have ##\text{var}(x) = \text{cov}(x,x)##. We also have the Cauchy-Schwarz inequality: ##|\text{cov}(x,y)| \leq \sigma_x \sigma_y##, where ##\sigma_x## and ##\sigma_y## are the standard deviations of ##x## and ##y##.
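Both facts are easy to check numerically. Here is a small sketch (assuming NumPy; the correlated pair is constructed arbitrarily) verifying ##\text{var}(x) = \text{cov}(x,x)## and the Cauchy-Schwarz bound on a sample:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)  # correlated with x by construction

def cov(a, b):
    """Sample version of E[(a - mu_a)(b - mu_b)]."""
    return ((a - a.mean()) * (b - b.mean())).mean()

print(cov(x, x), x.var())                   # var(x) = cov(x, x): these match
print(abs(cov(x, y)) <= x.std() * y.std())  # Cauchy-Schwarz holds: True
```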
Another nice feature is that if ##x## and ##y## are independent, then you can simply add the variances: ##\text{var}(x+y) = \text{var}(x) + \text{var}(y)##. This very convenient property is not true for ##p \neq 2##.
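The contrast shows up directly in simulation (again a sketch with arbitrarily chosen distributions): for independent samples the variances add, while the ##p=1## deviations do not:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(scale=1.0, size=1_000_000)  # independent draws
y = rng.normal(scale=3.0, size=1_000_000)
s = x + y

def abs_dev(a):
    """Sample version of E[|a - mu|], the p = 1 deviation."""
    return np.abs(a - a.mean()).mean()

print(s.var(), x.var() + y.var())           # both ~10 (up to sampling error)
print(abs_dev(s), abs_dev(x) + abs_dev(y))  # ~2.52 vs ~3.19: these do NOT add
```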