Replacement of Squaring in Variance Equation: Benefits?

AI Thread Summary
The variance equation sums the squared distances between data values and the mean to ensure positive values, which is crucial for statistical analysis. Using absolute values instead of squares would not provide the same mathematical properties, such as the relationship to the Normal Distribution and the additive nature of variances. The Mean Absolute Deviation, while studied, does not adhere to the same useful laws as variance. Discussions around these concepts require clarity on their definitions and applications in probability distributions and sample statistics. The preference for squared distances in variance calculations stems from their mathematical advantages in statistical theory.
vanmaiden
The variance equation basically sums up all the distances between each data value and the mean of the set. The interesting thing is that each distance is squared, which I believe is done to make it positive. But why don't statisticians just take the absolute value of each distance, which would give a smaller number? Is there some benefit to having a large number to work with? After all, the smaller numbers still have decimals that can be used to compare magnitudes.

Thank you.
 


vanmaiden said:
why don't the statisticians just take the absolute value of each distance to give a smaller number?

That's a good question and I don't think it has a simple answer. There are many reasons why the mean squared deviation is very useful. It is directly related to a parameter in the often used Normal Distribution, while the "Mean Absolute Deviation" (which is what you are proposing as an alternative) is not. If X and Y are independent random variables then variances obey the law Var(X+Y) = Var(X) + Var(Y), but I don't think the Mean Absolute Deviation obeys such a nice law.
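To make the additivity point concrete, here is a minimal sketch that computes both quantities exactly for a small made-up case: two independent fair coins, each taking the values 0 and 1. The helper names (`variance`, `mean_abs_dev`, `sum_of_independent`) and the coin example are my own choices for illustration, not anything standard. The variance of the sum comes out equal to the sum of the variances, while the mean absolute deviation of the sum does not.

```python
from fractions import Fraction
from itertools import product

# A distribution is a dict mapping each value to its probability.

def mean(dist):
    return sum(p * x for x, p in dist.items())

def variance(dist):
    # Mean squared deviation from the mean.
    m = mean(dist)
    return sum(p * (x - m) ** 2 for x, p in dist.items())

def mean_abs_dev(dist):
    # Mean absolute deviation from the mean.
    m = mean(dist)
    return sum(p * abs(x - m) for x, p in dist.items())

def sum_of_independent(dx, dy):
    # Distribution of X + Y when X and Y are independent:
    # joint probabilities are products of the marginals.
    out = {}
    for (x, px), (y, py) in product(dx.items(), dy.items()):
        out[x + y] = out.get(x + y, 0) + px * py
    return out

# Hypothetical example: a fair coin scored as 0 or 1.
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
total = sum_of_independent(coin, coin)

print(variance(coin), variance(total))          # 1/4 1/2
print(mean_abs_dev(coin), mean_abs_dev(total))  # 1/2 1/2
```

Here Var(X+Y) = 1/2 = 1/4 + 1/4, but the mean absolute deviation of X+Y is 1/2, not 1/2 + 1/2 = 1. One counterexample is enough to show that the additive law fails in general for the Mean Absolute Deviation.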

The Mean Absolute Deviation has been studied and used, so you can't really say that statisticians haven't tried it.

If you want to talk about things like the mean squared deviation or the mean absolute deviation, you need to be clear which of the 3 different meanings you are discussing. Each of these things can be 1) a parameter of a probability distribution, 2) a statistic computed from a sample, or 3) a formula for estimating a parameter of a probability distribution by using values from a sample. Each of those 3 things can in turn be discussed as 1) a random variable or 2) a specific value of a random variable.
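As a rough illustration of the three meanings, here is a small sketch using a fair six-sided die as the distribution and a made-up sample of five rolls. The sample values and variable names are assumptions for the example only, and the divisor n - 1 is used as one common choice of estimating formula.

```python
from fractions import Fraction

# 1) Parameter of a probability distribution:
#    the variance of a fair six-sided die, computed from the distribution itself.
faces = range(1, 7)
mu = Fraction(sum(faces), 6)
sigma2 = sum((x - mu) ** 2 for x in faces) / 6

# 2) Statistic computed from a sample:
#    the mean squared deviation of the observed values about the sample mean.
sample = [1, 2, 2, 6, 4]  # hypothetical rolls, not real data
n = len(sample)
xbar = Fraction(sum(sample), n)
ss = sum((x - xbar) ** 2 for x in sample)
sample_msd = ss / n

# 3) Formula for estimating the parameter from a sample:
#    the same sum of squares divided by n - 1 instead of n.
estimate = ss / (n - 1)

print(sigma2)      # 35/12
print(sample_msd)  # 16/5
print(estimate)    # 4
```

The three numbers differ even though all of them get called "the variance" in casual conversation. The values printed for 2) and 3) are specific values; before the die is rolled, the same formulas define random variables.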
 