Question about Mean Squared Error: Why Squared?

In summary, the mean squared error (MSE) measures the average of the squared "error." It is one of many ways to quantify the amount by which an estimator differs from the true value of the quantity being estimated. The appeal of MSE is that the error, viewed as a function of the estimate, is a quadratic curve analogous to y = x^2, with a unique minimum found by setting its derivative to zero.
  • #1
Saladsamurai
Hello there :smile:

I have no background in statistics, but have encountered some at my job and I am seeking to better understand the nature of Data Analysis.

From Wikipedia:

In statistics, the mean squared error or MSE of an estimator is one of many ways to quantify the amount by which an estimator differs from the true value of the quantity being estimated. As a loss function, MSE is called squared error loss. MSE measures the average of the square of the "error."

But as obvious as it may be to some, I cannot for the life of me figure out why we average the squares of the errors.

And why is this a better measure of accuracy than simply measuring the errors themselves?

Thanks!
 
  • #2
The math for working with the squared error is simpler than the math for working with the absolute error. You can do things like take the derivative of the squared error or express it with matrices.
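To make the matrix point concrete, here is a minimal sketch in Python (the data and the printed numbers are made up for illustration): minimizing the squared error of a linear fit has a closed-form solution, the normal equations, which the absolute error does not offer.

[code]
import numpy as np

# Made-up data: y is roughly 2*x + 1 plus noise (illustration only)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix with an intercept column
A = np.column_stack([np.ones_like(x), x])

# Minimizing the squared error ||A b - y||^2 has the closed form
# b = (A^T A)^{-1} A^T y (the normal equations)
b = np.linalg.solve(A.T @ A, A.T @ y)
print(b)  # approximately [1.04, 1.99], i.e. [intercept, slope]
[/code]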
 
  • #3
I really don't know if this is correct at all, but I've always thought that one reason at least is that by squaring the errors you're working with the magnitudes of the errors without regard to the signs of the raw errors.

I'm very weak in statistics, however, so I could well be wrong about that ...
 
  • #4
You can show that, for a joint Gaussian distribution, minimizing the mean squared error is equivalent to maximum likelihood. I suppose then you could ask, "Why maximum likelihood?"
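A sketch of that equivalence: for n independent observations with Gaussian errors of variance [itex]\sigma^2[/itex] around [itex]\mu[/itex], the log-likelihood is

[tex]\ln L(\mu) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2[/tex]

so, for fixed [itex]\sigma[/itex], maximizing the likelihood over [itex]\mu[/itex] is exactly minimizing the sum of squared errors.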
 
  • #5
Bear with me - I may be very rusty here.

Suppose we define the kth order root mean power deviation of a data set [itex]\underline{X}=\{ x_1,x_2, \dots x_n\}[/itex] as

[tex]dev_k(\underline{X}) = \sqrt[k]{\frac{\sum_{i=1}^{n}(x_i-\overline{x})^k}{n}}[/tex]

where we interpret the 1st root to be the radicand itself.

(Note "dev" is not an official name for anything, I made it up for this example).

Then [itex]dev_k[/itex] is a measure of dispersion for any k > 0.

But

[tex]dev_1(\underline{X})=\frac{\sum_{i=1}^{n}(x_i-\overline{x})}{n}[/tex]

[tex]=\frac{\sum_{i=1}^{n}(x_i)-\sum_{i=1}^{n}(\overline{x})}{n}[/tex]

[tex]=\frac{n\overline{x} - n\overline{x}}{n}=0[/tex]

So that measure is rather useless. Some consider [itex]\sum |x_i - \overline{x}|/n[/itex] (the mean absolute deviation) instead, and this leads to some useful information.

However, the second order measure (typically called the root mean square or some such) is analogous to the "moment of inertia" of the distribution about the mean and, as has been mentioned, is useful in analysing the minimization of error.

It was decided (I am not sure when, early 20th century?) that this measure would be the "standard" one (hence "standard deviation"), but the other higher order ones are also valid measures.

This may not fully answer your question, but statisticians have put some thought into which measure is best.

--Elucidus
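A quick numerical check of the two claims above, in Python with a made-up data set ([itex]dev_k[/itex] here is the quantity defined in the post):

[code]
# Numerical check of dev_k as defined above (data made up for illustration)
def dev(xs, k):
    n = len(xs)
    xbar = sum(xs) / n
    m = sum((x - xbar) ** k for x in xs) / n  # k-th central moment
    return m if k == 1 else m ** (1.0 / k)   # 1st "root" is the radicand itself

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(dev(xs, 1))  # 0.0 -- the signed deviations always cancel
print(dev(xs, 2))  # 2.0 -- the (population) standard deviation
[/code]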
 
  • #6
Elucidus said:
Bear with me - I may be very rusty here.

Good presentation of central moments, but the divisor is usually [tex]n-1[/tex]. This is more important with small sample sizes.
 
  • #7
The squared error was historically used because that is the natural method to use when you assume the errors behind your data are normally distributed. The fact that using the squares made the following mathematics easier to work with was a bonus.
 
  • #8
The reason for using MSE is that you get a quadratic curve, analogous to [tex]y=x^2[/tex], which has a unique minimum, found by setting its derivative to zero.
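Making that concrete: treat the total squared error as a function of a candidate value c and set its derivative to zero:

[tex]f(c)=\sum_{i=1}^{n}(x_i-c)^2, \qquad f'(c)=-2\sum_{i=1}^{n}(x_i-c)=0 \quad\Rightarrow\quad c=\overline{x}[/tex]

so the unique minimizer of the squared error is the sample mean.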
 
  • #9
statdad is right.

A normal distribution has only two parameters, its mean and its standard deviation. The average of your data is an unbiased estimator of the mean, and the sample variance (with the n - 1 divisor mentioned above) is an unbiased estimator of the variance, the square of the standard deviation.

This is why these two statistics are used: they are unbiased estimators for the defining parameters of the unknown normal distribution.

In reality, a lot of data is close to normal. For instance, stock price returns are nearly normal. In these cases one can estimate the distribution directly with the sample mean and standard deviation. If the data is not normal, one can take averages before estimating parameters. Before computers, this is what statisticians did, because they needed to know something about the mathematical form of their sampling distribution. If you average your data to create a new sampling distribution, then this new distribution of averages is approximately normal. This reduces the problem of data analysis to estimating which normal distribution your sample data comes from.
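For the sample mean, for example, unbiasedness is a one-line computation:

[tex]E[\overline{x}] = E\left[\frac{1}{n}\sum_{i=1}^{n} x_i\right] = \frac{1}{n}\sum_{i=1}^{n} E[x_i] = \mu[/tex]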
 
  • #10
SW VandeCarr said:
Good presentation of central moments, but the divisor is usually [tex]n-1[/tex]. This is more important with small sample sizes.

Interestingly enough, the use of the divisor (n - 1) is a rather recent development. I've read texts as recent as the 40's where the standard deviation has a divisor of n. With that divisor, the expected value of [itex]s^2[/itex] is [itex](n-1)\sigma^2/n[/itex] if I'm not mistaken. Since it was an underestimator of the true variance, statisticians and probabilists switched to the current divisor of (n - 1).

Whether the divisor is n or (n - 1), these expressions are all measures of dispersion; whether they're good measures is another matter.

--Elucidus
 
  • #11
Elucidus said:
Interestingly enough, the use of the divisor (n - 1) is a rather recent development. I've read texts as recent as the 40's where the standard deviation has a divisor of n. With that divisor, the expected value of [itex]s^2[/itex] is [itex](n-1)\sigma^2/n[/itex] if I'm not mistaken. Since it was an underestimator of the true variance, statisticians and probabilists switched to the current divisor of (n - 1).

Whether the divisor is n or (n - 1), these expressions are all measures of dispersion; whether they're good measures is another matter.

--Elucidus

For large n it doesn't really matter.
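A small simulation illustrates both points: the divisor n underestimates the true variance by the factor (n - 1)/n, and the gap shrinks as n grows. (This uses random data, so the exact numbers vary from run to run.)

[code]
import random

def avg_variance_estimates(n, trials=20000):
    """Average the n- and (n-1)-divisor variance estimates over many samples."""
    biased = unbiased = 0.0
    for _ in range(trials):
        xs = [random.gauss(0.0, 1.0) for _ in range(n)]  # true variance = 1
        xbar = sum(xs) / n
        ss = sum((x - xbar) ** 2 for x in xs)
        biased += ss / n
        unbiased += ss / (n - 1)
    return biased / trials, unbiased / trials

print(avg_variance_estimates(5))    # roughly (0.80, 1.00)
print(avg_variance_estimates(100))  # roughly (0.99, 1.00)
[/code]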
 

1. What is the Mean Squared Error (MSE) and why is it squared?

The Mean Squared Error is a measure of how close a set of predicted values is to the actual values. It is calculated by taking the average of the squared differences between the predicted and actual values. Squaring makes every term non-negative, so positive and negative errors cannot cancel, and it gives more weight to larger errors.

2. What is the difference between Mean Squared Error and Mean Absolute Error?

While both MSE and MAE are measures of error in predicting values, the main difference is that MSE squares the differences between the predicted and actual values, while MAE takes the absolute value of the differences. This means that MSE will penalize larger errors more heavily, making it more sensitive to outliers.
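For instance, with the made-up error values below, one large error dominates the MSE far more than the MAE:

[code]
# Illustrative comparison: one outlier dominates MSE but not MAE
errors = [1.0, 1.0, 1.0, 10.0]  # hypothetical prediction errors
mse = sum(e ** 2 for e in errors) / len(errors)
mae = sum(abs(e) for e in errors) / len(errors)
print(mse)  # 25.75 -- the outlier contributes 100 of the 103 summed squares
print(mae)  # 3.25  -- the outlier contributes 10 of the 13 summed magnitudes
[/code]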

3. Why is Mean Squared Error commonly used in machine learning and data analysis?

MSE is commonly used in these fields because it is a differentiable and convex function, which makes it easy to work with mathematically. It also gives a clear and interpretable measure of error, which is useful for evaluating and comparing different models.

4. Can Mean Squared Error be negative?

No. Since each squared difference is non-negative, their average is non-negative as well. The smallest possible value is zero, which occurs when every prediction matches the corresponding actual value exactly.

5. Are there any limitations to using Mean Squared Error?

One limitation of using MSE is that it can be heavily influenced by outliers in the data, so the overall error metric may not accurately reflect the typical performance of the model. Additionally, MSE is expressed in the squared units of the quantity being predicted, which can make it harder to interpret directly.
