Normalized SSE for a set of data?

  • Thread starter pergradus
  • Tags: Data Set
In summary, the conversation discusses the concept of Normalized Sum of Squared Errors (SSE) for a set of data and how it can be used to measure the goodness of fit of a model. The conversation also touches on the importance of considering factors such as uncertainty, number of data points, and magnitude of the dependent variables when comparing different data sets. The method of using the mean of the sum of the squares of percentage errors is suggested, but it is also noted that a precise definition of what makes a good or bad comparison is necessary for a mathematical answer. The term "mean squared error" (MSE) is also mentioned as a widely used and well-regarded statistic for evaluating the bias of estimators. However, assessing the number of degrees of freedom can be difficult due to correlations among variables.
  • #1
pergradus

Hi, suppose I have a set of data points, and each data point has a certain uncertainty associated with it.

Suppose also that I have a function which models the data. What I'd like to know is: how does one quantitatively measure how good the fit of the model is to the data, in such a way that the model can be compared across different data sets?

For example, taking the SSE defined as:

[itex]\sum(y_i - f(x_i , \beta))^2[/itex]

where [itex]\beta[/itex] is a set of parameters, one can measure the difference between the model and the data. However, this does not take into account the number of data points or the degree of uncertainty. If I have a very large number of data points, small per-point differences between the data and the model will still produce a very large SSE, even if the model is a very good fit. Conversely, even if the model is poor, a small set of data points may produce a small SSE; thus there is no way to compare across data sets for the same model.

Also, one must consider the magnitude of the data points when comparing sets. For example, if my [itex] y_i [/itex] values range from 100...500 for one set, a small fractional change between the model and the data may still produce a huge SSE, while a huge fractional change between the model and data ranging from 0.01...0.5 will result in a small SSE.
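
To make these two effects concrete, here is a minimal numerical sketch (assuming NumPy; the models, noise level, and data below are invented purely for illustration):

[code]
import numpy as np

rng = np.random.default_rng(0)

def sse(y, y_pred):
    # Plain sum of squared errors: grows with both the number of
    # points and the overall scale of y.
    return np.sum((y - y_pred) ** 2)

# Two hypothetical data sets with the same 1% relative scatter,
# differing only in size and magnitude.
x_small = np.linspace(0.0, 1.0, 10)
x_large = np.linspace(0.0, 1.0, 1000)

model_small = 0.01 + 0.49 * x_small    # values span 0.01...0.5
model_large = 100.0 + 400.0 * x_large  # values span 100...500

data_small = model_small * (1 + 0.01 * rng.standard_normal(x_small.size))
data_large = model_large * (1 + 0.01 * rng.standard_normal(x_large.size))

print(sse(data_small, model_small))  # tiny SSE despite 1% scatter
print(sse(data_large, model_large))  # huge SSE with the same 1% scatter
[/code]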

So, what I'd like is a way to compare the goodness of a fit for a wide variety of data sets that takes into account the error in the data, the number of data points, and the magnitude of the dependent variables in the data - can someone explain how to do this and what such a quantity is called?
 
  • #2
Stephen Tashi
pergradus said:
So, what I'd like is a way to compare the goodness of a fit for a wide variety of data sets that takes into account the error in the data, the number of data points, and the magnitude of the dependent variables in the data - can someone explain how to do this and what such a quantity is called?

You could try the mean of the sum of the squares of the percentage errors, but unless you can define what makes a good or bad comparison precisely, there is no mathematical answer to your question.
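
As a sketch, that statistic might look like this (assuming NumPy; the function name is just for illustration):

[code]
import numpy as np

def mean_squared_percentage_error(y, y_pred):
    # Mean of the squared percentage errors. Dimensionless, so it is
    # insensitive to the overall magnitude of y. Assumes no y_i is zero.
    pct_err = 100.0 * (y - y_pred) / y
    return np.mean(pct_err ** 2)
[/code]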
 
  • #3
pergradus
Stephen Tashi said:
You could try the mean of the sum of the squares of the percentage errors, but unless you can define what makes a good or bad comparison precisely, there is no mathematical answer to your question.

I was thinking along the same lines... something like this:

[itex]\frac{1}{N}\sum\frac{(y_i - f(x_i, \beta)) + e_i}{y_i}[/itex]

where [itex]N[/itex] is the total number of data points, and [itex]e_i[/itex] is the error associated with the [itex]i[/itex]th data point. This would give the average maximum possible difference between the curve and the data.
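
In code, the proposal might look like this (a minimal sketch, assuming NumPy; the function name is invented, and e holds the per-point uncertainties):

[code]
import numpy as np

def avg_relative_deviation(y, y_pred, e):
    # The statistic proposed above: (1/N) * sum of ((y_i - f_i) + e_i) / y_i.
    # Note the residuals are signed, so positive and negative deviations
    # can cancel. Assumes no y_i is zero.
    return np.mean(((y - y_pred) + e) / y)
[/code]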

Do you see any flaws in this reasoning?
 
  • #4
Stephen Tashi
pergradus said:
Do you see any flaws in this reasoning?

It isn't reasoning. It's the method of "I don't know exactly what I want, but I'll recognize it when I see it". Everybody who deals with practical problems resorts to this method sometimes. If you want to deduce an answer by mathematical reasoning, you have to define precisely what it is that you are trying to do. If you are just casually fooling around with some data, I think the formula is worth a try. If you have some very serious purpose in your work, you should try to define your goal in precise language and see where reasoning takes you.
 
  • #5
I haven't seen any reference to the term "mean squared error" (MSE) in this thread. This is simply the SSE divided by the number of degrees of freedom and is essentially what this thread is addressing. It's a widely used and well-regarded statistic and is useful in evaluating the bias of estimators, among other things. I'm not aware of any particular issues associated with it. Am I missing something in this discussion?
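
A minimal sketch of this statistic (assuming NumPy and independent observations; n_params here denotes the number of fitted parameters):

[code]
import numpy as np

def mean_squared_error(y, y_pred, n_params):
    # SSE divided by the residual degrees of freedom: the number of
    # observations minus the number of fitted parameters. Valid as
    # written only when the observations are independent.
    dof = y.size - n_params
    return np.sum((y - y_pred) ** 2) / dof
[/code]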

EDIT: Assessing the number of degrees of freedom can be difficult due to correlations among variables. MANOVA analysis is usually required in the multivariate setting. In the univariate setting there is no problem provided the individual data points (observations) are independent.

http://userwww.sfsu.edu/~efc/classes/biol710/manova/manovanewest.htm
 

FAQ: Normalized SSE for a set of data?

What is normalized SSE for a set of data?

Normalized SSE (Sum of Squared Errors) measures how far a set of data points deviates from the values a model predicts for them. It is used to evaluate the accuracy of a regression model by measuring the difference between the predicted values and the actual values of the data, scaled so that results are comparable across data sets of different sizes.

How is normalized SSE calculated?

Normalized SSE is calculated by taking the sum of the squared differences between the predicted values and the actual values, and then dividing by the degrees of freedom, typically the number of data points minus the number of fitted parameters (or minus one in the simplest case). This accounts for how much data was used to fit the model.
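
For example, a minimal sketch in Python (assuming NumPy):

[code]
import numpy as np

def normalized_sse(y, y_pred, n_params=1):
    # Sum of squared errors divided by the degrees of freedom.
    return np.sum((y - y_pred) ** 2) / (y.size - n_params)
[/code]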

What is a good value for normalized SSE?

A good value for normalized SSE is typically close to zero, as this indicates that the predicted values are very close to the actual values. However, the interpretation of what is considered a "good" value can vary depending on the context and the specific data being analyzed.

How does normalized SSE compare to other measures of accuracy?

Normalized SSE is commonly used in regression analysis, but it is not the only measure of accuracy. Other measures such as R-squared, mean absolute error, and mean squared error are also used to evaluate the performance of a model. Each measure has its own strengths and limitations, and it is important to consider multiple measures when analyzing data.
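
For reference, minimal sketches of two of these alternatives (assuming NumPy):

[code]
import numpy as np

def r_squared(y, y_pred):
    # Coefficient of determination: 1 - SSE / total sum of squares.
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def mean_absolute_error(y, y_pred):
    # Average absolute deviation between predictions and data.
    return np.mean(np.abs(y - y_pred))
[/code]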

Can normalized SSE be negative?

No, normalized SSE cannot be negative. This is because the squared differences between predicted and actual values are always non-negative, and dividing their sum by the (positive) degrees of freedom keeps the result non-negative; it is zero only when the model reproduces the data exactly.
