Which error should I calculate?

1. Sep 29, 2011

hermano

Hi,

I want to compare (statistically) different models which predict the values y at several x-values. Therefore I want to calculate the 'total error' between the exact (measured) y-values and the calculated y-values using different models. My problem is that I'm not sure which method to use to calculate the 'total error' for each model. Should I use the sum of squared errors, the sum of the absolute errors or some other technique?

Thanks

2. Sep 29, 2011

Bacle

Bro:
It will depend on what aspect of the error you are interested in.

3. Sep 30, 2011

hermano

What do you mean by 'the aspect'? I want to quantify the accuracy of my models by calculating the difference between the model's predicted values and the measured values over the whole set of points (that is, for each x-value). I can easily calculate the error at each x-value, but I want to combine all these errors, for example as the sum of absolute errors, into a single total error: a number that quantifies the overall accuracy of my model. The question is: which method should I use to combine these 'separate' errors?

4. Sep 30, 2011

Bacle

Well, modeling of processes can be done either from the perspective of
least squares or from that of maximum-likelihood estimation. I just wondered
which perspective you are using, to get some insight.

5. Sep 30, 2011

Stephen Tashi

The question is what YOU mean by aspect.

This expresses an intuitive desire, but it is not a well-posed mathematical problem. For example, suppose you have a model F(x) for x values in the range 0 to 100. Is it more or less important to fit the values of x from 0 to 50 than the values from 90 to 100? Do you care about errors as measured by the arithmetic difference between measured and predicted values, or do you care about the percentage error? Is an over-prediction by 10 as bad as an under-prediction by 10? Is the data that you have equally spaced over all the x values, or do you have a lot of data for one particular subset of those values?
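To make these questions concrete, here is a rough Python sketch of how each choice leads to a different "total error". The function names, the cost factors, and the sample numbers are all made up for illustration; none of them come from the original problem.

```python
def mean_absolute_error(measured, predicted):
    """Average of |predicted - measured|; over- and under-prediction count equally."""
    return sum(abs(p - m) for m, p in zip(measured, predicted)) / len(measured)


def mean_percentage_error(measured, predicted):
    """Average of |predicted - measured| / |measured|; undefined if a measured value is 0."""
    return sum(abs(p - m) / abs(m) for m, p in zip(measured, predicted)) / len(measured)


def asymmetric_error(measured, predicted, over_cost=1.0, under_cost=2.0):
    """Average error where under-prediction is penalized differently (costs are assumptions)."""
    total = 0.0
    for m, p in zip(measured, predicted):
        diff = p - m
        total += over_cost * diff if diff > 0 else under_cost * -diff
    return total / len(measured)


def weighted_mean_squared_error(measured, predicted, weights):
    """Squared errors averaged with weights, so some x-ranges can matter more than others."""
    num = sum(w * (p - m) ** 2 for m, p, w in zip(measured, predicted, weights))
    return num / sum(weights)


# Hypothetical data: one over-prediction by 2, one under-prediction by 2, one exact hit.
measured = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 40.0]

print(mean_absolute_error(measured, predicted))
print(mean_percentage_error(measured, predicted))
print(asymmetric_error(measured, predicted))
print(weighted_mean_squared_error(measured, predicted, [1.0, 1.0, 4.0]))
```

The same three predictions get four different scores, so two models can rank differently depending on which of these you decide to care about.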

Most importantly, what are you trying to accomplish? Are you looking for a number that "quantifies the total accuracy of your model" to publish in a paper, or in an advertising flyer? Are you trying to do a statistical hypothesis test that accepts or rejects the model?

6. Sep 30, 2011

Bacle

You put it much more nicely and precisely than I did, Stephen. Many people seem
not to realize the need for specific details of what they want when they make
a request. Nice job!

7. Oct 1, 2011

hermano

Hi Stephen and Bacle,

Indeed, I want a number (which reflects the total error) that quantifies the total accuracy of my model so I can compare different models with each other.

I will try to explain my problem:
Let's say that the data I have measured is a rough sine wave as a function of the angular position (0 to 2*pi, which is the independent variable x), which I measured with three sensors at three different angular positions. The sample frequency determines the number of data points; let's say that for one revolution this is 1000 equidistant points. I add these three measurements together (three vectors of 1000 points), and this sum is the input for my model. With my model I want to separate the data again for each sensor. In order to quantify each model, I want to compute the difference between the measured data of each sensor and the separated data of my model for each sensor. This again gives me three vectors of 1000 points, which are the ABSOLUTE errors at each angular position for the three sensors. My question is: How can I define/calculate one number for each of these vectors that quantifies the total error of my model?
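As a sketch of what "one number per vector" could look like, the Python snippet below reduces one error vector of 1000 points to three common candidates: the root-mean-square error, the mean absolute error, and the largest absolute error. The sine wave and the small cosine disturbance standing in for the model's separation error are invented test data, not real sensor output, and the function names are only illustrative.

```python
import math


def rmse(errors):
    """Root-mean-square error: large deviations dominate."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae(errors):
    """Mean absolute error: every deviation counts in proportion to its size."""
    return sum(abs(e) for e in errors) / len(errors)


def max_abs(errors):
    """Worst-case error over the whole revolution."""
    return max(abs(e) for e in errors)


def summarize(measured, separated):
    """Reduce the pointwise error (separated - measured) to one number per criterion."""
    errors = [s - m for m, s in zip(measured, separated)]
    return {"rmse": rmse(errors), "mae": mae(errors), "max_abs": max_abs(errors)}


# Synthetic stand-in for one sensor: 1000 equidistant angles over one revolution.
n = 1000
theta = [2 * math.pi * k / n for k in range(n)]
measured = [math.sin(t) for t in theta]
# Assumed separation error of amplitude 0.01 (purely illustrative).
separated = [math.sin(t) + 0.01 * math.cos(3 * t) for t in theta]

summary = summarize(measured, separated)
print(summary)
```

Running `summarize` once per sensor and per model gives a small table of numbers to compare. Which column to rank the models by is still the open question in this thread.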

In the end, I want to compare these numbers across the models in order to select the model that gives me the lowest total error between the measured and calculated data!

I hope my problem is clearer now, so that you can help me with it!

Thanks

8. Oct 1, 2011

Stephen Tashi

One of the first things to determine is whether there is imprecision in the data. In a simplistic view of the world, the model would be $y = f(x)$ and the data would be perfectly accurate. In a slightly more complicated view, data of the form $(x_i, y_i)$ has $x_i$ measured perfectly but the $y_i$ have measurement errors. In an even more complicated view, the $x_i$ are not perfectly accurate either.

For example, models are often fit by defining "best fit" to mean a fit f(x) that minimizes the average of the quantities $(y_i - f(x_i))^2$ which doesn't account for any error in the $x_i$. The different approach of "total least squares" assumes that there are also errors in the $x_i$ measurements. http://en.wikipedia.org/wiki/Total_least_squares
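To illustrate the difference, here is a minimal Python sketch for the simplest possible case, a straight-line model $f(x) = a x + b$. That restriction is my assumption for the sake of a short example, as are the function names and the toy data. The ordinary fit minimizes the squared vertical distances $(y_i - f(x_i))^2$, while the total-least-squares fit minimizes the squared perpendicular distances to the line, which corresponds to assuming equal error variance in $x$ and $y$.

```python
import math


def _centered_sums(xs, ys):
    """Means and centered sums of squares/products used by both fits."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return mx, my, sxx, syy, sxy


def ols_line(xs, ys):
    """Ordinary least squares: errors assumed to be in y only."""
    mx, my, sxx, _, sxy = _centered_sums(xs, ys)
    slope = sxy / sxx
    return slope, my - slope * mx


def tls_line(xs, ys):
    """Total least squares (orthogonal regression) for a line.

    Assumes equal error variance in x and y, and sxy != 0.
    """
    mx, my, sxx, syy, sxy = _centered_sums(xs, ys)
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return slope, my - slope * mx


# Toy data with scatter in both coordinates (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 1.0, 3.0]

print(ols_line(xs, ys))
print(tls_line(xs, ys))
```

On this toy data the two criteria give visibly different slopes, so even the definition of "best fit" depends on where you believe the measurement errors are.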

I'm guessing that there are complications in your problem that haven't been explained yet, because you speak of "adding" the data from the 3 sensors and then separating it again. If by "adding" you simply mean putting the 3 sets of data into one file, then separating it again seems a trivial operation, so I don't know why you would bother to mention it. It would be best if you explained the actual nature of the sensors and what they measure. Do you process the raw sensor measurements by assuming the sensors are at some known angle relative to the thing they measure when $x_i = 0$? Is the placing of the 1000 equally spaced angles done by taking measurements equally spaced in time and assuming a constant rate of rotation of something?