Which error should I calculate?

  • Thread starter: hermano
  • Tags: Error
To compare the accuracy of different predictive models, the user seeks a method to calculate a 'total error' that quantifies the difference between measured and predicted values across multiple sensors. The discussion highlights the need to clarify what aspect of error is most important, such as whether to prioritize absolute differences or percentage errors. Various methods for calculating total error, such as the sum of absolute errors or least squares, are considered, with an emphasis on ensuring that the approach aligns with the user's goals for model comparison. The complexity of the data, including potential imprecision and the nature of the sensors, is acknowledged as a factor that may influence the choice of error calculation method. Ultimately, the goal is to derive a single numerical value for each model that facilitates comparison based on total error.
hermano
Hi,

I want to compare (statistically) different models which predict the values y at several x-values. Therefore I want to calculate the 'total error' between the exact (measured) y-values and the calculated y-values using different models. My problem is that I'm not sure which method to use to calculate the 'total error' for each model. Should I use the sum of squared errors, the sum of the absolute errors or some other technique?

Thanks
 
Bro:
It will depend on what aspect of the error you are interested in.
 
What do you mean by 'the aspect'? I want to quantify the accuracy of my models by calculating the difference between the predicted values of the model and the measured values for the whole set of points (thus for each x-value). I can easily calculate the error for each x-value, but I want to combine all these errors in some way, such as the sum of absolute errors, to get a total error: a single number that quantifies the overall accuracy of my model. The question is: which method should I use to combine all these 'separate' errors?
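To make the candidates concrete, here is a minimal sketch of the usual ways to collapse pointwise errors into one number. The function name `error_summaries` and the sample values are made up for illustration; nothing here is specific to the models in this thread.

```python
import math

def error_summaries(measured, predicted):
    """Collapse pointwise errors into several candidate 'total error' numbers."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    residuals = [m - p for m, p in zip(measured, predicted)]
    n = len(residuals)
    sum_abs = sum(abs(r) for r in residuals)   # sum of absolute errors
    sum_sq = sum(r * r for r in residuals)     # sum of squared errors
    return {
        "sum_abs": sum_abs,
        "sum_sq": sum_sq,
        "mae": sum_abs / n,                    # mean absolute error
        "rmse": math.sqrt(sum_sq / n),         # root-mean-square error
        "max_abs": max(abs(r) for r in residuals),  # worst single point
    }

# Illustrative values only
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.0, 4.5]
summary = error_summaries(measured, predicted)
print(summary)
```

The squared versions penalize a few large misses more heavily than many small ones, the absolute versions treat every unit of error the same, and the maximum only looks at the worst point. Which of these is "right" depends on what kind of miss matters most to you.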
 
Well, modeling of processes can be done either from the perspective/approach of
Least Squares, or from Maximum-Likelihood estimation. I just wondered what
perspective you are using, to get some insight.
 
hermano said:
What do you mean by 'the aspect'?
The question is what YOU mean by aspect.

I want to quantify the accuracy of my models by calculating the difference between the predicted values of the model and the measured values for the whole set of points (thus for each x-value). I can easily calculate the error for each x-value, but I want to combine all these errors in some way, such as the sum of absolute errors, to get a total error: a single number that quantifies the overall accuracy of my model. The question is: which method should I use to combine all these 'separate' errors?

This expresses an intuitive desire, but it is not a well-posed mathematical problem. For example, suppose you have a model F(x) for x values in the range 0 to 100. Is it more or less important to fit the values of x from 0 to 50 than the values from 90 to 100? Do you care about errors as measured by the arithmetic difference between measured and predicted values, or do you care about the percentage error? Is an over-prediction by 10 as bad as an under-prediction by 10? Is the data that you have equally spaced over all the x values, or do you have a lot of data for one particular subset of those values?

Most importantly, what are you trying to accomplish? Are you looking for a number that "quantifies the total accuracy of your model" to publish in a paper, or in an advertising flyer? Are you trying to do a statistical hypothesis test that accepts or rejects the model?
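The questions about arithmetic versus percentage error and about over- versus under-prediction can change which model looks better. The sketch below uses made-up numbers and hypothetical function names to show two models with the same mean absolute error but very different percentage errors, plus an asymmetric penalty whose weights are purely illustrative.

```python
def mean_abs_error(measured, predicted):
    """Average of |measured - predicted|."""
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)

def mean_abs_pct_error(measured, predicted):
    """Average of |measured - predicted| / |measured|, in percent.

    Undefined when any measured value is zero.
    """
    ratios = [abs((m - p) / m) for m, p in zip(measured, predicted)]
    return 100.0 * sum(ratios) / len(ratios)

def asymmetric_error(measured, predicted, over_weight=1.0, under_weight=1.0):
    """Average absolute error with separate weights for over- and under-prediction."""
    total = 0.0
    for m, p in zip(measured, predicted):
        diff = p - m  # positive means the model over-predicted
        total += over_weight * diff if diff > 0 else under_weight * (-diff)
    return total / len(measured)

# Illustrative values only
measured = [10.0, 100.0]
model_a = [20.0, 100.0]  # misses by 10 where the true value is small
model_b = [10.0, 110.0]  # misses by 10 where the true value is large
model_c = [0.0, 100.0]   # under-predicts by 10

print(mean_abs_error(measured, model_a), mean_abs_error(measured, model_b))
print(mean_abs_pct_error(measured, model_a), mean_abs_pct_error(measured, model_b))
print(asymmetric_error(measured, model_a, under_weight=2.0),
      asymmetric_error(measured, model_c, under_weight=2.0))
```

Note that a percentage error is poorly behaved wherever the measured value is at or near zero, which would matter for a signal like a sine wave that crosses zero.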
 
You put it much more nicely and precisely than I did, Stephen. Many people seem
not to realize the need for specific details of what they want when they make
a request. Nice job!
 
Hi Stephen and Bacle,

Indeed, I want a number (which reflects the total error) that quantifies the total accuracy of my model so I can compare different models with each other.

I will try to explain my problem:
Let's say that the data I have measured is a rough sine wave as a function of the angular position (0 to 2*pi, which is the independent variable x), which I measured with three sensors at three different angular positions. The sample frequency determines the number of data points; let's say that for one revolution this is 1000 equidistant points. I add all three measurements together (three vectors of 1000 points) and this is the input for my model. With my model I want to separate the data again for each sensor. In order to quantify each model, I want to compute the difference between the measured data of each sensor and the separated data of my model for each sensor. This again gives me three vectors of 1000 points, which are the ABSOLUTE errors at each angular position for the three sensors. My question is: how can I define/calculate one number for each of these vectors that quantifies the total error of my model?

At the end I want to compare these numbers for each model in order to select the model which gives me the lowest error between the measured and calculated data based on the total error!
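One possible way to set up this comparison is sketched below, assuming RMSE is the chosen measure (that choice is itself an assumption, as discussed earlier in the thread). The sensor signals, phase offsets, and the two "models" are synthetic stand-ins, not real data or real separation algorithms.

```python
import math

def rmse(measured, estimated):
    """Root-mean-square error between two equal-length sequences."""
    n = len(measured)
    return math.sqrt(sum((m - e) ** 2 for m, e in zip(measured, estimated)) / n)

def score_model(measured_by_sensor, estimated_by_sensor):
    """Return (list of per-sensor RMSE values, pooled RMSE over all sensors)."""
    per_sensor = [rmse(m, e) for m, e in zip(measured_by_sensor, estimated_by_sensor)]
    # With equal-length vectors, the pooled RMSE is the root of the mean per-sensor MSE.
    pooled = math.sqrt(sum(r * r for r in per_sensor) / len(per_sensor))
    return per_sensor, pooled

# Synthetic stand-in for three sensors sampled at 1000 equidistant angles
N = 1000
angles = [2.0 * math.pi * k / N for k in range(N)]
phases = [0.0, 2.0, 4.0]  # hypothetical sensor positions in radians
measured = [[math.sin(a + ph) for a in angles] for ph in phases]

# Hypothetical model outputs: one with a constant offset, one with a harmonic error
candidates = {
    "model_a": [[v + 0.01 for v in sensor] for sensor in measured],
    "model_b": [[v + 0.05 * math.sin(3.0 * a) for v, a in zip(sensor, angles)]
                for sensor in measured],
}

scores = {name: score_model(measured, est) for name, est in candidates.items()}
best = min(scores, key=lambda name: scores[name][1])
for name, (per_sensor, pooled) in scores.items():
    print(name, [round(r, 4) for r in per_sensor], round(pooled, 4))
print("lowest pooled RMSE:", best)
```

Keeping the per-sensor numbers alongside the pooled one is deliberate: a model can have the lowest pooled error while still doing badly on one sensor.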

I hope my problem is clearer now!

Thanks
 
One of the first things to determine is whether there is imprecision in the data. In a simplistic view of the world, the model would be y = f(x) and the data would be perfectly accurate. In a slightly more complicated view, data of the form (x_i, y_i) has x_i measured perfectly but the y_i have measurement errors. In an even more complicated view, the x_i are not perfectly accurate either.

For example, models are often fit by defining "best fit" to mean a fit f(x) that minimizes the average of the quantities (y_i - f(x_i))^2 which doesn't account for any error in the x_i. The different approach of "total least squares" assumes that there are also errors in the x_i measurements. http://en.wikipedia.org/wiki/Total_least_squares
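As a small illustration of how the two criteria differ, the sketch below fits a straight line to the same points both ways. It covers only the simplest case: a line, with the total-least-squares fit taken as orthogonal regression under the assumption that the x and y errors have equal variance. The function names and data are made up for the example.

```python
import math

def centered_sums(xs, ys):
    """Return means and centered second-moment sums (sxx, syy, sxy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return mx, my, sxx, syy, sxy

def ols_line(xs, ys):
    """Ordinary least squares: minimizes vertical distances (errors in y only)."""
    mx, my, sxx, _, sxy = centered_sums(xs, ys)
    slope = sxy / sxx
    return slope, my - slope * mx

def tls_line(xs, ys):
    """Orthogonal regression: minimizes perpendicular distances.

    Assumes equal error variance in x and y, and sxy != 0.
    """
    mx, my, sxx, syy, sxy = centered_sums(xs, ys)
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    return slope, my - slope * mx

# Illustrative noisy points
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 1.0, 3.0]
ols_slope, ols_intercept = ols_line(xs, ys)
tls_slope, tls_intercept = tls_line(xs, ys)
print("OLS:", ols_slope, ols_intercept)
print("TLS:", tls_slope, tls_intercept)
```

The two fits agree when the points lie exactly on a line and drift apart as the scatter grows, so the choice of criterion is part of the model comparison rather than a detail.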

I'm guessing that there are complications in your problem that haven't been explained yet, because you speak of "adding" the data from the 3 sensors and then separating it again. If by "adding" you simply mean putting the 3 sets of data into one file, then separating it again seems a trivial operation, so I don't know why you would bother to mention it. It would be best if you explained the actual nature of the sensors and what they measure. Do you process the raw sensor measurements by assuming the sensors are at some known angle relative to the thing they measure when x_i = 0? Is the placing of the 1000 equally spaced angles done by taking measurements equally spaced in time and assuming a constant rate of rotation of something?
 
