Metric for rating function accuracy from R^m to R.

SUMMARY

The discussion focuses on methods for evaluating the accuracy of functions that map from R^m to R, in the context of generating pseudo-random expressions to approximate a given data set. The primary metric discussed is the sum of squared errors (SSE), ∑_j (y_j - z_j)^2, where the y_j are the measured data points and the z_j are the predicted values. An alternative metric is the sum of absolute errors, which is less sensitive to large outliers and is preferred here because randomly generated expressions produce many very poor fits. The choice of metric determines how heavily the worst candidate functions are penalized.

PREREQUISITES
  • Understanding of R^m to R mappings
  • Familiarity with statistical metrics such as sum of squared errors (SSE)
  • Knowledge of error measurement techniques in data analysis
  • Basic programming skills for implementing evaluation metrics
NEXT STEPS
  • Research statistical methods for evaluating function accuracy in regression analysis
  • Explore the implementation of sum of absolute errors in programming languages like Python or R
  • Investigate techniques for handling noisy datasets in predictive modeling
  • Learn about optimization algorithms for minimizing error metrics in machine learning
USEFUL FOR

This discussion is beneficial for data scientists, statisticians, and machine learning practitioners who are involved in function approximation and error analysis in predictive modeling.

TylerH
I'm writing a program that generates pseudo-random expressions in the hope of finding one that closely approximates a given data set. The functions map from R^m (an m-tuple of reals) to a real. What I need is a way to rate the functions by their accuracy. Are there any known methods for doing this? Maybe something from stats.

Ideally this would be a method that takes a list of expected outputs and a list of actual outputs and returns a real number. A uniform distribution would be good, but not required.
 
A standard thing is to do something like

$$\sum_{j} \left( y_j - z_j \right)^2$$

where the y_j are the actual data that you measured, and the z_j are the predictions that you made (this is called the sum of squared errors). It's used mostly because sums of squares are easy to work with in analytical calculations (like when trying to minimize it). If you're doing a numerical method, people will often instead use the sum of the absolute values of the errors, especially if they consider the property "makes sure there are no significant outliers at the cost of being off by slightly more on average" to be a negative quality.
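For concreteness, here is a minimal sketch of both metrics in Python. The thread doesn't fix a language, so the language choice and function names here are illustrative; either function takes a list of expected outputs and a list of actual outputs and returns a real, exactly the interface asked for above (lower scores mean a better fit).

```python
def sum_squared_errors(expected, actual):
    """Sum of squared errors: large misses are penalized quadratically."""
    return sum((y - z) ** 2 for y, z in zip(expected, actual))

def sum_absolute_errors(expected, actual):
    """Sum of absolute errors: every unit of error counts the same."""
    return sum(abs(y - z) for y, z in zip(expected, actual))
```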
 
I went with the absolute-value one because, by the nature of the fact that they're all random, there are going to be a lot of bad solutions. So I thought a metric that doesn't underestimate their badness would be better. Thanks.
 
TylerH said:
I went with the absolute-value one because, by the nature of the fact that they're all random, there are going to be a lot of bad solutions. So I thought a metric that doesn't underestimate their badness would be better. Thanks.

I don't really understand what this post is trying to get at... do you mean that your data has a lot of noise in it?
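To make the outlier trade-off concrete, here is a small illustrative comparison (the data values are invented for demonstration): under SSE, a candidate that is exact everywhere except one point can score far worse than one that is slightly off everywhere, while the absolute-value metric weighs the two more evenly.

```python
def sse(ys, zs):
    # sum of squared errors -- a single outlier dominates the total
    return sum((y - z) ** 2 for y, z in zip(ys, zs))

def sae(ys, zs):
    # sum of absolute errors -- the outlier only counts linearly
    return sum(abs(y - z) for y, z in zip(ys, zs))

expected = [1.0, 2.0, 3.0, 4.0, 5.0]
good_fit = [1.1, 1.9, 3.0, 4.2, 4.8]   # small errors on every point
one_miss = [1.0, 2.0, 3.0, 4.0, 15.0]  # exact except for one large miss

print(sse(expected, good_fit), sse(expected, one_miss))  # ~0.1 vs 100.0
print(sae(expected, good_fit), sae(expected, one_miss))  # ~0.6 vs 10.0
```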
 
