Mathematical impact of outliers on accuracy of models

In summary, the thread discusses how outliers affect the accuracy of predictive models. Deleting outliers is a common method, but it is not always valid, and robust statistical methods are raised as an alternative to relying solely on deletion. The thread concludes that there is no general means to determine the probability of a high-leverage point being an influential point, but regression diagnostics offer numerical measures that can provide insight.
  • #1
Galteeth
Is there a general approach to calculating the impact outliers have on the accuracy of one's (predictive) model?
 
  • #2
Galteeth said:
Is there a general approach to calculating the impact outliers have on the accuracy of one's (predictive) model?

I think your question is too general. You need to describe your model.

If your model is based on regression against approximately normally distributed data, the influence of outliers is well understood.

I have seen data analysts rerun the model with and without the outlier points.
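The with-and-without comparison can be sketched in a few lines of Python. This is only an illustration for a straight-line least-squares fit: the data are made up, and `fit_line` is a hypothetical helper, not something from the thread.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: y = 2x exactly, except for the last observation.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 2.0, 4.0, 6.0, 8.0, 30.0]

_, slope_all = fit_line(xs, ys)                # fit with the suspect point
_, slope_trimmed = fit_line(xs[:-1], ys[:-1])  # fit without it

print(f"slope with point:    {slope_all:.3f}")
print(f"slope without point: {slope_trimmed:.3f}")
print(f"shift in slope:      {slope_all - slope_trimmed:.3f}")
```

The size of the shift is a direct, model-specific measure of how much that one point matters. It says nothing about whether the point is an error.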
 
  • #3
As wofsy said, the question is far too general. There is no one technique for analyzing what outliers will do / have done. Editing outliers is one commonly used technique. Sensors do occasionally go out to lunch. Transmission errors can create huge outliers. Just about the only thing one can do with a 10^20 sigma outlier is to delete it. This suggests a refinement on the approach wofsy described in his post. Use some heuristic to delete gross outliers, run the model, delete statistical outliers that the gross heuristics didn't catch, and re-run the model.

This doesn't always work because it assumes that the heuristics and model are basically correct. Example: The ozone hole over Antarctica was initially discovered by ground observations rather than by satellite observations because of the overaggressive use of this technique on the satellite data.
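A rough sketch of that two-pass procedure, for a straight-line fit, might look like the following. Everything specific here is an assumption made for illustration: the `gross_limit` plausibility bound, the cutoff `k`, the use of the median absolute deviation as the residual scale, and the data. As the ozone example shows, the thresholds encode beliefs about the model, so the dropped points deserve a look rather than silent removal.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_pass_fit(xs, ys, gross_limit, k=3.0):
    """Return (intercept, slope, number of points dropped)."""
    # Pass 1: drop readings outside a plausible range (gross outliers).
    kept = [(x, y) for x, y in zip(xs, ys) if abs(y) <= gross_limit]
    a, b = fit_line([x for x, _ in kept], [y for _, y in kept])

    # Pass 2: drop points whose residual lies more than k robust standard
    # deviations from the median residual. 1.4826 * MAD estimates the
    # standard deviation when the residuals are roughly normal.
    residuals = [y - (a + b * x) for x, y in kept]
    med = statistics.median(residuals)
    mad = statistics.median([abs(r - med) for r in residuals])
    scale = 1.4826 * mad
    if scale > 0:
        kept = [p for p, r in zip(kept, residuals) if abs(r - med) <= k * scale]

    a, b = fit_line([x for x, _ in kept], [y for _, y in kept])
    return a, b, len(xs) - len(kept)


# Hypothetical data: y = 2x with small alternating noise, one transmission
# glitch (1e20) and one moderate outlier that the range check cannot catch.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + (0.1 if i % 2 == 0 else -0.1) for i, x in enumerate(xs)]
ys[3] = 1e20
ys[7] = 19.0

intercept, slope, n_dropped = two_pass_fit(xs, ys, gross_limit=1e6)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, dropped = {n_dropped}")
```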
 
  • #5
The question and answers are dancing around the topic of robust statistical analysis methods. Deleting outliers is one way to deal with them, but unless you know that they are due to errors in measurement (sensors going haywire), eliminating them simply because they are outliers is not a valid statistical procedure.
It is also important to note these things:
Outliers are rather easy to find in low-dimensional problems, but extremely difficult to find in high-dimensional problems.
In regression, points of high leverage may not appear as outliers in the traditional sense of large residuals. In severe situations the regression line may pass through them, so the residual is zero.
The point of a robust analysis is to use a process that yields results that can be interpreted in ways similar to the traditional least-squares (normal-distribution-based) methods, but which are not as easily influenced by departures from the hypothesized model as the traditional methods might be.
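The leverage remark can be made concrete with a small sketch. For a straight-line fit, the leverage (hat value) of point i is 1/n + (x_i - mean(x))^2 / Sxx, which depends only on the x values. The data below are made up so that one point sits far from the rest in x and clearly off the pattern of the others, and the helper name is hypothetical.

```python
def leverage_and_residuals(xs, ys):
    """Hat values and residuals for a least-squares fit of y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Leverage depends only on how far x_i is from the mean of the x values.
    leverages = [1.0 / n + (x - mean_x) ** 2 / sxx for x in xs]
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return leverages, residuals


# Hypothetical data: five points near y = 3, plus one far-away point.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
ys = [3.1, 2.9, 3.0, 3.2, 2.8, 15.0]

leverages, residuals = leverage_and_residuals(xs, ys)
for x, h, e in zip(xs, leverages, residuals):
    print(f"x = {x:5.1f}   leverage = {h:.3f}   residual = {e:+.3f}")
```

In this made-up case the far point drags the line toward itself, so its residual ends up smaller in magnitude than those of some ordinary points. A screen based on residuals alone would therefore pass over it.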

Perhaps too long a comment, but discussing "tossing out data" in general terms can lead to dangerous things.
 
  • #6
The partial leverage article was useful. Thanks for all the responses. What I was trying to get at was, is there a general means to determine the probability of a high-leverage point being an influential point?
 
  • #7
Galteeth said:
The partial leverage article was useful. Thanks for all the responses. What I was trying to get at was, is there a general means to determine the probability of a high-leverage point being an influential point?

No, not to find the probability. If you are working with regression, doing a search on regression diagnostics will provide advice on some numerical measures of the severity of a leverage point on your fit.

Note that all points in statistics are influential, but not all are equally influential, in good or bad ways, so looking for "influential points" may not lead to much that is useful.
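One widely used regression diagnostic of this kind is Cook's distance, which combines a point's residual with its leverage. The thread does not name a specific measure, so this choice is an assumption. The sketch below applies it to a straight-line fit (p = 2 parameters) on made-up data, with the same high-leverage x value appearing once close to the trend of the other points and once far from it.

```python
def cooks_distances(xs, ys):
    """Cook's distance of each point for a least-squares fit of y = a + b*x."""
    n, p = len(xs), 2  # p = number of fitted parameters
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    leverages = [1.0 / n + (x - mean_x) ** 2 / sxx for x in xs]
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual variance estimate
    return [
        (e * e / (p * s2)) * h / (1.0 - h) ** 2
        for e, h in zip(residuals, leverages)
    ]


# Hypothetical data: five points near y = 1 + 0.5x, plus one point at x = 20.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
ys_base = [1.6, 1.9, 2.5, 3.1, 3.4]

d_on_trend = cooks_distances(xs, ys_base + [10.7])[-1]   # agrees with the trend
d_off_trend = cooks_distances(xs, ys_base + [3.0])[-1]   # contradicts the trend

print(f"Cook's distance, on-trend point:  {d_on_trend:.3f}")
print(f"Cook's distance, off-trend point: {d_off_trend:.3f}")
```

The leverage is identical in both cases, but only the off-trend point gets a large distance. Values above roughly 1 are often treated as worth a closer look, though that cutoff is a rule of thumb, not a probability.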
 
  • #8
Ok, thanks, that answered my question.
 

What is the definition of an outlier in mathematical modeling?

An outlier is a data point that significantly differs from the majority of the data points in a dataset. It can be either a very high or very low value, and is often considered to be an extreme or unusual observation.

How do outliers affect the accuracy of mathematical models?

Outliers can have a significant impact on the accuracy of mathematical models. In least-squares fitting, for example, errors are squared, so a single extreme point can dominate the fit and pull the estimated coefficients toward itself. The resulting model can describe the bulk of the data poorly and lead to incorrect conclusions.

How can outliers be identified in a dataset?

There are several methods for identifying outliers in a dataset, including visual inspection of a box plot or scatter plot, using statistical techniques such as the Z-score or interquartile range, and utilizing specialized algorithms such as the DBSCAN clustering method.
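The Z-score and interquartile-range rules can be sketched with Python's standard library alone. The thresholds (3 standard deviations, 1.5 times the IQR) are conventional defaults rather than anything specified above, and the data are made up.

```python
import statistics


def zscore_outliers(data, threshold=3.0):
    """Values more than `threshold` sample standard deviations from the mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [x for x in data if abs(x - mean) / sd > threshold]


def iqr_outliers(data, k=1.5):
    """Values beyond k * IQR outside the first or third quartile."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return [x for x in data if x < q1 - k * iqr or x > q3 + k * iqr]


# Hypothetical sample with one value far from the rest.
data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]

print("z-score rule flags:", zscore_outliers(data))
print("IQR rule flags:    ", iqr_outliers(data))
```

On this sample the two rules disagree: the extreme value inflates the standard deviation enough that its own Z-score stays under 3, while the quartile-based rule still flags it. Small samples are where this masking effect is most pronounced.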

What are the potential consequences of not addressing outliers in a mathematical model?

If outliers are not addressed in a mathematical model, the model's accuracy and reliability can be compromised. This can lead to incorrect predictions and decisions, which can have real-world consequences in fields such as finance, healthcare, and engineering.

How can outliers be dealt with in mathematical modeling?

There are several approaches for dealing with outliers in mathematical modeling, including removing the outliers from the dataset, transforming the data to make it more normally distributed, or using robust statistical methods that are less affected by outliers. The best approach will depend on the specific dataset and the goals of the modeling project.
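As one illustration of a robust method, the sketch below compares ordinary least squares with the Theil-Sen estimator, which takes the median of the slopes between all pairs of points. Theil-Sen is chosen here only as a simple example; it is not named above. The data are made up.

```python
import statistics
from itertools import combinations


def ols_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def theil_sen_line(xs, ys):
    """Theil-Sen fit: median pairwise slope, then median intercept."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1
    ]
    slope = statistics.median(slopes)
    intercept = statistics.median([y - slope * x for x, y in zip(xs, ys)])
    return intercept, slope


# Hypothetical data: y = 2x exactly, except for the last observation.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 2.0, 4.0, 6.0, 8.0, 30.0]

ols_intercept, ols_slope = ols_line(xs, ys)
ts_intercept, ts_slope = theil_sen_line(xs, ys)

print(f"least squares: slope = {ols_slope:.3f}, intercept = {ols_intercept:.3f}")
print(f"Theil-Sen:     slope = {ts_slope:.3f}, intercept = {ts_intercept:.3f}")
```

No point is deleted here. The robust estimator simply gives the single discordant observation less say in the result.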
