Outlier Detection - Algorithm to Exclude Systematic Error from Data Set

  • Context: Undergrad
  • Thread starter: vibe3
  • Tags: Detection

Discussion Overview

The discussion revolves around identifying and excluding systematic errors from a time series data set, specifically related to magnetic field measurements. Participants explore algorithms for outlier detection and the challenges associated with defining the goal of the analysis.

Discussion Character

  • Exploratory, Technical explanation, Debate/contested

Main Points Raised

  • One participant seeks recommendations for algorithms to detect systematic errors in their data, particularly around specific time points.
  • Another participant suggests that moving averages can be utilized over different window sizes to help identify outlier regions.
  • A distinction is made between wanting an algorithm for personal use, without academic scrutiny, and wanting one that would withstand academic review.
  • A further suggestion is made to analyze the differences between adjacent data points to identify peaks that correspond to outliers, recommending the use of moving averages to reduce noise.

Areas of Agreement / Disagreement

Participants generally agree that an algorithm is needed to detect the affected regions. One reply asks whether the method must withstand academic scrutiny or only needs to work in practice; the original poster indicates that an informal, exploratory method is sufficient.

Contextual Notes

Participants have not fully defined the specific characteristics of the systematic error or the criteria for what constitutes an outlier in this context. There is also an acknowledgment of potential noise in the data that may affect the analysis.

vibe3
Hi all, I have data similar to the following:

[Figure: plot.png, magnetic field versus time]


where the x-axis is time and the y-axis is magnetic field. At around t = 20 (and t = -80) there is a systematic error (probably due to some other current switching on and then switching off) which I want to get rid of in my data.

Can anyone recommend a good algorithm to detect when this happens in my time series and exclude it from my data set?

I also plotted the moving average, which seems to indicate that it is not as simple as searching for large deviations from the mean.
 
vibe3 said:
I plotted the moving average.

Moving averages can be taken over windows of various sizes and the windows can include both the past and future. You could try various windows.
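A minimal sketch of windows that use only the past versus windows that include both past and future samples, in Python with NumPy. The function name `moving_average`, the `centered` flag, and the edge handling (averaging only the samples that exist near the ends of the record) are assumptions of this sketch, not something specified in the thread.

```python
import numpy as np


def moving_average(y, window, centered=True):
    """Moving average of y over `window` samples.

    centered=True  -> window covers past and future samples around i
    centered=False -> trailing window, samples i-window+1 .. i only
    Near the ends of the record only the samples that exist are averaged
    (an assumption of this sketch; other edge treatments are possible).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Prefix sums with a leading 0, so sum(y[a:b]) == csum[b] - csum[a].
    csum = np.concatenate(([0.0], np.cumsum(y)))
    idx = np.arange(n)
    start = idx - (window - 1) // 2 if centered else idx - window + 1
    lo = np.clip(start, 0, n)
    hi = np.clip(start + window, 0, n)
    return (csum[hi] - csum[lo]) / (hi - lo)
```

Overlaying `moving_average(B, w)` for a few sizes such as w = 5, 21 and 101 on the raw data (with `B` a placeholder name for the measured field values) is one quick way to try various windows. A centered window does not lag behind the signal, so jumps stay aligned with where they occur in time.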

Your goal isn't precisely defined yet. It could be either one of the following:

1) I want an algorithm to detect the regions of the curve affected by switching currents. Suggest an algorithm. I'll try it and decide myself if it works. There doesn't have to be any statistical justification for it. This is not for a published paper or anything that needs academic scrutiny.

2) I want an algorithm that can stand academic scrutiny and not attract criticism if I write up what I'm doing as a report.
 
Option 1 would be fine for me.
 
Judging from the curve, you have very large differences between adjacent bins at the edges of those outliers. If you just plot ##|n_i-n_{i-1}|##, they should give two nice peaks. Use the moving average of a few bins instead of the original values if the dataset is too noisy.
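The difference-of-adjacent-bins idea can be sketched in Python with NumPy as follows. It assumes the systematic error behaves like an offset that switches on and later switches off, so each event produces one jump up and one jump down in the smoothed series. The noise estimate (median of the absolute differences divided by 0.6745, which assumes roughly Gaussian noise), the threshold factor `k`, the on/off pairing of jumps, the padding `pad`, and the function names `find_jumps` and `exclude_offsets` are all hypothetical choices for illustration rather than anything proposed in the thread.

```python
import numpy as np


def moving_average(y, window):
    """Centered moving average; at the ends only existing samples are averaged,
    so the record edges do not produce artificial jumps."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    csum = np.concatenate(([0.0], np.cumsum(y)))
    start = np.arange(n) - (window - 1) // 2
    lo, hi = np.clip(start, 0, n), np.clip(start + window, 0, n)
    return (csum[hi] - csum[lo]) / (hi - lo)


def find_jumps(y, window=5, k=6.0):
    """Return indices i where the smoothed series s has a large |s[i] - s[i-1]|.

    Noise scale: median(|d|) / 0.6745, which estimates the standard deviation
    of the differences if they are roughly Gaussian (an assumption). `k` is a
    hypothetical threshold in units of that scale; tune it by eye.
    """
    s = moving_average(y, window)
    d = np.abs(np.diff(s))  # d[j] = |s[j+1] - s[j]|
    sigma = np.median(d) / 0.6745
    above = d > k * sigma
    jumps, j = [], 0
    while j < len(d):
        if not above[j]:
            j += 1
            continue
        # Smoothing spreads one step over ~window samples, so collapse each
        # run of consecutive above-threshold differences into its largest one.
        start = j
        while j < len(d) and above[j]:
            j += 1
        jumps.append(start + int(np.argmax(d[start:j])) + 1)
    return jumps


def exclude_offsets(y, window=5, k=6.0, pad=None):
    """Boolean mask, True for samples to keep.

    Assumes jumps come in (switch-on, switch-off) pairs in time order and
    drops everything between each pair, widened by `pad` samples on both
    sides. An unpaired final jump is ignored.
    """
    pad = window if pad is None else pad
    keep = np.ones(len(y), dtype=bool)
    jumps = find_jumps(y, window, k)
    for on, off in zip(jumps[0::2], jumps[1::2]):
        keep[max(0, on - pad):off + pad] = False
    return keep
```

With the field values in an array `B` and the times in `t` (placeholder names), `keep = exclude_offsets(B)` followed by `t[keep]` and `B[keep]` would give the data with the flagged regions removed. If both events in the plot are step-like, four jumps would be expected; comparing the result against a plot of the differences is the easiest way to tune `window` and `k`.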
 
