Compensating for outliers during Standard Deviation Calculation?

SUMMARY

The discussion focuses on calculating the standard deviation (SD) of a dataset using a 20-period sliding window while addressing the challenge of gaps in readings. The user seeks methods to filter out significant fluctuations, specifically between the values of 103 and 95, which disrupt the calculation. It is concluded that the presence of two distinct populations in the data renders the standard deviation and mean calculations meaningless. Instead of ignoring the gaps, it is recommended to investigate the underlying causes of the data shift after the 10th sample.

PREREQUISITES
  • Understanding of standard deviation calculations
  • Familiarity with sliding window techniques in data analysis
  • Knowledge of data population characteristics
  • Experience with statistical significance in datasets
NEXT STEPS
  • Research methods for handling outliers in statistical analysis
  • Learn about sliding window algorithms in Python or R
  • Explore techniques for identifying and analyzing data populations
  • Investigate the implications of data gaps on statistical metrics
USEFUL FOR

Data analysts, statisticians, and researchers who are involved in time series analysis and require insights on handling fluctuations in datasets effectively.

Aston08
I am trying to calculate the standard deviation of a group of data based on a 20-period sliding window. I have run into a bit of a problem in knowing how to deal with gaps up or down in the readings and was wondering what the correct method of compensating for this is. Below is an example of the situation I am trying to compensate for:

102
103
101
105
103
102
103
101
105
103
95
92
94
93
92
95
92
94
93
92

Obviously there is a big gap from 103 to 95, but in this particular situation that is not of significance to me and I would like to filter it out if possible, as these spikes tend to affect the readings that follow.
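For anyone who wants to reproduce the effect being described, here is a minimal Python sketch of a sliding-window standard deviation. The function name `rolling_stdev` and the 5-period window are assumptions made purely for illustration (the question uses 20 periods, which would give only a single window for these 20 readings), and the sample (n - 1) form of the SD is assumed as well.

```python
import statistics

# The 20 readings posted in the question.
readings = [102, 103, 101, 105, 103, 102, 103, 101, 105, 103,
            95, 92, 94, 93, 92, 95, 92, 94, 93, 92]

def rolling_stdev(values, window):
    """Sample SD of every full run of `window` consecutive values.

    Hypothetical helper for illustration; element k covers values[k:k + window].
    """
    if window < 2 or window > len(values):
        raise ValueError("window must be between 2 and len(values)")
    return [statistics.stdev(values[i - window:i])
            for i in range(window, len(values) + 1)]

# Shorter window than the 20 periods in the question, so the jump is visible.
sds = rolling_stdev(readings, window=5)
for start, sd in enumerate(sds):
    print(f"readings {start + 1:2d}-{start + 5:2d}: SD = {sd:.2f}")
```

Windows that straddle the drop from 103 to 95 report a much larger SD than windows that lie wholly on one side of it, which is the carry-over effect the question is about.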
 
Outliers are individual measurements that are significantly different from the rest of the collection. You don't have any outliers here; what you have appears to be two separate populations.

You could measure the SD of this collection of data, but it would be a fairly meaningless number - indeed the mean itself (98) is meaningless.
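To put numbers on that, here is a small Python sketch comparing the pooled statistics with those of each half. Splitting at the 10th sample follows the data as posted; the helper name `summarize` is hypothetical, and the sample (n - 1) standard deviation is an assumption.

```python
import statistics

# The 20 readings posted in the question.
readings = [102, 103, 101, 105, 103, 102, 103, 101, 105, 103,
            95, 92, 94, 93, 92, 95, 92, 94, 93, 92]

def summarize(values):
    """Return (mean, sample standard deviation) for a list of numbers."""
    return statistics.mean(values), statistics.stdev(values)

whole_mean, whole_sd = summarize(readings)         # all 20 readings pooled
first_mean, first_sd = summarize(readings[:10])    # samples 1-10
second_mean, second_sd = summarize(readings[10:])  # samples 11-20

print(f"all 20 : mean {whole_mean:.1f}, SD {whole_sd:.2f}")
print(f"1 to 10: mean {first_mean:.1f}, SD {first_sd:.2f}")
print(f"11 to 20: mean {second_mean:.1f}, SD {second_sd:.2f}")
```

The pooled SD comes out at roughly 5.1, while each half on its own has an SD of about 1.4 and 1.2 around means of 102.8 and 93.2. No reading is actually near the pooled mean of 98, so the pooled figures mostly measure the distance between the two groups rather than the scatter within either one.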

Your data are trying to tell you that something happened after the 10th sample; I'd suggest you investigate that rather than try and ignore it.
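One simple way to confirm where the shift happens is to find the split point that leaves the least scatter within each side. This is only a rough sketch, not an established change-point method from any library: the function name `best_split` and the `min_size` parameter are hypothetical, and it assumes there is exactly one shift in the series.

```python
# The 20 readings posted in the question.
readings = [102, 103, 101, 105, 103, 102, 103, 101, 105, 103,
            95, 92, 94, 93, 92, 95, 92, 94, 93, 92]

def sum_sq_dev(chunk):
    """Sum of squared deviations of chunk from its own mean."""
    mean = sum(chunk) / len(chunk)
    return sum((x - mean) ** 2 for x in chunk)

def best_split(values, min_size=2):
    """Index i that minimizes scatter within values[:i] plus values[i:].

    Hypothetical helper; assumes a single level shift in the data.
    """
    candidates = range(min_size, len(values) - min_size + 1)
    return min(candidates,
               key=lambda i: sum_sq_dev(values[:i]) + sum_sq_dev(values[i:]))

split = best_split(readings)
print(f"level shift appears after sample {split}")
```

For the posted readings this picks the boundary after the 10th sample, which is the point worth investigating before deciding how to treat the data on either side.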
 
