Standard deviation of aggregated data

In summary, the thread discusses how to calculate the standard deviation over aggregated intervals of data, given each interval's mean, standard deviation, and number of observations. The solution is to combine the second moments of the intervals in the same way as the means. A later reply raises the case of intervals with different lengths of time and points to an unbiased weighted estimator for the standard deviation described on Wikipedia.
  • #1
KThy
This might be embarrassingly easy or impossible; I've been a computer programmer for too long since my statistics classes to tell for sure.

I have a set of records with the following data for each record: interval, mean speed, standard deviation of speed, number of observations. For example:

Code:
1, 77.2, 1.75, 10
2, 75.9, 2.05, 12

Now, if I want to aggregate the data to get mean speed and standard deviation over two or more intervals (1 and 2 above), calculating the weighted mean is no problem but how - if possible - do I calculate the standard deviation for the aggregated intervals?
 
  • #2
Given the std. dev. and mean, you can easily get the second moment. Combine the second moments by the same procedure you used for the means. Finally calculate the std. dev. from the weighted mean and weighted second moment.
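
A minimal sketch of this procedure in Python, using the two records from the first post. It assumes each reported standard deviation is a population standard deviation (divisor n), so that second moment = variance + mean squared holds exactly; if the values are sample standard deviations (divisor n - 1), the result is only approximate. The function name `combine_via_second_moments` is illustrative and not from the thread.

```python
import math

def combine_via_second_moments(records):
    """Aggregate a list of (mean, sd, n) tuples.

    Assumption: sd is a population standard deviation (divisor n).
    """
    total_n = sum(n for _, _, n in records)
    # Weighted mean of the interval means.
    mean = sum(m * n for m, _, n in records) / total_n
    # Second moment of each interval is sd^2 + mean^2; weight it the same way.
    second_moment = sum((sd**2 + m**2) * n for m, sd, n in records) / total_n
    # Guard against a tiny negative value caused by rounding.
    variance = max(second_moment - mean**2, 0.0)
    return mean, math.sqrt(variance)

# Records from the question: (mean speed, standard deviation, observations).
records = [(77.2, 1.75, 10), (75.9, 2.05, 12)]
mean, sd = combine_via_second_moments(records)
print(f"mean = {mean:.3f}, sd = {sd:.3f}")
```

For these two intervals the aggregated mean is about 76.49 and the aggregated standard deviation about 2.03, slightly larger than a plain average of 1.75 and 2.05 would suggest because the two interval means differ.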
 
  • #3
mathman said:
Combine the second moments by the same procedure you used for the means. Finally calculate the std. dev. from the weighted mean and weighted second moment.

Right, of course. As I said, embarrassingly simple - thanks!
 
  • #4
I recently encountered a similar problem at work.

However, in my case some of the intervals used for measuring average speed span different lengths of time. I believe the problem is otherwise the same?

Since we consider the data to be a sample, we want the unbiased estimator for the standard deviation, which takes the form given on Wikipedia (I derived it independently to make sure).


http://en.wikipedia.org/wiki/Mean_square_weighted_deviation

Cheers.
 
  • #5


It is definitely possible to calculate the standard deviation for aggregated data. You need the number of observations, the mean, and the standard deviation of each interval. The means matter as well as the standard deviations, because differences between the interval means add to the overall spread.

To calculate the standard deviation for the aggregated intervals, you would first need to calculate the weighted mean of the mean speeds for each interval. This can be done by multiplying the mean speed for each interval by the number of observations, adding them together, and then dividing by the total number of observations for all intervals.

Next, note that the standard deviations themselves should not be averaged, because they do not combine linearly. Instead, combine sums of squares. Assuming each reported value is a sample standard deviation, the within-interval sum of squares for an interval is its squared standard deviation multiplied by its number of observations minus one, and the between-interval sum of squares is its number of observations multiplied by the squared difference between its mean speed and the overall weighted mean. Add both quantities over all intervals, divide by the total number of observations minus one, and take the square root.

In summary, to calculate the standard deviation for aggregated data, you would need to find the weighted mean of the mean speeds, add up the within-interval and between-interval sums of squares, divide by the total number of observations minus one, and take the square root. I hope this helps and please don't feel embarrassed, statistics can be tricky and it's always good to refresh our knowledge.
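
A minimal sketch of this aggregation in Python, assuming the reported values are sample standard deviations (divisor n - 1). It adds within-interval and between-interval sums of squares rather than averaging the standard deviations, since standard deviations do not combine linearly. The function name `pooled_sample_sd` is illustrative.

```python
import math

def pooled_sample_sd(records):
    """Aggregate a list of (mean, sample_sd, n) tuples.

    Assumption: sample_sd was computed with divisor n - 1.
    """
    total_n = sum(n for _, _, n in records)
    grand_mean = sum(m * n for m, _, n in records) / total_n
    # Spread inside each interval, recovered from its sample variance.
    within = sum((n - 1) * sd**2 for _, sd, n in records)
    # Spread caused by interval means differing from the overall mean.
    between = sum(n * (m - grand_mean) ** 2 for m, _, n in records)
    return grand_mean, math.sqrt((within + between) / (total_n - 1))

# Records from the first post: (mean speed, standard deviation, observations).
grand_mean, sd = pooled_sample_sd([(77.2, 1.75, 10), (75.9, 2.05, 12)])
print(f"mean = {grand_mean:.3f}, sd = {sd:.3f}")
```

Under this assumption the result equals the sample standard deviation that would be obtained from the raw observations of all intervals taken together.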
 

What is standard deviation?

Standard deviation is a measure of how spread out a set of data is from its mean (average) value. It tells us how much the data points vary from the average. A higher standard deviation indicates that the data points are more spread out, while a lower standard deviation indicates that the data points are closer to the mean.

Why is standard deviation important?

Standard deviation is important because it allows us to understand the variability in a set of data. It helps us to identify outliers or extreme values in the data, and can also be used to compare the spread of data between different groups or populations. In addition, standard deviation is used in various statistical calculations and can provide valuable insights into the data.

How is standard deviation calculated?

The formula for calculating standard deviation is the square root of the sum of the squared differences between each data point and the mean, divided by the number of data points. This can be written as:
σ = √(Σ(x - μ)^2 / n)
where σ is the standard deviation, x is each data point, μ is the mean, and n is the number of data points.
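
The formula translates directly into a few lines of Python. This is a small sketch with made-up data; the function name `population_sd` is illustrative.

```python
import math

def population_sd(values):
    """Square root of the mean squared deviation from the mean (divisor n)."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)

# Made-up data: the mean is 5 and the squared deviations sum to 32.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
result = population_sd(data)
print(result)  # 2.0
```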

What is the difference between population standard deviation and sample standard deviation?

Population standard deviation is used when the data represents the entire population, while sample standard deviation is used when the data is a subset or sample of the population. The formula for calculating sample standard deviation is slightly different and uses n - 1 as the denominator instead of n. This is because the deviations are measured from the sample mean rather than the true population mean, which makes them slightly too small on average; dividing by n - 1 (Bessel's correction) removes this bias from the variance estimate.
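
Python's standard `statistics` module provides both versions, which makes the difference easy to see. The data below is made up for illustration.

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

pop_sd = statistics.pstdev(data)   # population SD: divides by n
samp_sd = statistics.stdev(data)   # sample SD: divides by n - 1

print(f"population SD = {pop_sd:.4f}")
print(f"sample SD     = {samp_sd:.4f}")
```

The sample value is always the larger of the two, and the gap shrinks as the number of data points grows.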

How can standard deviation be affected by outliers?

Outliers, or extreme values in the data, can greatly affect the standard deviation. Because each deviation from the mean is squared, a few very large or very small values can dominate the sum and inflate the standard deviation, so that it overstates the spread of the bulk of the data. It is important to identify and consider outliers when interpreting the standard deviation of a dataset.
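
A short illustration with made-up speed values, where a single extreme reading is added to an otherwise tight set of observations:

```python
import statistics

typical = [75.0, 76.0, 77.0, 78.0, 79.0]
with_outlier = typical + [120.0]   # one extreme reading

sd_typical = statistics.stdev(typical)
sd_outlier = statistics.stdev(with_outlier)

print(f"without outlier: {sd_typical:.2f}")
print(f"with outlier:    {sd_outlier:.2f}")
```

Here one outlier raises the sample standard deviation by more than a factor of ten, even though five of the six values lie within 2 units of each other.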
