The error between two different standard deviations

In summary, the conversation discusses two different ways of quantifying the deviation between two paired sets of data (f and y). One method takes the root of the summed squared differences directly, while the other takes the standard deviation of the point-by-point differences between the two sets. The original poster asks how different these two measures really are and whether their similarity can be understood for any given data set.
  • #1
TheCanadian
Hi,

I have two sets (f and y) of 1000 data points each, and each data point in one set corresponds to one in the other set. Essentially, I wanted to compute the standard deviation between the two sets, and I did this:

## \sigma_1 = \sqrt {\frac {\sum^{N=1000}_{i=1} (f(x_i) - y(x_i))^2}{N-1}} ##

## \sigma_2 = \sqrt {\frac {\sum^{N=1000}_{i=1} (\Delta H_i - \bar{H})^2}{N-1}} ##

##\Delta H## is simply ##f(x_i) - y(x_i)##. This gives me a new set of the differences, and ##\bar{H} ## is the average of this new set. As you can see, one computation is the standard deviation between the two sets, while the other computation is the standard deviation of the differences between the two sets.

Now, I am simply wondering if it's possible to know what the error between ## \sigma_1## and ## \sigma_2## is without having to check manually. My apologies if this sounds like an odd inquiry or a very vague request, but these two variables seem like two very different quantities, and I am just a little uncertain about how different the two really are. I've done the two computations for a data set of my own (with a nearly Gaussian distribution) using this method, and my answers are nearly identical (off by 0.003), but I'm fairly sure this is dependent on the data itself.

Any advice is welcome!
 

  • #2
What do you mean with "error between ## \sigma_1## and ## \sigma_2##"?
 
  • #3
TheCanadian said:
Essentially, I wanted to compute the standard deviation between the two sets

What do you mean by that?

TheCanadian said:
##\Delta H## is simply ##f(x_i) - y(x_i)##. This gives me a new set of the differences, and ##\bar{H} ## is the average of this new set. As you can see, one computation is the standard deviation between the two sets, while the other computation is the standard deviation of the differences between the two sets.

I don't understand what you're looking for.

If you're looking to compare the two sets, you usually shouldn't work with the value-by-value difference between them. What you may want instead is the difference between some statistic (for example the mean) of one set and the same statistic of the other, and then the standard error of that difference. You also need to tell us what you mean when you say you want to measure the "error between both standard deviations".
 
  • #4
h6ss said:
What do you mean by that?
I don't understand what you're looking for.


Sorry for being very vague earlier. I was in a bit of a rush, but hopefully I can expand now. It really is a bit of an odd inquiry, though. I'm fairly sure the answer almost certainly depends on the entries in my sets, but I am wondering if there's anything I am missing here.

I have two sets of data (f and y), each consisting of 1000 entries. So I find the standard deviation like this:

## \sigma_1 = \sqrt {\frac {\sum^{N=1000}_{i=1} (f_i - y_i)^2}{N-1}} ##

This seems fairly standard and is simply the standard deviation between the two sets. But I also computed a second quantity by first computing:

##\Delta H_i = f_i - y_i ##, which gave me a new set ## H ## of 1000 entries consisting of the differences between f and y (i.e. ##\Delta H_i## for i = 1 to 1000). Then I found the average of this new set, ## \bar{H} = \frac {\sum^{N=1000}_{i=1} \Delta H_i }{N} ##.

## \sigma_2 = \sqrt {\frac {\sum^{N=1000}_{i=1} (\Delta H_i - \bar{H})^2}{N-1}} ##

I guess I'm just wondering if there's a way to understand how different these two measures are for any given set. I am likely just projecting my own biased interpretation, but these two quantities seem very closely related, since one is a measure of the spread between the two sets of values (f and y), while the other is a measure of the spread of the differences between them. I feel like I'm still being vague, but I'm just curious to see whether ## \sigma_1 ## and ## \sigma_2 ## actually are similar for any given set, possibly depending on N.
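As a minimal numerical sketch, assuming the paired data sit in NumPy arrays f and y (the random numbers below are only stand-ins for the real data), the two quantities can be computed like this:

```python
import numpy as np

# Stand-in data (purely illustrative): y is "measured", f tracks y with small noise.
rng = np.random.default_rng(0)
N = 1000
y = rng.normal(size=N)
f = y + rng.normal(scale=0.1, size=N)

dH = f - y                                           # Delta H_i = f_i - y_i
sigma1 = np.sqrt(np.sum(dH**2) / (N - 1))            # first quantity above
H_bar = dH.mean()                                    # mean of the differences
sigma2 = np.sqrt(np.sum((dH - H_bar)**2) / (N - 1))  # second quantity above
# sigma2 equals np.std(dH, ddof=1), the sample standard deviation of the differences

print(sigma1, sigma2, sigma1 - sigma2)
```

With stand-in data like these the two values come out nearly identical, because the mean difference is close to zero.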
 
  • #5
TheCanadian said:
## \sigma_1 = \sqrt {\frac {\sum^{N=1000}_{i=1} (f_i - y_i)^2}{N-1}} ##

## \sigma_2 = \sqrt {\frac {\sum^{N=1000}_{i=1} (\Delta H_i - \bar{H})^2}{N-1}} ##

I'm just curious to see whether ## \sigma_1 ## and ## \sigma_2 ## actually are similar for any given set, possibly depending on N.

sigma_2 is the standard deviation of the differences of the paired values. sigma_1 seems to me to be nothing meaningful. The formula you have used to calculate it is not the formula for standard deviation.
 
  • #6
Hornbein said:
sigma_2 is the standard deviation of the differences of the paired values. sigma_1 seems to me to be nothing meaningful. The formula you have used to calculate it is not the formula for standard deviation.

Ahh yes! I should definitely expand and please correct me if something seems wrong. Also, any advice is once again welcome.

I essentially have a scatter plot of values, with ## y(x_i) ## being the data points placed on the x-y axes. Now, I fitted a function f to these data points to create a model for the distribution of y. I wanted to find how much my model deviates from the true values (y), though. In such a case, wouldn't ## \sigma_1 = \sqrt {\frac {\sum^{N=1000}_{i=1} (f(x_i) - y(x_i))^2}{N-1}} ## be an appropriate measure of the deviation between my model and the actual data? If not, an explanation would be very helpful. If so, then I guess I'm just trying to figure out whether ## \sigma_1 ## or ## \sigma_2 ## is the better measure, or if both are still perfectly valid and comparable in most cases.
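As a rough sketch of that workflow, assuming purely for illustration a quadratic polynomial fitted with numpy.polyfit (the actual model used isn't specified here):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000

# Hypothetical scatter data: measured values y(x_i) at points x_i
x = np.linspace(0.0, 10.0, N)
y = 2.0 + 0.5 * x - 0.03 * x**2 + rng.normal(scale=0.2, size=N)

# Fit some model f to the data; a quadratic polynomial is only an assumed example
coeffs = np.polyfit(x, y, deg=2)
f = np.polyval(coeffs, x)          # model predictions f(x_i)

# sigma_1 as defined in the thread: spread of the model around the data
sigma1 = np.sqrt(np.sum((f - y)**2) / (N - 1))
print(sigma1)
```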
 
  • #7
TheCanadian said:
I have two sets of data (f and y), each consisting of 1000 entries. So I find the standard deviation like this:

## \sigma_1 = \sqrt {\frac {\sum^{N=1000}_{i=1} (f_i - y_i)^2}{N-1}} ##

This seems fairly standard and is simply the standard deviation between the two sets.

The standard deviation between the two sets makes no sense. No such thing exists. Where did you get this formula from? As I said earlier, you can't just use the difference of the values for both datasets and label it as the standard deviation "between" them. Each dataset has its own standard deviation, computed individually.

We have the f-set's and the y-set's standard deviations calculated with

##\sigma_f = \sqrt{\frac{1}{1000}\Sigma_{i = 1}^{1000} (f_i - \mu_f)^2}## and ##\sigma_y = \sqrt{\frac{1}{1000}\Sigma_{i = 1}^{1000} (y_i - \mu_y)^2}##,

where ##\mu_f## and ##\mu_y## are the respective means for both sets.

This is why I don't understand the ##(f_i-y_i)## part in your formula. The standard deviation that you calculate for ##\Delta H_i## looks right, but since ##\Delta H_i=f_i-y_i##, in the second formula you're basically computing the standard deviation of the term ##\Delta H_i-\bar{H}=f_i-y_i-\bar{H}##, and I don't really see the use of that.

If your goal is to compare both datasets and see if there's a significant difference between them, maybe you should measure the spread of each dataset individually by finding their respective standard deviations and then test for a difference between them. Otherwise I don't see the motivation behind comparing the two "formulas" you've stated.
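As a hedged sketch of that suggestion, assuming roughly normal data (the F-test below is one common choice for comparing spreads, not the only one):

```python
import numpy as np
from scipy import stats

def compare_spreads(f, y):
    """Compare the spreads of two samples by their standard deviations."""
    # Standard deviation of each set on its own (1/N convention, as in the formulas above)
    sigma_f = np.std(f)
    sigma_y = np.std(y)

    # F-test for a difference in spread (assumes approximately normal data)
    F = np.var(f, ddof=1) / np.var(y, ddof=1)
    dfn, dfd = len(f) - 1, len(y) - 1
    p = 2 * min(stats.f.sf(F, dfn, dfd), stats.f.cdf(F, dfn, dfd))  # two-sided p-value
    return sigma_f, sigma_y, F, p
```

When normality is doubtful, a more robust alternative such as scipy.stats.levene can be used instead.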
 
  • #8
TheCanadian said:
I fitted a function f to these data points to create a model for the distribution of y. I wanted to find how much my model deviates from the true values (y), though. In such a case, wouldn't ## \sigma_1 = \sqrt {\frac {\sum^{N=1000}_{i=1} (f(x_i) - y(x_i))^2}{N-1}} ## be an appropriate measure of the deviation between my model and the actual data?

Aha. Yes, sigma_1 is the better measure. It's not a standard deviation. I don't know what to call it anymore. The root of the sum of squared differences, I guess.

I wouldn't call y the true value, I'd call it the measured value. The true value is unknown due to measurement error.

sigma_2 doesn't seem all that useful to me. It will always be less than or equal to sigma_1. It seems to me that there is no reason to subtract H bar; H bar is a fairly meaningless random variable, I would think, other than telling you whether your function tends to give values that are higher or lower than the measured ones.
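That ordering follows directly from the definitions given earlier in the thread: expanding the square gives

## \sigma_2^2 = \frac {\sum^{N}_{i=1} (\Delta H_i - \bar{H})^2}{N-1} = \frac {\sum^{N}_{i=1} \Delta H_i^2 - N\bar{H}^2}{N-1} = \sigma_1^2 - \frac {N\bar{H}^2}{N-1} \le \sigma_1^2, ##

so the gap between the two is governed entirely by the mean difference ##\bar{H}##.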
 
  • #9
The key phrase you are looking for is "goodness of fit." sigma_1 is a statistic used to measure goodness of fit.
 
  • #10
h6ss said:
The standard deviation between the two sets makes no sense. No such thing exists. Where did you get this formula from? As I said earlier, you can't just use the difference of the values for both datasets and label it as the standard deviation "between" them. Each dataset has its own standard deviation, computed individually.

If your goal is to compare both datasets and see if there's a significant difference between them, maybe you should measure the spread of each dataset individually by finding their respective standard deviations and then test for a difference between them.

Yep. Those are all errors on my part. It would be an interesting idea to compare the two standard deviations, but is there no better measure? Is my quantity denoted by ## \sigma_1 ## not at least descriptive of the difference between these two sets?

I guess my main goal has been to simply calculate the deviation between my measured (y) and estimated (f) values, and I erroneously considered the square root of the sum of the squares to be a standard deviation for some odd reason. I also assumed ## \sigma_2## to be a good measure (if the mean of the differences, ## \bar{H}##, equals 0). I guess it is "a" measure, but I'm a little unsure what exactly it should be called (as Hornbein noted).
 
  • #11
Hornbein said:
The key phrase you are looking for is "goodness of fit." sigma_1 is a statistic used to measure goodness of fit.

Thank you! Is it unreasonable to consider this goodness-of-fit value as the error between my expected and measured values?

To reiterate: would it be correct to say that ## \sigma_1 ## is a measure of the error in approximating y as f? While ## \sigma_2 ## is the standard deviation of the difference between f and y?
 
  • #12
TheCanadian said:
Thank you! Is it unreasonable to consider this goodness-of-fit value as the error between my expected and measured values?

To reiterate: would it be correct to say that ## \sigma_1 ## is a measure of the error in approximating y as f? While ## \sigma_2 ## is the standard deviation of the difference between f and y?

Sure. Your statistic is well known as the "least squares" metric. It is standard. Just say "I'm using least squares." The smaller the sum of the squares of the differences, the better the fit. (You needn't bother taking the square root, though it seems harmless to me.)

You could also call it "nonlinear regression." That doesn't mean a whole lot, but that is what it is called if f(x) is not a linear function.

sigma_2 is the standard deviation, but to me it doesn't seem all that meaningful or useful.
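In symbols, with ##\theta## standing in for whatever parameters the fitted function has (a placeholder, not notation used above), least squares picks

## \hat{\theta} = \arg\min_{\theta} \sum^{N}_{i=1} \left( f(x_i;\theta) - y(x_i) \right)^2 . ##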
 
  • #13
The standard deviation of the difference between two data sets makes sense only if they are somehow related, such as ordered in time. For example, if f and y represent readings taken at the same points in time, then the standard deviation of their difference is of interest. If there is no common ordering of the sets, then the measure is meaningless.
 
  • #14
TheCanadian said:
would it be correct to say that ## \sigma_1 ## is a measure of the error in approximating y as f?

Not really, but maybe you'll find more information about what you're looking for here: https://en.wikipedia.org/wiki/Residual_sum_of_squares

TheCanadian said:
While ## \sigma_2 ## is the standard deviation of the difference between f and y?

That is correct, but again, be careful with how you interpret this information.
 
  • #15
BWV said:
The standard deviation of the difference between two data sets makes sense only if they are somehow related, such as ordered in time. For example, if f and y represent readings taken at the same points in time, then the standard deviation of their difference is of interest. If there is no common ordering of the sets, then the measure is meaningless.


Yes, they are ordered and represent a simultaneous reading.
 
  • #16
The difference between sigma_1 and sigma_2 in the OP is:

sigma_1 is the root-mean-square of the difference in readings at each point in time.

sigma_2 is the same thing but 'whitened' by subtracting off the mean. If the mean error is zero, then the two measures are identical. This is a common transformation when the data are used for additional analysis or algorithms.
 
  • #17
Since variance = mean of square minus square of mean = mean of squared difference from mean, the difference between sigma1 squared and sigma2 squared is the square of the mean difference between the two sets of observations, ##\bar{H}^2## (exactly so with 1/N denominators; with the N-1 denominators used above it is ##N\bar{H}^2/(N-1)##). The first one, sigma1 squared, is the larger of the two.
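A quick numerical check of this on stand-in data, using the ##N-1## denominators from earlier in the thread (hence the extra factor of ##\frac{N}{N-1}##):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1000
dH = rng.normal(loc=0.3, scale=1.0, size=N)   # stand-in differences f_i - y_i

sigma1_sq = np.sum(dH**2) / (N - 1)
H_bar = dH.mean()
sigma2_sq = np.sum((dH - H_bar)**2) / (N - 1)

# sigma1^2 - sigma2^2 should equal N * H_bar**2 / (N - 1), roughly H_bar**2 for large N
print(sigma1_sq - sigma2_sq, N * H_bar**2 / (N - 1))
```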
 

What is the error between two different standard deviations?

The error between two different standard deviations is a measure of the difference or discrepancy between the two standard deviations. It is used to evaluate the accuracy and precision of a data set or experiment.

How is the error between two different standard deviations calculated?

One simple convention is to take the absolute difference between the two standard deviations; dividing by the larger of the two and multiplying by 100 expresses it as a percentage (a relative difference).

What does a large error between two different standard deviations indicate?

A large error between two different standard deviations indicates that there is a significant difference between the two sets of data or experiments. This could be due to factors such as measurement error, sample size, or variability within the data.

Can the error between two different standard deviations be negative?

No, the error between two different standard deviations cannot be negative. It is always a positive value, as it represents the difference between two positive numbers (standard deviations).

How can the error between two different standard deviations be reduced?

The error between two different standard deviations can be reduced by increasing the sample size, improving measurement techniques, and reducing variability in the data. It is also important to carefully consider the factors that may affect the standard deviations and address them accordingly.
