Can the standard deviation calculation be generalized for other statistics?

Discussion Overview

The discussion revolves around the possibility of generalizing the standard deviation calculation to a new concept termed "mean difference deviation," which is intended to measure variation in the mean differences of a data set. Participants explore the mathematical formulation of this concept and its implications for generating confidence intervals.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant proposes a method for calculating "mean difference deviation" by adapting the standard deviation formula to the mean differences of a data set.
  • Another participant suggests that the concept of "mean distribution" should be clarified mathematically, possibly using notation like E(X) for random variables.
  • A different participant questions the technical correctness of referring to a "confidence interval" in the context presented and raises concerns about the independence of differences.
  • One participant challenges the definition of "mean difference deviation," suggesting it might be more accurately termed "difference standard deviation."
  • There is a discussion about how to define the variance of the differences, including whether to count each difference once or twice, leading to different calculations of variance.
  • Participants provide examples to illustrate the calculations involved in determining the mean difference and its variance, highlighting potential discrepancies in definitions and methods.

Areas of Agreement / Disagreement

Participants express differing views on the definitions and calculations related to "mean difference deviation" and its relationship to standard deviation. There is no consensus on whether the proposed generalization is valid or how to properly define the associated statistical measures.

Contextual Notes

Some participants note the potential limitations in the assumptions about independence and normal distribution of the differences, as well as the accessibility of referenced articles for further reading.

klawson88
I've calculated the mean difference of my (normally distributed) data set. The mean difference is defined as:
The average absolute difference of any two independent values in a data set

Now, I'm trying to calculate the "mean difference deviation" in order to generate a confidence interval for this quantity ( "95% of the differences in the set are greater than ____"). My question is: can I generalize the standard deviation formula to calculate this? If we take the following concepts to be parallel:

Code:
Mean         <---------> Mean Difference
Single value <---------> Single Difference


... can I use the standard deviation equation to calculate the "mean difference deviation"? Namely turning this:
Standard deviation calculation steps

1. Take the difference of the mean and each single value
2. Square each result and add up the resulting numbers
3. Divide by the total number of values
4. Take the square root of #3

to this:

Proposed "mean difference deviation" calculation steps

1. Take the difference of the mean difference and each single difference
2. Square each result and add up the resulting numbers
3. Divide by the total number of differences
4. Take the square root of #3
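
The two step lists above can be sketched in Python as follows. This is only a minimal sketch under stated assumptions, not a vetted estimator: the function names `pairwise_abs_differences`, `mean_difference` and `mean_difference_deviation` are made up for illustration, each unordered pair is counted once, and the divisor is the number of differences, as in step 3.

```python
from itertools import combinations
from math import sqrt


def pairwise_abs_differences(values):
    """Absolute difference of every unordered pair (each pair counted once)."""
    return [abs(a - b) for a, b in combinations(values, 2)]


def mean_difference(values):
    """Average absolute difference over all pairs of distinct values."""
    diffs = pairwise_abs_differences(values)
    return sum(diffs) / len(diffs)


def mean_difference_deviation(values):
    """Proposed steps 1-4 applied to the differences instead of the raw values."""
    diffs = pairwise_abs_differences(values)
    center = sum(diffs) / len(diffs)
    # Steps 1-2: deviation from the mean difference, squared and summed.
    squared = sum((d - center) ** 2 for d in diffs)
    # Steps 3-4: divide by the number of differences, then take the square root.
    return sqrt(squared / len(diffs))


data = [1, 2, 4]  # hypothetical toy data, not from a real data set
print(mean_difference(data))            # 2.0
print(mean_difference_deviation(data))  # about 0.816
```

A data set of n values has n(n-1)/2 pairs, so this brute-force version does quadratic work and is only practical for modest n.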

I've looked up more direct ways to calculate this quantity, and all of them are contained in statistics articles that are hard for me to follow (1: http://www.jstor.org/stable/pdfplus/2333957.pdf?acceptTC=true, 2: http://www.jstor.org/stable/pdfplus/2236592.pdf, 3: http://www.jstor.org/stable/pdfplus/2282402.pdf); so much so that I can't even determine if it's what I'm looking for, much less how to go about translating it into code (and I haven't even touched on efficiency concerns).

Can anyone provide some insight? And if it turns out this can't be done, would anyone mind taking a crack at translating the derived equations in those articles into English?
 
Hey klawson88 and welcome to the forums.

It sounds like you are doing pretty much the same thing for the second case with the exception that the random variable is defined in a more complex way.

It seems like you are finding a measure of variation, but that you are referring to different things (one random variable involves a relationship of other random variables whereas the first is just a normal random variable).

It might help if you describe what you mean for the 'mean distribution' to be mathematically in terms of a formula of random variables and means (you can use E(X) to denote the mean of a particular random variable X).

We actually do this quite a bit in statistical applications. Although we deal only with the mean and the variance/standard deviation, we use them in different contexts, and they take on a particular interpretation in each one.

If you read or do further statistics, it will help to understand how to create a new random variable from existing random variables using a formula that relates them. That way you will be able to see mathematically that, although you are just doing a "normal standard deviation calculation", defining your new random variable in a certain way encodes "a specific kind of information" relevant to the actual formula.
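
One way this could be written down, as a sketch only: assume (as the original post does) that X_1 and X_2 are independent draws from the same normal distribution with standard deviation \sigma. The symbol D is introduced here purely for illustration.

```latex
D = |X_1 - X_2|, \qquad
\text{mean difference} = E(D), \qquad
\text{standard deviation of the differences} = \sqrt{E(D^2) - E(D)^2}

% Under the normality assumption, X_1 - X_2 \sim N(0, 2\sigma^2), so
E(D) = \frac{2\sigma}{\sqrt{\pi}}, \qquad
E(D^2) = 2\sigma^2, \qquad
\operatorname{Var}(D) = 2\sigma^2 \left( 1 - \frac{2}{\pi} \right)
```

These are population quantities. Whether a formula computed from a finite sample estimates them well is a separate question.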
 
Thanks, chiro, for the insight. I feel a lot more confident using the formula now. The formula for the mean difference (which is what I assume you meant by "mean distribution") is:

[Image PK4Ki.png: formula for the mean difference]
 
klawson,

I don't know whether you are doing this work for any serious purpose. In case you are, I think you had better read those articles. (Unless a forum member is a JSTOR subscriber, that person cannot read the articles in your links. I can't, not that I'm promising to do so if they become available!)

klawson88 said:
Now, I'm trying to calculate the "mean difference deviation" in order to generate a confidence interval for this quantity ( "95% of the differences in the set are greater than ____").
You didn't say whether "the set" is the data or the population from which the data is drawn.

I don't think you are talking about a "confidence interval" in a technically correct way. The current Wikipedia article on confidence interval might straighten you out.

You are apparently planning to assume the differences are independent and normally distributed. It isn't clear that they are independent. For example |x1 - x2| and |x1 - x3| share the value x1. There may be some theory that says that they have a normal distribution and even that they are independent. If so, you should learn that theory - at least its results.
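
The dependence concern can be checked numerically. The following is a rough simulation sketch, assuming i.i.d. standard normal data; the helper names `pearson` and `shared_value_correlation`, the seed and the trial count are arbitrary choices made for illustration. It estimates the correlation between |x1 - x2| and |x1 - x3|, which would be near zero if the two differences were independent.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def shared_value_correlation(trials=20000, seed=0):
    """Correlation of |x1 - x2| and |x1 - x3| for i.i.d. standard normal draws."""
    rng = random.Random(seed)
    d12, d13 = [], []
    for _ in range(trials):
        x1, x2, x3 = (rng.gauss(0.0, 1.0) for _ in range(3))
        d12.append(abs(x1 - x2))  # both differences share x1
        d13.append(abs(x1 - x3))
    return pearson(d12, d13)


corr = shared_value_correlation()
print(round(corr, 3))  # noticeably above zero when the differences are dependent
```

Note also that an absolute difference can never be negative, so it cannot itself be normally distributed even when the underlying data are.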
 
klawson88 said:
... can I use the standard deviation equation to calculate the "mean difference deviation"?

What do you mean by "mean difference deviation"? Do you mean "the standard deviation of the differences"? (If so, you could just say "difference standard deviation".)

Let's look at an example. (Check my work.)

Suppose there are 3 data values {1, 2, 4}.

The formula you have differs from the one in the Wikipedia article in a trivial way. The formula you give excludes the case i = j. Since |x_i - x_i| = 0, it seems unnecessary to do that.

The "GMD" is
\frac{ |1-2| + |1-4| + |2-1| + |2-4| + |4-1| + |4-2| }{(3)(3-1)} = \frac{12}{6} = 2

How do you define the variance of "the differences"? Are you going to let each difference appear twice or just once? I don't think it matters for the usual definition of "sample variance".

If you count each difference once, you compute the variance of the data set {1,3,2}.
You get a mean of 2 and a variance of (1 + 1 + 0) divided by 3, which is 2/3.

If you count each difference twice, you compute the variance of the data set {1,3,2,1,2,3}.
You get a mean of 2 and a variance of (1 + 1 + 0 + 1 + 0 + 1) divided by 6, which is also 2/3.

However, some people define the sample variance to be \frac{ \sum_{i=1}^n(x_i - \bar{x})^2 }{n-1} instead of \frac{ \sum_{i=1}^n(x_i - \bar{x})^2 }{n}. Those people would get 2/2 for the first answer and 4/5 for the second answer.

So our candidates for the sample standard deviation are \sqrt{\frac{2}{3}} ,\sqrt{\frac{2}{2}}, \sqrt{\frac{4}{5}}.
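
The arithmetic above can be re-checked with exact fractions. This is just a verification sketch for the toy data {1, 2, 4}; the helper `variance` and its `ddof` parameter (0 for the divisor n, 1 for the divisor n-1) are names chosen here for illustration.

```python
from fractions import Fraction
from itertools import permutations


def variance(values, ddof=0):
    """Exact variance with divisor len(values) - ddof."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((v - mean) ** 2 for v in values) / (n - ddof)


data = [1, 2, 4]
# Ordered pairs (i, j) with i != j, so each difference appears twice.
twice = [abs(a - b) for a, b in permutations(data, 2)]
# Keep one copy of each difference.
once = [abs(a - b) for i, a in enumerate(data) for b in data[i + 1:]]

gmd = Fraction(sum(twice), len(data) * (len(data) - 1))
print(gmd)                      # 2
print(variance(once))           # 2/3
print(variance(twice))          # 2/3
print(variance(once, ddof=1))   # 1
print(variance(twice, ddof=1))  # 4/5
```

With the divisor n, counting each difference once or twice gives the same variance; with the divisor n-1 the two counts disagree.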

The method you propose:
1. Take the difference of the mean difference and each single difference
2. Square each result and add up the resulting numbers
3. Divide by the total number of differences
4. Take the square root of #3

agrees with the answer \sqrt{\frac{2}{3}}, doesn't it?
 
