Can the standard deviation calculation be generalized for other statistics?

klawson88
I've calculated the mean difference of my (normally distributed) data set. The mean difference is defined as:
The average absolute difference of any two independent values in a data set

Now, I'm trying to calculate the "mean difference deviation" in order to generate a confidence interval for this quantity ( "95% of the differences in the set are greater than ____"). My question is: can I generalize the standard deviation formula to calculate this? If we take the following concepts to be parallel:

Code:
Mean         <---------> Mean Difference
Single value <---------> Single Difference


... can I use the standard deviation equation to calculate the "mean difference deviation"? Namely turning this:
Standard deviation calculation steps

1. Take the difference of the mean and each single value
2. Square each result and add up the resulting numbers
3. Divide by the total number of values
4. Take the square root of #3

to this:

Proposed "mean difference deviation" calculation steps

1. Take the difference of the mean difference and each single difference
2. Square each result and add up the resulting numbers
3. Divide by the total number of differences
4. Take the square root of #3
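
The two lists of steps above can be sketched in Python as follows. This is a minimal illustration, not a vetted method: the function names are hypothetical, the data values are made up, and it assumes each pair of values contributes one difference (counted once, not in both orders).

```python
import math
from itertools import combinations

def std_dev(values):
    """Population standard deviation, following steps 1-4 above."""
    mean = sum(values) / len(values)
    squared = [(mean - v) ** 2 for v in values]   # steps 1 and 2
    return math.sqrt(sum(squared) / len(values))  # steps 3 and 4

def pairwise_abs_differences(values):
    """Absolute difference of every unordered pair (each pair counted once)."""
    return [abs(a - b) for a, b in combinations(values, 2)]

def mean_difference_deviation(values):
    """The proposed quantity: the same four steps applied to the differences."""
    return std_dev(pairwise_abs_differences(values))

data = [1, 2, 4]  # illustrative values only
print(std_dev(data), mean_difference_deviation(data))
```

Note that the number of pairs grows as n(n-1)/2, so this direct approach may become slow for large data sets.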

I've looked up more direct ways to calculate this quantity, and all of them are contained in statistics articles ((1) http://www.jstor.org/stable/pdfplus/2333957.pdf?acceptTC=true, (2) http://www.jstor.org/stable/pdfplus/2236592.pdf, (3) http://www.jstor.org/stable/pdfplus/2282402.pdf), which are technical enough that I can't even determine if it's what I'm looking for, much less how to go about translating it into code (and I haven't even touched on efficiency concerns).

Can anyone provide some insight? And if it turns out this can't be done, would anyone mind taking a crack at translating the derived equations in those articles into English?
 
Hey klawson88 and welcome to the forums.

It sounds like you are doing pretty much the same thing for the second case with the exception that the random variable is defined in a more complex way.

It seems like you are finding a measure of variation, but that you are referring to different things (one random variable involves a relationship of other random variables whereas the first is just a normal random variable).

It might help if you describe what you mean by the 'mean distribution' mathematically, in terms of a formula of random variables and means (you can use E(X) to denote the mean of a particular random variable X).

We actually do this in statistical applications quite a bit. Although we deal only with the mean and the variance/standard deviation, we do so in different contexts, where each has its own particular interpretation.

If you read or do further statistics, it will help to understand how to create a new random variable from existing random variables using a formula that relates them. This way you will be able to see mathematically that although you are just doing a "normal standard deviation calculation", by defining your new random variable in a certain way you are encoding "a specific kind of information" relevant to the actual formula.
 
Thanks, chiro, for the insight. I feel a lot more confident using the formula now. The formula for the mean difference (which is what I assume you meant by "mean distribution") is:

[Image PK4Ki.png: formula for the mean difference]
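
A plausible reading of the pictured formula, assuming it matches the usual definition of the mean difference with the i = j terms excluded (as a later reply notes) and n data values x_1, ..., x_n:

```latex
MD = \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} |x_i - x_j|
```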
 
klawson,

I don't know whether you are doing this work for any serious purpose. In case you are, I think you had better read those articles. (Unless a forum member is a JSTOR subscriber, that person cannot read the articles in your links. I can't, and I'm not promising to read them even if they become available!)

klawson88 said:
Now, I'm trying to calculate the "mean difference deviation" in order to generate a confidence interval for this quantity ( "95% of the differences in the set are greater than ____").
You didn't say whether "the set" is the data or the population from which the data is drawn.

I don't think you are talking about a "confidence interval" in a technically correct way. The current Wikipedia article on confidence interval might straighten you out.
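
One way to read the quoted statement ("95% of the differences in the set are greater than ____") is as a lower percentile of the observed differences rather than as a confidence interval. A minimal sketch under that assumption follows; the function name `lower_threshold`, the seed, and the simulated data are all hypothetical and only for illustration.

```python
import random
from itertools import combinations

def lower_threshold(differences, fraction=0.95):
    """Return an observed value that at least `fraction` of the differences meet or exceed."""
    ordered = sorted(differences)
    k = int((1 - fraction) * len(ordered))  # how many values may fall below the threshold
    return ordered[k]

rng = random.Random(1)                            # fixed seed so the run is repeatable
values = [rng.gauss(50, 10) for _ in range(100)]  # made-up, normally distributed data
diffs = [abs(a - b) for a, b in combinations(values, 2)]

threshold = lower_threshold(diffs)
share = sum(d >= threshold for d in diffs) / len(diffs)
print(threshold, share)
```

This describes only the data at hand; it says nothing by itself about the population the data was drawn from.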

You are apparently planning to assume the differences are independent and normally distributed. It isn't clear that they are independent. For example |x1 - x2| and |x1 - x3| share the value x1. There may be some theory that says that they have a normal distribution and even that they are independent. If so, you should learn that theory - at least its results.
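
The dependence concern can be checked empirically. The sketch below, which assumes standard normal data and uses a hypothetical helper `pearson`, simulates |x1 - x2| and |x1 - x3| many times and estimates their correlation. As a side note, the difference of two independent normal values is itself normal with mean zero, so its absolute value follows a half-normal (folded) distribution rather than a normal one.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
d12, d13 = [], []
for _ in range(20000):
    x1, x2, x3 = (rng.gauss(0, 1) for _ in range(3))
    d12.append(abs(x1 - x2))  # shares x1 with the next difference
    d13.append(abs(x1 - x3))

corr = pearson(d12, d13)
print(corr)  # noticeably above zero when the two differences are dependent
```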
 
klawson88 said:
... can I use the standard deviation equation to calculate the "mean difference deviation"?

What do you mean by "mean difference deviation"? Do you mean "the standard deviation of the differences"? ( If so you could just say "difference standard deviation".)

Let's look at an example. (Check my work.)

Suppose there are 3 data values {1, 2, 4}.

The formula you have differs from the one on Wikipedia in a trivial way. The formula you give excludes the case i = j. Since |x_i - x_i| = 0, it seems unnecessary to do that.

The "GMD" is
\frac{ |1-2| + |1-4| + |2-1| + |2-4| + |4-1| + |4-2| }{(3)(3-1)} = \frac{12}{6} = 2

How do you define the variance of "the differences"? Are you going to let each difference appear twice or just once? I don't think it matters for the usual definition of "sample variance".

If you count each difference once, you compute the variance of the data set {1,3,2}.
You get a mean of 2 and a variance of ( 1 + 1 +0) divided by 3, which is 2/3.

If you count each difference twice, you compute the variance of the data set {1,3,2,1,2,3}.
You get a mean of 2 and a variance of (1 + 1 + 0 + 1 + 0 + 1) divided by 6, which is also 2/3.

However, some people define the sample variance to be \frac{ \sum_{i=1}^n(x_i - \bar{x})^2 }{n-1} instead of \frac{ \sum_{i=1}^n(x_i - \bar{x})^2 }{n}. Those people would get 2/2 for the first answer and 4/5 for the second answer.

So our candidates for the sample standard deviation are \sqrt{\frac{2}{3}} ,\sqrt{\frac{2}{2}}, \sqrt{\frac{4}{5}}.
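
The arithmetic in this example can be reproduced with Python's standard library. This is only a check of the numbers above; the variable names are arbitrary, and `pvariance` and `variance` from the `statistics` module divide by n and n-1 respectively.

```python
import math
from itertools import combinations, permutations
from statistics import pvariance, variance

data = [1, 2, 4]
once = [abs(a - b) for a, b in combinations(data, 2)]   # each pair once: [1, 3, 2]
twice = [abs(a - b) for a, b in permutations(data, 2)]  # each pair in both orders

gmd = sum(twice) / len(twice)  # denominator is n(n-1) = 6

candidates = {
    "once, divide by n": math.sqrt(pvariance(once)),
    "twice, divide by n": math.sqrt(pvariance(twice)),
    "once, divide by n-1": math.sqrt(variance(once)),
    "twice, divide by n-1": math.sqrt(variance(twice)),
}

print("GMD:", gmd)
for label, value in candidates.items():
    print(label, value)
```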

The method you propose:
1. Take the difference of the mean difference and each single difference
2. Square each result and add up the resulting numbers
3. Divide by the total number of differences
4. Take the square root of #3

agrees with the answer \sqrt{\frac{2}{3}}, doesn't it?
 