# Transformed Standard Deviation?

• Gridvvk
In summary, the new standard deviation cannot be determined from only the mean and standard deviation of the original data set and the new mean. Additional information about the removed elements is needed, such as their values (or their standard deviation, together with the group sizes).
Gridvvk
I realize you can transform the data in many ways, but I really can't find a method to solve this particular scenario anywhere online:

Suppose you are given the mean and standard deviation of a set of data. Further suppose you take away (get rid of) "x" elements in the original data set, and now you're given the new mean. Is it possible to find the new standard deviation? It seems like it's something very simple, but I can't seem to come up with a solution. If it is indeed possible, how would I go about doing this?

Numbers aren't important, but perhaps a numerical example to illustrate what I'm asking:

A set of data has a mean of 20 and a standard deviation of 2. Suppose you remove four elements from the set of data, and the new mean is now 50. What is the new standard deviation?

Why don't you just calculate the standard deviation of the new dataset?
Do you mean that you don't have the dataset, only the new mean? I would strongly suspect that the question is ill-posed, then. I could likely remove points in multiple ways to create the same mean but different SDs.
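A minimal sketch of that suspicion, using a made-up data set (the numbers are illustrative, not from the thread) and the population standard deviation. Two different pairs with the same sum are removed from the same data, so the original mean and SD and the new mean all agree, yet the new SDs differ.

```python
from statistics import mean, pstdev

# Hypothetical data set, chosen only for illustration.
data = [1, 2, 3, 4, 5, 6, 7]

def remove_values(values, to_remove):
    """Return a copy of `values` with one occurrence of each item in `to_remove` dropped."""
    remaining = list(values)
    for item in to_remove:
        remaining.remove(item)
    return remaining

# Two removals with the same count (2) and the same sum (8).
kept_a = remove_values(data, [3, 5])
kept_b = remove_values(data, [1, 7])

# Same new mean in both cases, but different new standard deviations.
print(mean(kept_a), pstdev(kept_a))
print(mean(kept_b), pstdev(kept_b))
```

Both remaining sets have a mean of 4, but their SDs are roughly 2.28 and 1.41, so the new mean alone does not pin down the new SD.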

Yes sorry for not clarifying that earlier. I only have the mean and s.d. of the set, I don't know the size of the set or individual points. The question is trivial if I have the data set. Does that mean there is not enough information to infer the new standard deviation of the new data set?

I'm not sure how you would go about removing points in multiple ways to get the same mean but different standard deviations, given that the original data already has a particular mean and standard deviation. For instance, in the example I gave there is a big jump in the mean after removing four points (20 to 50), but the deviation of the original set is 2. Is it really possible to alter the mean so drastically in multiple ways? What prevents the answer from being unique?

But I'm just interested in the "new standard deviation", if possible to compute.

Edit: Upon further reflection, I don't think there is a systematic way in general and my example was flawed.

It implied:
20(n + 4) - s = 50n
where n = number of elements remaining (so n + 4 = total number of elements initially) and s = sum of the elements removed. Rearranging:

80 = 30n + s
Assuming the removed values are positive (so s > 0), the only positive integer solutions are n = 1 (s = 50) and n = 2 (s = 20).

If n + 4 = 5 and the mean is 20, and one element is 50, then I don't think it's possible to have a s.d. of 2. I'm assuming the same holds for n = 2, so the example was flawed.
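One way to check this claim is to compute the smallest standard deviation the original set could possibly have had. The sketch below assumes the population SD (dividing by the count) and uses hypothetical function and parameter names. The spread is smallest when all kept values equal the new mean and all removed values equal each other.

```python
import math

def min_original_sd(total_count, removed_count, old_mean, new_mean):
    """Smallest population SD the original set could have had.

    The spread is smallest when every kept value equals the new mean and
    every removed value equals the removed-group mean.
    """
    kept_count = total_count - removed_count
    removed_sum = total_count * old_mean - kept_count * new_mean
    removed_mean = removed_sum / removed_count
    variance = (kept_count * (new_mean - old_mean) ** 2
                + removed_count * (removed_mean - old_mean) ** 2) / total_count
    return math.sqrt(variance)

# The two cases from the post: 1 or 2 values remain after removing 4.
for kept in (1, 2):
    print(kept, min_original_sd(kept + 4, 4, old_mean=20, new_mean=50))
```

Under these assumptions the lower bounds come out at 15 and about 21.2, both far above 2, which agrees with the conclusion that the example cannot occur.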

Gridvvk said:
Yes sorry for not clarifying that earlier. I only have the mean and s.d. of the set, I don't know the size of the set or individual points. The question is trivial if I have the data set. Does that mean there is not enough information to infer the new standard deviation of the new data set?

I don't see how to do it. It would be a freakish situation, and seems impossible to me, so maybe there is some freakish way to go about it.

Do you know the values of the points that were removed?

skeptic2 said:
Do you know the values of the points that were removed?

Nope, only the number of points removed. If you have the values then it is indeed possible to solve.

Assuming the "example" is plausible (i.e. there exists an n such that the given mean and standard deviation are attainable) and n is known, it seems that you're able to determine the new standard deviation only if exactly 1 value is removed.

=================(my workings, if anyone is interested)================
(I apologize for being incompetent in LaTeX, so the work might be difficult to read).

n = number of values in the original set.
r = number of values removed, where r < n.

initial mean = (1 / n) sum [i = 1 to n] x_i
new mean = 1 / (n - r) * sum[i = 1 to n -r] x_i

You're given both quantities, as well as r, so it is possible to solve for sum[i = 1 to n -r] x_i.
Let new mean = u
Let old mean = u_0

new s.d. = sqrt[sum[i = 1 to n - r]((x_i - u)^2) / (n - r)]

Expanding the square and simplifying gives:

new s.d. = sqrt[(sum[i = 1 to n - r](x_i^2) - (n - r)u^2) / (n - r)]

So all you need is: sum[i = 1 to n - r](x_i^2)

sum[i = 1 to n] x_i = sum[i = 1 to n - r] x_i + v
where v = x_(n-r+1) + x_(n-r+2) + ... + x_n is the sum of the r removed values.

You can solve for v, since the other two quantities are known.

Initial s.d. = sqrt[(sum[i = 1 to n](x_i^2) - n*u_0^2) / n]

You're given n, u_0, and the initial s.d., so you can solve for: sum[i = 1 to n](x_i^2)

sum[i = 1 to n - r](x_i^2) + (x_(n-r+1))^2 + (x_(n-r+2))^2 + ... + (x_n)^2 = sum[i = 1 to n](x_i^2)

Since v = x_(n-r+1) + x_(n-r+2) + ... + x_n (and you know the value of v),

the problem now just involves figuring out:

(x_(n-r+1))^2 + (x_(n-r+2))^2 + ... + (x_n)^2

If you can find that then you can find the new standard deviation, or perhaps there might be another trick which makes the problem easier.

This means that if only one value is removed (even if you're not given the value), the missing term is just v^2, so you can solve for the new standard deviation. If two values are removed and you know:

v = x_(n-1) + x_n, and you have a numeric value for v,

then it comes down to solving for:

(x_(n-1))^2 + (x_n)^2

which means you somehow need to account for the cross term 2(x_(n-1) * x_n) that appears when you square v, since v^2 = (x_(n-1))^2 + (x_n)^2 + 2(x_(n-1) * x_n).

It gets harder to solve as r gets bigger.
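For the single-removal case worked out above, the steps can be sketched in a few lines. This assumes n is known and the population SD is used (dividing by the count, as in the workings). The function name and the test data are hypothetical.

```python
import math
from statistics import mean, pstdev

def new_sd_after_single_removal(n, old_mean, old_sd, new_mean):
    """Population SD of the remaining n - 1 values, assuming exactly one was removed."""
    removed_value = n * old_mean - (n - 1) * new_mean   # this is v
    old_sum_sq = n * (old_sd ** 2 + old_mean ** 2)      # sum of x_i^2 over all n values
    new_sum_sq = old_sum_sq - removed_value ** 2        # drop the removed value's square
    new_variance = new_sum_sq / (n - 1) - new_mean ** 2
    return math.sqrt(max(new_variance, 0.0))            # guard against tiny negative rounding

# Check against a hypothetical data set (mean 5, population SD 2).
data = [2, 4, 4, 4, 5, 5, 7, 9]
kept = data[:-1]  # remove the last value, 9
estimate = new_sd_after_single_removal(len(data), mean(data), pstdev(data), mean(kept))
print(estimate, pstdev(kept))
```

The two printed values should agree up to floating-point rounding. For r = 2 or more the same approach stalls, because v alone no longer gives the sum of the removed squares.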

To get the new SD uniquely, you need the SD of the removed group, along with the sizes and means of any two of the three groups (old, removed, new).
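A minimal sketch of that calculation, assuming population SDs and taking the old and removed groups as the two known groups. The function name and the example numbers are hypothetical. It relies on the identity that a group's sum of squares equals count × (SD² + mean²).

```python
import math

def new_sd_from_groups(total_count, old_mean, old_sd,
                       removed_count, removed_mean, removed_sd):
    """Mean and population SD of the kept values, from old and removed group summaries."""
    kept_count = total_count - removed_count
    kept_mean = (total_count * old_mean - removed_count * removed_mean) / kept_count
    # Sum of squares of a group = count * (sd^2 + mean^2) for population SDs.
    old_sum_sq = total_count * (old_sd ** 2 + old_mean ** 2)
    removed_sum_sq = removed_count * (removed_sd ** 2 + removed_mean ** 2)
    kept_variance = (old_sum_sq - removed_sum_sq) / kept_count - kept_mean ** 2
    return kept_mean, math.sqrt(max(kept_variance, 0.0))

# Hypothetical example: [2, 4, 4, 4, 5, 5, 7, 9] with 7 and 9 removed.
# Old group: count 8, mean 5, SD 2. Removed group: count 2, mean 8, SD 1.
print(new_sd_from_groups(8, 5, 2, 2, 8, 1))
```

For this example the kept values are [2, 4, 4, 4, 5, 5], whose mean is 4 and population SD is 1, matching what the function returns.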

## 1. What is Transformed Standard Deviation?

Transformed standard deviation, as used here, is the standard deviation computed on data after a mathematical transformation has been applied. Like the ordinary standard deviation, it describes how much the values in a dataset vary around their mean. It is typically used when the raw data is not normally distributed.

## 2. How is Transformed Standard Deviation calculated?

The transformed standard deviation is calculated by first transforming the data into a normal distribution using a mathematical transformation, such as the Box-Cox transformation. Then, the traditional standard deviation formula is applied to the transformed data to calculate the transformed standard deviation.
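A minimal sketch of that two-step procedure, written in plain Python rather than with any particular statistics library. The `box_cox` helper, the sample data, and the choice of λ = 0 (the log transform) are assumptions for illustration. In practice λ is usually estimated from the data rather than fixed by hand.

```python
import math
from statistics import pstdev

def box_cox(values, lam):
    """One-parameter Box-Cox transform; requires strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

# Hypothetical right-skewed data; lam = 0 is an arbitrary choice here.
data = [1.0, 2.0, 4.0, 8.0, 64.0]
transformed = box_cox(data, lam=0)

# Standard deviation on the original scale vs. on the transformed scale.
print(pstdev(data), pstdev(transformed))
```

Note that the second number is a spread on the transformed scale, so it is not directly comparable to the SD of the raw values.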

## 3. Why is Transformed Standard Deviation used?

Transformed Standard Deviation is used when the data is not normally distributed. This could be due to a variety of reasons, such as outliers or a skewed distribution. After a suitable transformation, the standard deviation can be a more meaningful summary of the spread than the standard deviation of the raw data, although it is expressed on the transformed scale.

## 4. What are the advantages of using Transformed Standard Deviation?

The main advantage of using Transformed Standard Deviation is that it can provide a more meaningful measure of spread for non-normally distributed data. It can also make comparisons between datasets with different distributions easier, since the transformation aims to bring the data closer to a normal distribution before the spread is calculated.

## 5. Are there any limitations to using Transformed Standard Deviation?

One limitation of using Transformed Standard Deviation is that it may not always result in a normal distribution, even after applying the transformation. In these cases, alternative measures of spread, such as the interquartile range, may be more appropriate. Additionally, the choice of transformation method can also impact the results of the transformed standard deviation, so it is important to carefully select an appropriate transformation for the data.
