Transformed Standard Deviation?


Discussion Overview

The discussion centers around the possibility of calculating the new standard deviation of a dataset after removing a specified number of elements, given the original mean and standard deviation, as well as the new mean. Participants explore the implications of having limited information about the dataset and the challenges associated with inferring the new standard deviation without knowing the individual data points.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant questions whether it is possible to determine the new standard deviation without the dataset, suggesting that the question may be ill-posed.
  • Another participant expresses uncertainty about how to remove points in multiple ways to achieve the same mean but different standard deviations, highlighting the drastic change in mean in their example.
  • A participant reflects on the flawed nature of their example, indicating that the drastic change in mean may not allow for a consistent standard deviation.
  • Some participants argue that knowing only the number of points removed is insufficient to determine the new standard deviation, while having the values of the removed points would allow for a solution.
  • One participant provides a detailed mathematical exploration of the problem, suggesting that if only one value is removed, it may be possible to compute the new standard deviation, but the complexity increases with more values removed.
  • Another participant asserts that to uniquely determine the new standard deviation, additional information about the removed group's standard deviation and the means of the three groups (old, removed, new) is necessary.

Areas of Agreement / Disagreement

Participants generally agree that there is insufficient information to uniquely determine the new standard deviation without additional data about the removed elements. Multiple competing views remain regarding the feasibility of calculating the new standard deviation under the given constraints.

Contextual Notes

Limitations include the absence of individual data points and the size of the original dataset, which complicates the ability to infer the new standard deviation. The discussion highlights the dependence on specific assumptions about the dataset and the nature of the removed elements.

Gridvvk
I realize you can transform the data in many ways, but I really can't find a method to solve this particular scenario anywhere online:

Suppose you are given the mean and standard deviation of a set of data. Further suppose you take away (get rid of) "x" elements in the original data set, and now you're given the new mean. Is it possible to find the new standard deviation? It seems like it's something very simple, but I can't seem to come up with a solution. If it is indeed possible, how would I go about doing this?

Numbers aren't important, but perhaps a numerical example to illustrate what I'm asking:

A set of data has a mean of 20 and a standard deviation of 2. Suppose you take four elements from the set of data, and the new mean is now 50. What is the new standard deviation?
 
Why don't you just calculate the standard deviation of the new dataset?
Do you mean that you don't have the dataset; only the new mean? I would strongly suspect that the question is ill-posed, then. I could likely remove points in multiple ways to create the same mean but different SD's.
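To make that suspicion concrete, here is a minimal sketch with a made-up six-point dataset (the numbers and the helper `remove` are purely illustrative, not from the original question). Two different pairs are removed; both removals leave the same mean, but the remaining standard deviations differ.

```python
from statistics import mean, pstdev

# Hypothetical dataset chosen only for illustration.
data = [0, 2, 4, 6, 8, 10]

def remove(values, to_drop):
    """Return a copy of values with each element of to_drop removed once."""
    remaining = list(values)
    for v in to_drop:
        remaining.remove(v)
    return remaining

kept_a = remove(data, [4, 6])    # drop the two middle points
kept_b = remove(data, [0, 10])   # drop the two extreme points

# Same number removed, same new mean, different population SDs.
print(mean(kept_a), pstdev(kept_a))
print(mean(kept_b), pstdev(kept_b))
```

Both removals take out two points that sum to 10, so the new mean is 5 in each case, yet the spread of what remains is clearly different.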
 
Yes sorry for not clarifying that earlier. I only have the mean and s.d. of the set, I don't know the size of the set or individual points. The question is trivial if I have the data set. Does that mean there is not enough information to infer the new standard deviation of the new data set?

I'm not sure how you would go about removing points in multiple ways to get the same mean but different standard deviations, given that the original data already has a particular mean and standard deviation. For instance, in the example I gave there is a big jump in the mean after removing four points (from 20 to 50), but the standard deviation of the original set is 2. Is it really possible to alter the mean so drastically in multiple ways? What prevents there being a unique way?

But I'm just interested in the "new standard deviation", if possible to compute.

Edit: Upon further reflection, I don't think there is a systematic way in general and my example was flawed.

It implied:
20(n + 4) - s = 50n
Where n + 4 = total number of elements initially, and s = sum of elements removed:

80 = 30n + s
Taking s > 0, the only positive integer solutions for n are n = 1 (s = 50) and n = 2 (s = 20).

If n + 4 = 5 and the mean is 20, and the one remaining element is 50, then I don't think it's possible to have an s.d. of 2. I'm assuming it's likewise for n = 2, so the example was flawed.
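A quick way to check this, sketched under the assumption that the s.d. is the population one (divide by the number of points): the kept points have mean 50, so their squared deviations about the old mean add up to at least n * (50 - 20)^2, and that cannot exceed the total squared deviation of the full set, (n + 4) * 2^2. The function name `feasible` is just a label for this sketch.

```python
def feasible(n_total, old_mean, old_sd, n_kept, new_mean):
    """Necessary condition for the stated means and s.d. to coexist.

    Uses the population s.d. convention (an assumption of this sketch).
    """
    # Total squared deviation of the full set about its own mean.
    total_ss = n_total * old_sd ** 2
    # Kept points with mean new_mean contribute at least this much
    # squared deviation about old_mean (equality when they are all equal).
    min_kept_ss = n_kept * (new_mean - old_mean) ** 2
    return min_kept_ss <= total_ss

for n in (1, 2):
    s = 80 - 30 * n  # sum of the four removed values
    print(n, s, feasible(n + 4, 20, 2, n, 50))
```

Both cases come out infeasible (900 vs 20 for n = 1, and 1800 vs 24 for n = 2), and the gap is large enough that using the sample s.d. instead would not change the conclusion.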
 
Gridvvk said:
Yes sorry for not clarifying that earlier. I only have the mean and s.d. of the set, I don't know the size of the set or individual points. The question is trivial if I have the data set. Does that mean there is not enough information to infer the new standard deviation of the new data set?

I don't see how to do it. It would be a freakish situation, and seems impossible to me, so maybe there is some freakish way to go about it.
 
Do you know the values of the points that were removed?
 
skeptic2 said:
Do you know the values of the points that were removed?

Nope, only the number of points removed. If you have the values then it is indeed possible to solve.

Assuming the "example" is plausible (i.e. it is possible to have a n s.t. the given mean and standard deviation are attainable), it seems that you're only able to determine the new standard deviation if only 1 value is removed.

=================(my workings, if anyone is interested)================
(I apologize for being incompetent in LaTeX, so the work might be difficult to read).

r = number of values removed, where r < n.

initial mean = (1 / n) sum [i = 1 to n] x_i
new mean = 1 / (n - r) * sum[i = 1 to n -r] x_i

You're given both quantities, as well as r, so (for a known n) it is possible to solve for sum[i = 1 to n - r] x_i.
Let new mean = u
Let old mean = u_0

new s.d. = sqrt[sum[i = 1 to n - r]((x_i - u)^2) / (n - r)]

Expanding the square and simplifying gives:

new s.d. = sqrt[(sum[i = 1 to n - r](x_i^2) - (n - r)u^2) / (n - r)]

So all you need is: sum[i = 1 to n - r](x_i^2)

sum[i = 1 to n] x_i = sum[i = 1 to n - r] x_i + v
Where: v = x_n-(r-1) + x_n-(r-2) + ... + x_n (the sum of the r removed values)

You can solve for v, since the other two quantities are known.

Initial s.d. = sqrt[(sum[i = 1 to n](x_i^2) - n u_0^2) / n]

You're given n, u_0, and initial s.d. so you can solve for: sum[i = 1 to n](x_i^2)

sum[i = 1 to n - r](x_i^2) + (x_n-(r-1))^2 + (x_n-(r-2))^2 + ... + (x_n)^2 = sum[i = 1 to n](x_i^2)

Since v = x_n-(r-1) + x_n-(r-2) + ... + x_n (and you know the value for v),

the problem now just involves figuring out:

(x_n-(r-1))^2 + (x_n-(r-2))^2 + ... + (x_n)^2

If you can find that then you can find the new standard deviation, or perhaps there might be another trick which makes the problem easier.

This means that if one value is removed (and you're not given the value), then the sum of squares of the removed values is just v^2, and you can solve for the new standard deviation. If two values are removed and you know:

v = x_n-1 + x_n, and you have a numeric value for v.

Then it comes down to solving:

x_n-1^2 + x_n^2

which means you somehow need to account for a 2(x_n-1 * x_n) when you square v.

and it gets harder to solve as r gets bigger.
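For the one-value case, the workings above can be sketched in code. This assumes n is known and that the s.d. is the population one; the function name `new_sd_one_removed` is made up for the sketch.

```python
from math import sqrt

def new_sd_one_removed(n, old_mean, old_sd, new_mean):
    """New population s.d. after removing exactly one (unknown) value.

    Assumes n, the old mean, the old population s.d. and the new mean
    are all known.
    """
    total = n * old_mean                 # sum of all n values
    kept_total = (n - 1) * new_mean      # sum of the n - 1 kept values
    v = total - kept_total               # the single removed value
    # From s.d.^2 = (sum of squares) / n - mean^2:
    sum_sq = n * (old_sd ** 2 + old_mean ** 2)
    kept_sum_sq = sum_sq - v ** 2        # r = 1, so removed squares = v^2
    variance = kept_sum_sq / (n - 1) - new_mean ** 2
    return sqrt(max(variance, 0.0))      # guard against tiny negative rounding

# Illustrative check: [2, 4, 6, 8, 10] has mean 6 and s.d. sqrt(8);
# removing 10 leaves [2, 4, 6, 8] with mean 5.
print(new_sd_one_removed(5, 6, sqrt(8), 5))
```

With two removed values this breaks down: knowing v = 10 does not say whether the pair was (4, 6), with squares summing to 52, or (0, 10), with squares summing to 100.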
 
Your answer is no.
To get the new s.d. uniquely, you need the s.d. of the removed group, along with the sizes and means of two of the three groups (old, removed, new).
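As a minimal sketch of that statement, here is the case where the old and removed groups are the two that are known. It assumes population standard deviations throughout, and the function name `new_sd_from_groups` is invented for the example.

```python
from math import sqrt

def new_sd_from_groups(n_old, mean_old, sd_old,
                       n_removed, mean_removed, sd_removed):
    """Mean and population s.d. of what remains after removing a group.

    Relies on the identity: sum of squares = n * (s.d.^2 + mean^2),
    applied to the old group and to the removed group.
    """
    n_new = n_old - n_removed
    mean_new = (n_old * mean_old - n_removed * mean_removed) / n_new
    sum_sq_new = (n_old * (sd_old ** 2 + mean_old ** 2)
                  - n_removed * (sd_removed ** 2 + mean_removed ** 2))
    variance = sum_sq_new / n_new - mean_new ** 2
    return mean_new, sqrt(max(variance, 0.0))  # clamp rounding noise

# Illustrative data: [0, 2, 4, 6, 8, 10] has mean 5 and variance 70/6.
# Removing [4, 6] (mean 5, s.d. 1) leaves [0, 2, 8, 10].
print(new_sd_from_groups(6, 5, sqrt(70 / 6), 2, 5, 1))
```

The removed group's s.d. is what supplies the missing sum of squares; without it, the same sizes and means are compatible with many different answers.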
 
