Transformed Standard Deviation?

  1. Jan 24, 2013 #1
    I realize you can transform the data in many ways, but I really can't find a method to solve this particular scenario anywhere online:

    Suppose you are given the mean and standard deviation of a set of data. Further suppose you take away (get rid of) "x" elements in the original data set, and now you're given the new mean. Is it possible to find the new standard deviation? It seems like it's something very simple, but I can't seem to come up with a solution. If it is indeed possible, how would I go about doing this?

    Numbers aren't important, but perhaps a numerical example to illustrate what I'm asking:

    A set of data has a mean of 20 and a standard deviation of 2. Suppose you remove four elements from the set of data, and the new mean is now 50. What is the new standard deviation?
  2. jcsd
  3. Jan 25, 2013 #2
    Why don't you just calculate the standard deviation of the new dataset?
    Or do you mean that you don't have the dataset, only the new mean? If so, I strongly suspect that the question is ill-posed: I could likely remove points in multiple ways that give the same mean but different SDs.
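
    To make that concrete, here is a minimal sketch with made-up numbers (the dataset below is purely illustrative, not from the question). Two different pairs of points are removed from the same set; both removals leave the same mean, but the remaining points have different population SDs.

```python
from statistics import fmean, pstdev

# Hypothetical dataset, chosen only for illustration.
data = [0, 2, 4, 6, 8, 10]

# Two ways of removing r = 2 points from the same dataset.
kept_a = [0, 2, 8, 10]   # removed 4 and 6
kept_b = [2, 4, 6, 8]    # removed 0 and 10

mean_a, mean_b = fmean(kept_a), fmean(kept_b)
sd_a, sd_b = pstdev(kept_a), pstdev(kept_b)   # population SD (divide by count)

print("means:", mean_a, mean_b)
print("SDs:  ", sd_a, sd_b)
```

    Both removals also take away the same total (10), so the old mean, the new mean, and the number of points removed cannot tell the two cases apart.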
  4. Jan 25, 2013 #3
    Yes, sorry for not clarifying that earlier. I only have the mean and s.d. of the set; I don't know the size of the set or the individual points. The question is trivial if I have the data set. Does that mean there is not enough information to infer the standard deviation of the new data set?

    I'm not sure how you would go about removing points in multiple ways to get the same mean but different standard deviations, given that the original data already has a particular mean and standard deviation. For instance, in the example I gave, there is a big jump in the mean after removing four points (20 to 50), but the deviation of the original set is 2. Is it really possible to alter the mean so drastically in multiple ways? What prevents there being a unique way?

    But I'm just interested in the "new standard deviation", if possible to compute.

    Edit: Upon further reflection, I don't think there is a systematic way in general and my example was flawed.

    It implied:
    20(n + 4) - s = 50n
    Where n + 4 = total number of elements initially, and s = sum of elements removed:

    80 = 30n + s
    The only positive integer solutions are n = 1 (s = 50) and n = 2 (s = 20).

    If n + 4 = 5 and the mean is 20, and one element is 50, then I don't think it's possible to have an s.d. of 2: that one element alone is 30 away from the mean. I'm assuming it's likewise for n = 2, so the example was flawed.
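
    A rough check of that claim, as a sketch under my own assumptions: population s.d. (divide by the count), and a helper name `min_sd` that I made up for this. If the kept points average `new_mean` and the removed points average whatever is needed to keep the overall mean at `old_mean`, then the original s.d. can be no smaller than the spread between those two group means.

```python
from math import sqrt

def min_sd(n_kept, r_removed, old_mean, new_mean):
    """Smallest possible population s.d. of the original data, given that
    n_kept points average new_mean and all n_kept + r_removed points
    average old_mean (hypothetical helper, for illustration only)."""
    total = n_kept + r_removed
    # Mean the removed points must have to give the stated overall mean.
    removed_mean = (total * old_mean - n_kept * new_mean) / r_removed
    # Between-group variance: attained when each group has no internal spread.
    between = (n_kept * (new_mean - old_mean) ** 2
               + r_removed * (removed_mean - old_mean) ** 2) / total
    return sqrt(between)

for n in (1, 2):
    print(n, min_sd(n, 4, 20, 50))
```

    For n = 1 this gives a minimum s.d. of 15, and for n = 2 it is larger still, so an original s.d. of 2 is out of reach in both cases.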
    Last edited: Jan 25, 2013
  5. Jan 31, 2013 #4
    I don't see how to do it. It would be a freakish situation, and seems impossible to me, so maybe there is some freakish way to go about it.
  6. Jan 31, 2013 #5
    Do you know the values of the points that were removed?
  7. Jan 31, 2013 #6
    Nope, only the number of points removed. If you have the values then it is indeed possible to solve.

    Assuming the "example" is plausible (i.e. there exists an n such that the given mean and standard deviation are attainable), it seems that you're only able to determine the new standard deviation if exactly one value is removed.

    =================(my workings, if anyone is interested)================
    (I apologize for being incompetent in LaTeX, so the work might be difficult to read).

    r = number of values removed, where r < n. Label the data so that the removed values are the last r, i.e. x_(n-r+1), ..., x_n.

    initial mean = (1 / n) sum [i = 1 to n] x_i
    new mean = 1 / (n - r) * sum[i = 1 to n -r] x_i

    You're given both quantities, as well as r, so it is possible to solve for sum[i = 1 to n -r] x_i.
    Let new mean = u
    Let old mean = u_0

    new s.d. = sqrt[ sum[i = 1 to n - r] (x_i - u)^2 / (n - r) ]

    Expanding the square and simplifying gives:

    new s.d. = sqrt[ (sum[i = 1 to n - r] x_i^2 - (n - r)u^2) / (n - r) ]

    So all you need is: sum[i = 1 to n - r] x_i^2

    sum[i = 1 to n] x_i = sum[i = 1 to n - r] x_i + v
    where v = x_(n-r+1) + x_(n-r+2) + ... + x_n is the sum of the r removed values.

    You can solve for v, since the other two quantities are known.

    initial s.d. = sqrt[ (sum[i = 1 to n] x_i^2 - n u_0^2) / n ]

    You're given n, u_0, and the initial s.d., so you can solve for: sum[i = 1 to n] x_i^2

    sum[i = 1 to n - r] x_i^2 + (x_(n-r+1))^2 + (x_(n-r+2))^2 + ... + (x_n)^2 = sum[i = 1 to n] x_i^2

    Since v = x_(n-r+1) + x_(n-r+2) + ... + x_n (and you know the value of v),

    the problem now just involves figuring out:

    (x_(n-r+1))^2 + (x_(n-r+2))^2 + ... + (x_n)^2

    If you can find that then you can find the new standard deviation, or perhaps there might be another trick which makes the problem easier.

    This means that if one value is removed (and you're not given the value), then the sum of squares is just v^2, and you can solve for the new standard deviation. If two values are removed, you know:

    v = x_(n-1) + x_n, and you have a numeric value for v.

    Then it comes down to finding:

    (x_(n-1))^2 + (x_n)^2

    Since v^2 = (x_(n-1))^2 + (x_n)^2 + 2(x_(n-1) * x_n), you would somehow need to account for the cross term 2(x_(n-1) * x_n), which isn't known.

    And it gets harder to solve as r gets bigger.
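
    Here is a rough sketch of the r = 1 case in Python, using the steps above. The function name and the test numbers are my own, and it assumes n is known and that the s.d. is the population one (dividing by the count), as in the formulas above.

```python
from math import sqrt
from statistics import fmean, pstdev

def new_sd_one_removed(n, old_mean, old_sd, new_mean):
    """New population s.d. after removing exactly one unknown value
    (hypothetical helper following the workings above, r = 1)."""
    v = n * old_mean - (n - 1) * new_mean           # the removed value
    sum_sq_old = n * (old_sd ** 2 + old_mean ** 2)  # sum of x_i^2 over all n
    sum_sq_new = sum_sq_old - v ** 2                # drop the removed square
    variance = sum_sq_new / (n - 1) - new_mean ** 2
    return sqrt(max(variance, 0.0))                 # guard against round-off

# Made-up data to check against a direct calculation.
data = [2, 4, 4, 4, 5, 5, 7, 9]    # mean 5, population s.d. 2
kept = data[:-1]                   # remove the 9
estimate = new_sd_one_removed(len(data), fmean(data), pstdev(data), fmean(kept))
print(estimate, pstdev(kept))
```

    The two printed values should agree up to rounding. For r = 2 or more the same approach stalls, because only the sum of the removed values is known, not the sum of their squares.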
    Last edited: Jan 31, 2013
  8. Feb 15, 2013 #7



    The answer is no.
    To determine the new s.d. uniquely, you need the s.d. of the removed group, along with the sizes and means of two of the three groups (old, removed, new).
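
    As a sketch of why that information is enough (assuming population s.d.s; the function name and sample numbers are made up for illustration): for each group, count * (s.d.^2 + mean^2) equals its sum of squares, and the sums of squares of the removed and new groups add up to that of the old group.

```python
from math import sqrt
from statistics import fmean, pstdev

def new_sd_from_groups(n_old, mean_old, sd_old, r, mean_removed, sd_removed):
    """Population s.d. of the remaining points, given summaries of the
    old and removed groups (hypothetical helper, for illustration only)."""
    n_new = n_old - r
    mean_new = (n_old * mean_old - r * mean_removed) / n_new
    sum_sq_old = n_old * (sd_old ** 2 + mean_old ** 2)
    sum_sq_removed = r * (sd_removed ** 2 + mean_removed ** 2)
    variance = (sum_sq_old - sum_sq_removed) / n_new - mean_new ** 2
    return sqrt(max(variance, 0.0))   # guard against round-off

# Made-up data to compare with a direct calculation.
old = [0, 2, 4, 6, 8, 10]
removed = [4, 6]
kept = [0, 2, 8, 10]
result = new_sd_from_groups(len(old), fmean(old), pstdev(old),
                            len(removed), fmean(removed), pstdev(removed))
print(result, pstdev(kept))
```

    With one value removed, the removed group's s.d. is automatically 0, which is why that case can be solved without extra information.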