Deviation (specific number)

onako · Feb 20, 2011

Given a set of random real numbers G=(a, b, c, d, e, f, ...), I am supposed to calculate the number T such that the sum of squared errors between that number and all the elements in G is minimum. Intuitively, this should be the average of the numbers in G, but I am not sure how to proceed with the proof (perhaps I am missing something important here).

Would the same answer hold for the question: compute a number T such that the deviation of all the elements in G with respect to T is minimum? (The standard deviation is taken with respect to the mean value, but is there some other value T such that the deviation with respect to it is minimum, i.e. smaller than the standard deviation and all others?)
In addition, how would the conclusions relate to the median value? Again, intuitively, this would match the number T with the closest value in G.

Your opinion on this is highly appreciated. Thanks

statdad · Feb 20, 2011

I'm assuming your set G is finite with n elements. Your problem is to select T so that

[tex]
S(T) = \sum_{x \in G} (x-T)^2
[/tex]

is minimized as a function of T. You have (at least) three ways to proceed.

a) S(T) is a quadratic function in T.

[tex]
S(T) = \sum_{x \in G} (x^2 - 2xT + T^2) = \sum_{x \in G} x^2 -2T \sum_{x \in G} x + n T^2
[/tex]

The coefficient of T^2 is n > 0, so the parabola opens upward and its vertex gives the minimum value of S(T). Use -b/(2a) to find the first coordinate of the vertex.

b) S(T) is a differentiable function of T (it's a polynomial); find the derivative, set it equal to zero, and solve for T.

c) Note that

[tex]
S(T) = \sum_{x \in G} (x-T)^2 = \sum_{x \in G} (x - \overline x + \overline x - T)^2 = \sum_{x \in G} (x-\overline x)^2 + 2\sum_{x \in G} (x-\overline x)(\overline x - T) + \sum_{x \in G} (\overline x - T)^2
[/tex]

You can show the middle term is zero; this means that S(T) is the sum of two non-negative
quantities. How can you pick T to make the sum of those pieces as small as possible?
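If it helps to see the claim numerically first, here is a minimal Python sketch. The sample G and the grid resolution are made up for illustration; it compares the vertex formula from (a) with the plain average and with a brute-force grid search over T.

```python
# Hypothetical sample; any finite list of real numbers would do.
G = [2.0, 3.5, 5.0, 7.5, 12.0]
n = len(G)

def S(T):
    """Sum of squared errors between T and every element of G."""
    return sum((x - T) ** 2 for x in G)

# Coefficients of S(T) = a*T^2 + b*T + c, read off the expansion in (a).
a = n
b = -2 * sum(G)
T_vertex = -b / (2 * a)      # first coordinate of the parabola's vertex
mean = sum(G) / n            # ordinary average of G

# Independent check: evaluate S on a grid of candidate T values (step 0.01).
grid = [i / 100 for i in range(0, 1501)]
T_grid = min(grid, key=S)

print(T_vertex, mean, T_grid)
```

All three numbers should agree, up to the grid spacing for the last one. This is only a sanity check, not a proof; the proof is what (a), (b) and (c) provide.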

"Would the same answer hold for the question: compute a number T such that the deviation of all the elements in G with respect to T is minimum. (The standard deviation is with respect to the mean value, but is there some other value T such that the deviation with respect to it is minimum (smaller than standard deviation and all others))"
No - as long as you measure mean squared error the mean is the minimizing value.

If you choose to minimize

[tex]
\sum_{x \in G} |x - T|
[/tex]

the solution is T = median of numbers.
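A small Python sketch can show the difference between the two criteria. The sample below is hypothetical and has an odd number of elements so the median is unique; for an even number of elements, any T between the two middle values gives the same minimal sum of absolute deviations.

```python
from statistics import median

# Hypothetical sample with one large value, so mean and median differ clearly.
G = [1.0, 2.0, 4.0, 9.0, 30.0]

def abs_dev(T):
    """Sum of absolute deviations of the elements of G from T."""
    return sum(abs(x - T) for x in G)

def sq_dev(T):
    """Sum of squared deviations of the elements of G from T."""
    return sum((x - T) ** 2 for x in G)

# Brute-force search over candidate T values (step 0.1).
grid = [i / 10 for i in range(0, 301)]
T_abs = min(grid, key=abs_dev)   # minimizer of the absolute criterion
T_sq = min(grid, key=sq_dev)     # minimizer of the squared criterion

print("median:", median(G), "grid minimizer of |x - T|:", T_abs)
print("mean:  ", sum(G) / len(G), "grid minimizer of (x - T)^2:", T_sq)
```

The absolute criterion lands on the median and the squared criterion on the mean, which is why the large value 30 pulls the second answer much more than the first.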

onako · Feb 20, 2011

Many thanks for your message.

For c):
the first term does not depend on T, so minimizing the expression comes down to making the third term as small as possible, and it is 0 when T equals the mean value. This means that the mean IS actually the minimizer of the sum of squared errors.

However, I wonder about "compute a number T such that the deviation of all the elements in G with respect to T is minimum."

statdad · Feb 20, 2011

Actually all three of the approaches I mentioned lead to the conclusion that the choice T = sample mean will minimize S(T).
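
For reference, approach (b) can be written out in one line. This is just a sketch of the differentiation step, starting from the expanded form of S(T) in (a):

[tex]
S'(T) = -2 \sum_{x \in G} x + 2nT = 0 \quad \Longrightarrow \quad T = \frac{1}{n} \sum_{x \in G} x = \overline x
[/tex]

Since S''(T) = 2n > 0, this critical point is a minimum, and it matches the vertex -b/(2a) from (a) with a = n and b = -2 \sum x.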

"However, I wonder about 'compute a number T such that the deviation of all the elements in G with respect to T is minimum.'"

Not sure what you mean here - that's what the work we've discussed does.

onako · Feb 21, 2011

Ok. It means that at the minimum S(T) = sigma^2*N, sigma denoting the standard deviation and N the number of elements in G. What does it say about two sets G1 and G2 having the same standard deviation and the same number of elements; would they have the same S(T1) and S(T2), although their means might be different?
In other words, given two sets G1 and G2 of equal length, what condition needs to be satisfied for S(T1)=S(T2); would they need to have the same standard deviation, or the same mean?
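
A quick numerical check of this, as a sketch only: the two sets below are hypothetical, G2 is G1 shifted by a constant, and sigma is assumed to be the population standard deviation (dividing by N). With the sample standard deviation s (dividing by N-1) the identity would read S = (N-1)*s^2 instead.

```python
import math
from statistics import pvariance

def S(G, T):
    """Sum of squared errors between T and every element of G."""
    return sum((x - T) ** 2 for x in G)

# Hypothetical sets: same spread and length, very different means.
G1 = [1.0, 4.0, 7.0, 10.0]
G2 = [x + 100.0 for x in G1]

results = []
for G in (G1, G2):
    T = sum(G) / len(G)            # the minimizing T is the mean of each set
    s_min = S(G, T)                # minimal sum of squared errors
    n_var = len(G) * pvariance(G)  # N * sigma^2, population variance
    results.append((T, s_min, n_var))
    print(T, s_min, n_var)
```

In this example the minimal values S(T1) and S(T2) coincide even though the means differ by 100, which suggests that for sets of equal length the standard deviation is what has to match, not the mean.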

Thanks

onako · Feb 21, 2011

I tried to incorporate the effect of different factors depending on x entries, say f(x)=x^p

[tex]
S(T) = \sum_{x \in G} f(x)(x-T)^2
[/tex]

The individual effect is, for example, [tex] x_1^p(x_1-T)^2[/tex]. If I'm doing it correctly, the minimum is the weighted average:

T = (f(x1)x1 + f(x2)x2 + ... + f(xn)xn ) / (f(x1) + f(x2) + ... + f(xn))

Please correct me if I'm wrong.
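
One way to check the formula is numerically. The sketch below uses made-up data and a made-up exponent p = 2, chosen so that all weights f(x) are positive. Setting the derivative -2 * sum f(x)(x - T) to zero gives the weighted average, and it is a minimum as long as the sum of the weights is positive.

```python
# Hypothetical data and exponent; p = 2 keeps every weight f(x) = x^p positive.
p = 2
G = [1.0, 2.0, 3.0, 4.0]
w = [x ** p for x in G]          # weights f(x) for each element

def S_w(T):
    """Weighted sum of squared errors: sum of f(x) * (x - T)^2."""
    return sum(wi * (x - T) ** 2 for wi, x in zip(w, G))

# Candidate minimizer: the weighted average with weights f(x).
T_w = sum(wi * x for wi, x in zip(w, G)) / sum(w)

# Compare against nearby values of T on both sides.
eps = 0.01
print(T_w, S_w(T_w - eps), S_w(T_w), S_w(T_w + eps))
```

The middle value printed for S_w should be the smallest of the three. If some weights were negative (for example odd p with negative x), the sum of weights could be zero or negative and the weighted average would no longer be guaranteed to be a minimum.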

