# Statistics and Tchebysheff's theorem

1. Sep 6, 2011

### major_maths

1. The problem statement, all variables and given/known data

Let $k \geq 1$. Show that, for any set of $n$ measurements, the fraction included in the interval $\overline{y} - ks$ to $\overline{y} + ks$ is at least $1 - 1/k^2$.

[Hint: $s^2 = \frac{1}{n-1}\sum (y_i - \overline{y})^2$. In this expression, replace all deviations for which $|y_i - \overline{y}| \geq ks$ with $ks$. Simplify.] This result is known as Tchebysheff's theorem.

2. Relevant equations are the above.

3. The attempt at a solution

I've got no clue what the problem wants, much less how to start a solution.

2. Sep 7, 2011

### Stephen Tashi

Let there be $M$ measurements where $| y_i - \overline{y}| \geq ks$.
If in the sum $\sum(y_i -\overline{y})^2$ we replace the deviations of those $M$ measurements by $ks$ and leave out the other $n-M$ measurements, we get a sum that is no larger than the original. The smaller sum is $M (ks)^2$.

Hence

$$s^2 = \frac{1}{n-1} \sum(y_i - \overline{y})^2 \geq \frac{1}{n-1} M (ks)^2$$

Since $\frac{1}{n-1} > \frac{1}{n}$

$$s^2 \geq \frac{1}{n-1}M(ks)^2 > \frac{1}{n}M(ks)^2$$
$$s^2 \geq \frac{1}{n}M(ks)^2$$

The "fraction of measurements" that $M$ constitutes is $\frac{M}{n}$ and the above inequality can be used to bound it.

The original problem concerns the fraction of measurements other than those M measurements, so that fraction is $1.0 - \frac{M}{n}$.
That needs to be bounded by using the bound for $\frac{M}{n}$.
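As a sanity check on what the problem is claiming, here is a minimal Python sketch that counts the fraction of measurements lying within $ks$ of $\overline{y}$ and compares it with $1 - 1/k^2$. The function name `fraction_within` and the simulated data are assumptions made for illustration only; they are not part of the original problem. The sketch also assumes $s > 0$.

```python
import random
import statistics


def fraction_within(data, k):
    """Fraction of measurements with |y_i - mean| < k * s.

    s is the sample standard deviation with the n-1 divisor,
    matching the formula in the hint. Assumes s > 0.
    """
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # divides by n-1
    inside = sum(1 for y in data if abs(y - mean) < k * s)
    return inside / len(data)


# Hypothetical data: 200 simulated measurements (any data set would do).
random.seed(0)
measurements = [random.gauss(0, 1) for _ in range(200)]

for k in (1, 1.5, 2, 3):
    bound = 1 - 1 / k**2
    print(f"k={k}: fraction={fraction_within(measurements, k):.3f}, bound={bound:.3f}")
```

The theorem says the printed fraction is never below the bound, whatever the data look like. A numerical check like this is not a proof, but it shows what "fraction included in the interval" means in practice.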

3. Sep 7, 2011

### hassman

Thank you Stephen. That was part of my homework I was struggling with. I wonder which school the OP goes to :-).

To be really pedantic, should not the last equation have > sign instead of >=?

Last edited: Sep 7, 2011
4. May 13, 2012

### BigBrain

When we take a look at the definition of theorem number two, we see that the theorem refers to the standard deviation of the possible sample means computed from all possible random samples. Theorem number one is similar in that it says that, for any population, the average value of all possible sample means computed from all possible random samples of a given size from the population equals the population mean. What does that mean? Does that mean that the mean of my sample will automatically be equal to the population mean?
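One way to explore that question is to enumerate every possible sample from a tiny population. The sketch below is only an illustration: the population values are made up, the helper name `sample_means` is hypothetical, and "all possible samples" is assumed to mean samples drawn without replacement where order does not matter.

```python
from fractions import Fraction
from itertools import combinations


def sample_means(population, size):
    """Exact mean of every possible sample of the given size
    (drawn without replacement, order ignored)."""
    return [Fraction(sum(sample), size) for sample in combinations(population, size)]


# Hypothetical population, chosen only for illustration.
population = [2, 4, 6, 9]
means = sample_means(population, 2)

population_mean = Fraction(sum(population), len(population))
average_of_means = sum(means) / len(means)

print("population mean:        ", population_mean)
print("individual sample means:", [str(m) for m in means])
print("average of sample means:", average_of_means)
```

In this small example the average of all the sample means matches the population mean, yet no single sample mean equals it. So the statement is about the average over all possible samples, not about any one sample.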
