Variations of the Variance Formula

1. Apr 7, 2013

CraigH

The variance is denoted by $σ^{2}$

It is calculated with this equation:

$σ^{2}=\frac{\sum^{N}_{i=1}(X_i-μ)^{2}}{N}$

Which makes sense. To calculate the average (deviation from the mean)^2 you need to sum up the (deviations from the mean)^2 and then divide by the number of deviations.
The reason that you use (deviation from the mean)^2 instead of just the deviation from the mean is so that positive and negative numbers do not cancel out.

However in my lecture slides the formula is given by:

$σ^{2}=\sum^{N}_{i=1}(X_i-μ)^{2}P(X_i)$

Which is different from what I have previously learnt.

All I can presume is that $\sum^{N}_{i=1} P(X_i) = \frac{1}{N}$

But why? What is $P(X_i)$, and why is it used instead of just using N?

On the previous slide there is an equation for P(X) which I presume is something to do with it.

This equation is:

$P(X)=\frac{1}{σ\sqrt{2\pi}} e^{-\frac{(X-μ)^2}{2σ^2}}$

https://en.wikipedia.org/wiki/Normal_distribution

Although this doesn't make sense anymore. I used to be able to understand standard deviation and variance as the equations were quite intuitive, but now it makes no sense at all.
It would probably help if I understood the normal distribution (Gaussian Distribution) equation. What does this equation mean? And why is it used in the calculation of variance?

Thank you!!

Last edited: Apr 7, 2013
2. Apr 7, 2013

Stephen Tashi

You aren't distinguishing among the different meanings of "variance". In statistics, we have "sample variance", "variance of a probability distribution" and there are also several different "estimators of the population variance". Do you understand the definitions of those things?

3. Apr 8, 2013

CraigH

I'm afraid not. Although in my course I'm pretty sure that whenever variance is mentioned it is just the average squared deviation from the mean for a set of results. Different meanings of the term might be taught next year. We only had 3 lectures on statistics this year.

I just want to know why I was previously taught to use:

$σ^{2}=\frac{\sum^{N}_{i=1}(X_i-μ)^{2}}{N}$

Which makes sense. And now I have to use:

$σ^{2}=\sum^{N}_{i=1}(X_i-μ)^{2}P(X_i)$

Which doesn't make sense.

Last edited: Apr 8, 2013
4. Apr 8, 2013

Stephen Tashi

You should learn the basic scenarios that are treated by statistics. Your impression of what "variance" is shows you haven't grasped the big picture.

You didn't define what $\mu$ and the $X_i$ represent. Your first formula (the sum divided by $N$) looks like a formula for the "sample variance" of $N$ realizations of a random variable. In this formula the $X_i$ are particular data in a sample and $\mu$ is the sample mean.

Some books define the "sample variance" by the formula $σ^{2}=\frac{\sum^{N}_{i=1}(X_i-μ)^{2}}{N-1}$.

Another interpretation of the formula is that it is an estimator of the population variance. There are various different formulas for making such an estimate.

Your second formula (the one weighted by $P(X_i)$) looks like the formula for the variance of a discrete probability distribution. In this formula, the $X_i$ are the possible values of the random variable. The random variable can take on $N$ possible values $X_i$ with corresponding probabilities $P(X_i)$.

It could also be interpreted as a formula for the sample variance where $X_i$ are the distinct values in a sample of data and $P(X_i)$ is the fraction of times they occurred.
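As a rough illustration of that last interpretation, the sketch below computes the variance of a small made-up data set two ways: once per observation (dividing by $N$) and once per distinct value (weighting by the fraction of times each value occurs). The data and the function names are invented for the example; exact fractions are used so the two results can be compared without rounding.

```python
from collections import Counter
from fractions import Fraction

def variance_per_observation(data):
    """Average squared deviation: sum((x - mu)^2) / N over every data point."""
    n = len(data)
    mu = Fraction(sum(data), n)
    return sum((x - mu) ** 2 for x in data) / n

def variance_per_distinct_value(data):
    """Same quantity as sum((v - mu)^2 * P(v)), with P(v) = count(v) / N."""
    n = len(data)
    weights = {v: Fraction(c, n) for v, c in Counter(data).items()}
    mu = sum(v * p for v, p in weights.items())
    return sum((v - mu) ** 2 * p for v, p in weights.items())

sample = [2, 2, 3, 5, 5, 5, 7, 7]  # made-up data, not from the thread
print(variance_per_observation(sample))
print(variance_per_distinct_value(sample))
```

Both calls return the same value, because grouping equal observations only reorganizes the sum.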

The following is a common setting for doing statistics.

There is a random variable $X$. It has a distribution, which we often don't know. We usually assume the distribution comes from a certain "family" of distributions. A particular member of the family is a particular function (think of it as the particular cumulative distribution or probability density function, whichever you prefer). Knowing the values of a few "parameters" of the function (such as the mean and variance) narrows it down to one particular function. (e.g. the parameter p = probability of success = 0.3 and the number of trials N = 20 define a particular binomial distribution. Likewise a particular mean $\mu = 0.3$ and a particular variance $\sigma^2 = 2.6$ specify a particular normal distribution in the family of normal distributions.)

The mean and variance of a particular distribution are, by definition, computed from formulas that involve the possible values of the random variable and their associated probabilities (or probability density functions, in the case of continuous random variables). These formulas do not involve variables representing data from samples of the random variable.
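To make this concrete, here is a minimal sketch for the binomial example mentioned above (p = 0.3, N = 20). It computes the mean and variance of the distribution purely from the possible values and their probabilities; no sample data appears anywhere. The helper name `binomial_pmf` is just a label chosen for this example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 20, 0.3
values = range(n + 1)                       # possible values 0, 1, ..., 20
probs = [binomial_pmf(k, n, p) for k in values]

# Definitions: weight each possible value by its probability.
mean = sum(k * pk for k, pk in zip(values, probs))
variance = sum((k - mean) ** 2 * pk for k, pk in zip(values, probs))

print(sum(probs), mean, variance)
```

Up to floating-point rounding, the probabilities sum to 1, and the mean and variance agree with the known closed forms $Np = 6$ and $Np(1-p) = 4.2$.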

When we take a sample of data, we can define a mean and variance of the sample data. The formulas for the mean and variance of the sample data do not involve the probabilities given by the probability distribution of the random variable. They only involve the values we obtained in the sample.

The mean of a sample from a distribution need not equal the mean of the distribution of the random variable being sampled. The mean of the sample depends on randomly chosen sample values, so the mean of the sample can be regarded as a random variable itself. (We don't say things like "the binomial random variable is 6", but even statisticians say things like "the sample mean is 0.4" referring to "sample mean" as a specific number. Technically, we should say "the realization of the sample mean is 0.4" )

The probability distribution of the sample mean is related to the probability distribution of the random variable that we sample, but it need not be the same distribution. Understanding this is a great stumbling block for students. Since the sample mean has its own probability distribution, it has its own mean and variance. (In the typical scenario for sampling, which is to take independent random samples, the "mean of the sample mean" will be the same as the mean of the variable being sampled, but the "variance of the sample mean" will not be the same as the variance of the random variable being sampled.)
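One way to see this without any simulation is to enumerate every possible sample. The sketch below assumes a fair six-sided die as the random variable (an example chosen here, not taken from the thread) and builds the exact distribution of the mean of two independent rolls.

```python
from fractions import Fraction
from itertools import product

faces = [1, 2, 3, 4, 5, 6]
p_face = Fraction(1, 6)          # assumption: a fair die

def mean_and_variance(dist):
    """dist maps each possible value to its probability."""
    mu = sum(v * p for v, p in dist.items())
    var = sum((v - mu) ** 2 * p for v, p in dist.items())
    return mu, var

def sample_mean_distribution(n):
    """Exact distribution of the mean of n independent rolls."""
    dist = {}
    for rolls in product(faces, repeat=n):
        m = Fraction(sum(rolls), n)
        dist[m] = dist.get(m, 0) + p_face ** n
    return dist

die_mu, die_var = mean_and_variance({f: p_face for f in faces})
mean_mu, mean_var = mean_and_variance(sample_mean_distribution(2))

print("single roll:     ", die_mu, die_var)
print("mean of 2 rolls: ", mean_mu, mean_var)
```

The two means coincide, while the variance of the sample mean comes out as half the variance of a single roll, which matches the usual $\sigma^2/n$ rule for independent samples.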

Similar remarks hold for the sample variance, which is a random variable and has its own mean and its own variance.

The two common tasks in statistics are "hypothesis testing" and "estimation".

Estimation is the task of estimating the parameters of the distribution of a random variable by doing computations on the values obtained in a sample. (e.g. A typical task in estimation is to estimate the mean of a random variable using values obtained in a sample.) In the early days of statistics, it was considered a virtue that the formula for an estimator of a parameter of a distribution (like the mean) resemble the formula that defines the parameter. The two formulas can't be identical because the estimator uses values from the sample and the definition of the parameter uses values from a probability distribution. One way to make a correspondence is to say that the "fraction of times a value is observed in a sample" will correspond to the "probability of a value occurring given by the probability distribution". For example, the formula for the sample mean can be written as $\sum_{k=1}^{K} V_k f_k$ where the $V_k$ are the $K$ distinct values that occur in the sample and the $f_k$ are the fractions of the total measurements that give each value. This formula gives the same result as the familiar $\frac{\sum_{i=1}^N X_i}{N}$ where the $X_i$ are not necessarily distinct from each other.
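The equality of the two forms of the sample mean can be spelled out in one line. The symbol $n_k$ (the number of times the distinct value $V_k$ occurs among the $N$ observations, so that $f_k = n_k/N$) and $K$ (the number of distinct values) are notation introduced here for the derivation:

$\frac{1}{N}\sum_{i=1}^{N} X_i = \frac{1}{N}\sum_{k=1}^{K} n_k V_k = \sum_{k=1}^{K} V_k \frac{n_k}{N} = \sum_{k=1}^{K} V_k f_k$

For instance, for the sample 2, 2, 5 the familiar formula gives $(2+2+5)/3 = 3$, and the grouped form gives $2 \cdot \frac{2}{3} + 5 \cdot \frac{1}{3} = 3$.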

From the modern point of view, it is not essential that the formula for an estimator resemble the formula for the parameter being estimated. However, the formulas for estimators obtained from the old-fashioned way of thinking are useful, and so the formula for the "sample such-and-such" is often the same as the "corresponding" formula for the "distribution such-and-such" that would be estimated. For example, the formula for the "sample mean" amounts to the formula for the "estimator of the mean of the distribution" that "corresponds" to the formula that defines the mean of the distribution.

5. Apr 8, 2013

ssd

Please check whether you have the same ambiguity in the analogous case of 'mean' and 'expectation'. The first is a descriptive measure (it works with a finite number of cases, as in a sample of finite size). The second is an inferential measure (it works with the case where the experiment is repeated an infinite number of times, that is, the population is considered).

You can visualize it as f/n → p when n is large enough (weak convergence here).
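A small simulation can make the f/n → p picture tangible. This is only a sketch under assumptions chosen for the example: a fair die, the event "roll a 6" (so p = 1/6), and a fixed seed so that reruns give the same numbers.

```python
import random

def relative_frequency(n_rolls, target=6, seed=0):
    """Fraction f/n of simulated fair-die rolls that equal `target`."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == target)
    return hits / n_rolls

for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n), "vs p =", 1 / 6)
```

For small n the relative frequency can be far from 1/6; for large n it typically settles close to it, although any single run only suggests the limit rather than proving it.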

PS:

1) Σp ≠ 1/N, as you wrote; rather, Σp = 1.
2) In your first expression, the general form of the multiplier is f/N, where Σf = N.
3) The P(X) in the case of the normal distribution (it is a pdf) is not the same P(X) as in the first case (it is a probability).
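To illustrate point 3, the sketch below evaluates the normal density with a deliberately small σ (an arbitrary choice for the example). The density at the mean exceeds 1, so it cannot be a probability; it only yields probabilities when integrated. For a continuous variable the sum in the variance formula is replaced by an integral of (x − μ)² times the density, approximated here with a simple midpoint rule. The function names are invented for the example.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Normal probability density (a density, not a probability)."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

def integrate(f, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

mu, sigma = 0.0, 0.1                  # arbitrary example parameters
lo, hi = mu - 8 * sigma, mu + 8 * sigma

peak = normal_pdf(mu, mu, sigma)      # density at the mean, larger than 1 here
area = integrate(lambda x: normal_pdf(x, mu, sigma), lo, hi)
variance = integrate(lambda x: (x - mu) ** 2 * normal_pdf(x, mu, sigma), lo, hi)

print(peak, area, variance)
```

The area under the density is approximately 1 (the continuous counterpart of Σp = 1), and the integral recovers approximately σ² = 0.01.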

IMO this post should go to homework section.

Last edited: Apr 8, 2013
6. Apr 8, 2013

CraigH

Wow! Thank you Stephen for your very in-depth answer, it has helped a lot. I'm not going to lie though, I am still quite confused about this subject. Like you said, I haven't grasped the big picture. The reason is partly because I haven't studied statistics in years. My college decided to teach "decision maths" instead of statistics, so I'm quite behind the rest of my class.
ssd, I'm not sure what f is. I'm guessing it's the sample, and p is the population?

I'm going to go right back to the basics. I'm going through the Khan Academy stats lessons now. I'll probably come back to this thread once I have caught up and read your answers again.

Thank you for the help!

Last edited: Apr 8, 2013
7. Apr 9, 2013

ssd

f is frequency, p is probability. These notations are quite common.