# Mean and Variance of a data set

## Homework Statement

In this problem we will be generating and analyzing lists of normally distributed random numbers. The distribution we are sampling has true mean 0 and standard deviation 1.

1. If we sample this distribution N=5 times, what do we expect the mean to be? How about the standard deviation? What's the error on the mean?

## Homework Equations

$\bar{x} = \frac{1}{N} \sum_{i=1}^N x_i$

$s^2 = \frac{1}{N} \sum_{i=1}^N (x_i - \mu)^2$

## The Attempt at a Solution

I'm not sure where to go here. What does it mean to have a true mean of zero? What is meant by "true" mean? I haven't seen this phrase used before. I read that if a data distribution is approximately normal, then about 68 percent of the data values are within one standard deviation of the mean, but how does this help me when I want to sample this distribution? Any help would be appreciated! I have never taken a statistics class.

Ray Vickson


Suppose one run of your experiment consists of taking a random sample of size N = 5 from a standard normal distribution (mean = 0, variance = 1). In any run of your experiment, the computed mean of your data set is ##\bar{x} = \frac{1}{5}(x_1 + x_2 + x_3 + x_4 + x_5)##, where the ##x_i## constitute your sample of 5 numbers. Note that ##\bar{x}## is itself a sample point from a random variable ##\bar{X}##: in one experiment it might equal 1.7, in another it might equal -0.83, and so on. So ##\bar{X}## itself has some true mean and some true variance; these would be well approximated by repeating the experiment 100,000 times and taking the average and sample variance of your 100,000 ##\bar{x}## values. Remember, however, that for any particular experiment the computed ##\bar{x}## and the computed sample variance ##s^2(x)## will very likely differ at least a bit from the true values of 0 and 1, respectively.
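The repeated-experiment idea above can be sketched in a few lines of Python. This is only an illustrative simulation, not part of the homework: the function name `simulate_sample_means`, the seed, and the use of the standard-library `random` module are assumptions made here. For independent draws, theory gives ##\bar{X}## a mean of 0 and a variance of ##\sigma^2/N = 1/5##, so the "error on the mean" is ##1/\sqrt{5} \approx 0.447##; the simulation should land close to those values.

```python
import math
import random
import statistics


def simulate_sample_means(n_runs=100_000, n=5, seed=0):
    """Repeat the experiment n_runs times; return the sample mean from each run.

    Hypothetical helper for illustration: each run draws n values from a
    standard normal distribution (true mean 0, true standard deviation 1).
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = []
    for _ in range(n_runs):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(sum(sample) / n)  # x-bar for this run
    return means


means = simulate_sample_means()
mean_of_means = statistics.fmean(means)    # estimates the true mean of X-bar
var_of_means = statistics.variance(means)  # estimates the true variance of X-bar

print("average of the x-bar values:", mean_of_means)
print("variance of the x-bar values:", var_of_means)
print("standard error of the mean (simulated):", math.sqrt(var_of_means))
print("standard error of the mean (theory, 1/sqrt(5)):", 1 / math.sqrt(5))
```

Any single entry of `means` can be far from 0, which is the point of the last sentence above: only the long-run average of many runs settles near the true value.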

BTW: Your formula
$$s^2 = \frac{1}{N} \sum_{i=1}^N (x_i - \mu)^2$$
is correct only if you pretend you know ##\mu##; it is NOT what we usually call the "sample variance". In the usual definition of sample variance we also estimate ##\mu## from the data, so we are dealing with
$$\text{sample variance} = \frac{1}{N-1} \sum_{i=1}^N (x_i - \bar{x})^2,$$
where
$$\bar{x} = \frac{1}{N} \sum_{i=1}^N x_i .$$
Note that we divide by ##N-1## instead of ##N##; the need for doing that arises because we have already "used up" one piece of information when we computed ##\bar{x}##, so only ##N-1## pieces of information are left for estimating the variance. Theoretically, the true mean of the random variable
$$S^2 = \frac{1}{N-1} \sum_{i=1}^N (x_i - \bar{x})^2$$
is 1, which is the true value of the variance. Had we divided by ##N## instead, we would have a random variable with mean ##(N-1)/N = 1 - (1/N)## rather than the true value 1. Of course, for large ##N## it makes hardly any noticeable difference.
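The effect of dividing by ##N-1## versus ##N## can also be checked numerically. The sketch below is an assumption-laden illustration (the helper name `average_variance_estimates`, the number of runs, and the seed are chosen here, not taken from the problem): it averages both estimators over many samples of size 5. With true variance 1, the ##N-1## version should average close to 1 and the ##N## version close to ##(N-1)/N = 0.8##.

```python
import random


def average_variance_estimates(n_runs=50_000, n=5, seed=1):
    """Average the (N-1)-divisor and N-divisor variance estimates over many runs.

    Hypothetical helper for illustration: every run draws n values from a
    standard normal distribution, whose true variance is 1.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total_unbiased = 0.0
    total_biased = 0.0
    for _ in range(n_runs):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(sample) / n                      # mean estimated from the data
        ss = sum((x - xbar) ** 2 for x in sample)   # sum of squared deviations
        total_unbiased += ss / (n - 1)              # usual sample variance
        total_biased += ss / n                      # divide by N instead
    return total_unbiased / n_runs, total_biased / n_runs


avg_unbiased, avg_biased = average_variance_estimates()
print("average with N-1 in the denominator:", avg_unbiased)
print("average with N in the denominator:  ", avg_biased)
```

Raising `n` in the call shrinks the gap between the two averages, in line with the remark that the choice hardly matters for large ##N##.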