Mean and Variance of a data set

Summary
The discussion focuses on understanding the mean and variance of a sample drawn from a standard normal distribution with a true mean of 0 and variance of 1. When sampling N=5, the expected sample mean, denoted as \(\bar{x}\), will vary with each experiment, reflecting the randomness inherent in sampling. The sample variance formula requires adjustment, using \(N-1\) in the denominator to account for the estimation of the mean from the same data set, which ensures an unbiased estimate of the true variance. The distinction between the true mean and sample mean is emphasized, highlighting that individual samples may differ from the expected values. This foundational understanding is crucial for analyzing normally distributed data effectively.
DeldotB

Homework Statement


In this problem we will be generating and analyzing lists of normally distributed random numbers. The distribution we are sampling has true mean 0 and standard deviation 1.

  1. If we sample this distribution N=5 times, what do we expect the mean to be? How about the standard deviation? What's the error on the mean?

Homework Equations



$$\bar{x} = \frac{1}{N} \sum_{i=1}^N x_i$$

$$s^2 = \frac{1}{N} \sum_{i=1}^N (x_i - \mu)^2$$

The Attempt at a Solution



I'm not sure where to go here. What does it mean to have a true mean of zero? What is meant by "true" mean? I haven't seen this phrase used before. I read that if a data distribution is approximately normal, then about 68 percent of the data values are within one standard deviation of the mean, but how does that help me when I want to sample this distribution? Any help would be appreciated! I have never taken a statistics class.
 
DeldotB said:

Homework Statement


In this problem we will be generating and analyzing lists of normally distributed random numbers. The distribution we are sampling has true mean 0 and standard deviation 1.

  1. If we sample this distribution N=5 times, what do we expect the mean to be? How about the standard deviation? What's the error on the mean?

Homework Equations



$$\bar{x} = \frac{1}{N} \sum_{i=1}^N x_i$$

$$s^2 = \frac{1}{N} \sum_{i=1}^N (x_i - \mu)^2$$

The Attempt at a Solution



I'm not sure where to go here. What does it mean to have a true mean of zero? What is meant by "true" mean? I haven't seen this phrase used before. I read that if a data distribution is approximately normal, then about 68 percent of the data values are within one standard deviation of the mean, but how does that help me when I want to sample this distribution? Any help would be appreciated! I have never taken a statistics class.

Suppose one run of your experiment consists of taking a random sample of size N = 5 from a standard normal distribution (mean = 0, variance = 1). In any run of your experiment, the computed mean of your data set is ##\bar{x} = \frac{1}{5}(x_1 + x_2 + x_3 + x_4 + x_5)##, where the ##x_i## constitute your sample of 5 numbers. Note that ##\bar{x}## is itself a sample point from a random variable ##\bar{X}##: in one experiment it might be 1.7, in another it might be -0.83, and so on. So ##\bar{X}## itself has some true mean and some true variance; these would be well approximated by repeating the experiment 100,000 times and taking the average and sample variance of your 100,000 ##\bar{x}## values. Remember, however, that for any particular experiment the computed ##\bar{x}## and the computed sample variance ##s^2## will very likely differ at least a bit from the true values of 0 and 1, respectively.
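The repeated-experiment idea above can be sketched numerically (a minimal illustration in Python with NumPy; the seed, run count, and variable names are my own choices, not from the thread). With true variance 1 and N = 5, the variance of ##\bar{X}## should come out near ##\sigma^2/N = 1/5##:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility

N = 5            # sample size in one experiment
runs = 100_000   # number of repeated experiments

# Each row is one experiment: N draws from a standard normal.
samples = rng.standard_normal((runs, N))

# One sample mean xbar per experiment: a sample of the random variable Xbar.
xbar = samples.mean(axis=1)

print(xbar.mean())           # close to the true mean, 0
print(xbar.var(ddof=1))      # close to sigma^2 / N = 1/5
```

Any single row's mean will usually miss 0 by a noticeable amount; it is the spread of the 100,000 ##\bar{x}## values that reveals the ##1/\sqrt{N}## behaviour of the error on the mean.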

BTW: Your formula
$$s^2 = \frac{1}{N} \sum_{i=1}^N (x_i- \mu)^2$$
is correct only if you pretend you know ##\mu##; it is NOT what we usually call the "sample variance". The usual definition of sample variance estimates ##\mu## from the data as well, so we are dealing with
$$\text{sample variance} = \frac{1}{N-1} \sum_{i=1}^N (x_i - \bar{x})^2, $$
where
$$ \bar{x} = \frac{1}{N} \sum_{i=1}^N x_i .$$
Note that we divide by ##N-1## instead of ##N##; the need for doing that arises because we have already "used up" one piece of information when we computed ##\bar{x}##, so only ##N-1## independent pieces of information remain for estimating the variance. Theoretically, the true mean of the random variable
$$S = \frac{1}{N-1} \sum_{i=1}^N (x_i -\bar{x})^2$$
is 1, which is the true value of the variance. Had we divided by N instead we would have a random variable with mean ##(N-1)/N = 1 - (1/N)##, instead of the true value 1. Of course, for large ##N## it makes hardly any noticeable difference.
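This bias can be checked directly by simulation (a sketch in Python/NumPy under the same assumptions as before; seed and run count are illustrative). Dividing by N should average near ##(N-1)/N = 0.8## for N = 5, while dividing by N-1 should average near the true variance, 1:

```python
import numpy as np

rng = np.random.default_rng(1)  # seeded for reproducibility

N = 5
runs = 200_000

# Each row is one experiment of N draws from a standard normal.
samples = rng.standard_normal((runs, N))

# Sum of squared deviations from each experiment's own sample mean.
xbar = samples.mean(axis=1, keepdims=True)
sq_dev = ((samples - xbar) ** 2).sum(axis=1)

biased = sq_dev / N          # divides by N: underestimates on average
unbiased = sq_dev / (N - 1)  # divides by N - 1: unbiased

print(biased.mean())    # close to (N-1)/N = 0.8
print(unbiased.mean())  # close to the true variance, 1
```

For N = 5 the gap is large (0.8 vs 1), which is exactly why the correction matters for small samples; rerunning with, say, N = 100 shows the two estimators nearly coincide.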
 
