Mean and Variance of a data set

In summary, the problem involves generating and analyzing lists of normally distributed random numbers drawn from a distribution with true mean 0 and standard deviation 1. For a sample of size N = 5, the sample mean has expected value 0 and the sample variance has expected value 1, but in any single run the computed values will differ somewhat from these because the sample is random. The sample variance must be computed by dividing by N-1 rather than N, because the mean is itself estimated from the data.
  • #1
DeldotB

Homework Statement


In this problem we will be generating and analyzing lists of normally distributed random numbers. The distribution we are sampling has true mean 0 and standard deviation 1.

  1. If we sample this distribution N=5 times, what do we expect the mean to be? How about the standard deviation? What's the error on the mean?

Homework Equations



[itex] \bar{x} = \frac{1}{N} \sum_{i=1}^N x_i [/itex]

[itex] s^2 = \frac{1}{N} \sum_{i=1}^N (x_i - \mu)^2 [/itex]

The Attempt at a Solution



I'm not sure where to go here. What does it mean to have a true mean of zero? What is meant by "true" mean? I haven't seen this phrase used before. I read that if a data distribution is approximately normal, then about 68 percent of the data values are within one standard deviation of the mean, but how does this help me when I want to sample this distribution? Any help would be appreciated! I have never taken a statistics class.
 
  • #2

Suppose one run of your experiment consists of taking a random sample of size N = 5 from a standard normal distribution (mean = 0, variance = 1). In any run of your experiment, the computed mean of your data set is ##\bar{x} = \frac{1}{5}(x_1 + x_2 + x_3 + x_4 + x_5)##, where the ##x_i## constitute your sample of 5 numbers. Note that ##\bar{x}## is itself a sample point from a random variable ##\bar{X}##: in one experiment it might equal 1.7, in another it might equal -0.83, and so on. So ##\bar{X}## itself has some true mean and some true variance; these would be well approximated by repeating the experiment 100,000 times and taking the average and sample variance of your 100,000 ##\bar{x}## values. Remember, however, that for any particular experiment the computed ##\bar{x}## and the computed sample variance ##s^2(x)## will very likely differ at least a bit from the true values of 0 and 1 respectively.
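The repeated experiment described above can be tried directly. The following is only a minimal sketch in Python using the standard library; the function name `sample_means`, the seed, and the choice of 20,000 runs (fewer than 100,000, to keep it quick) are assumptions made for illustration, not part of the original problem. It collects one ##\bar{x}## per run and then looks at the average and spread of those values. For independent draws with standard deviation ##\sigma##, the standard deviation of ##\bar{X}## is ##\sigma/\sqrt{N}##, which is about 0.447 here.

```python
import math
import random
import statistics


def sample_means(n_runs, n=5, seed=0):
    """Return the sample mean from each of n_runs samples of size n drawn from N(0, 1)."""
    rng = random.Random(seed)  # fixed seed (arbitrary choice) so the run is reproducible
    means = []
    for _ in range(n_runs):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]  # one run of the experiment
        means.append(statistics.fmean(xs))            # its computed x-bar
    return means


means = sample_means(20_000)
mean_of_means = statistics.fmean(means)    # estimates the true mean of X-bar
spread_of_means = statistics.stdev(means)  # estimates the true std dev of X-bar

print(f"average of the x-bar values: {mean_of_means:+.4f} (true value 0)")
print(f"std dev of the x-bar values: {spread_of_means:.4f} "
      f"(sigma/sqrt(N) = {1 / math.sqrt(5):.4f})")
```

The spread of the ##\bar{x}## values is what the problem calls the error on the mean.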

BTW: Your formula
$$s^2= \frac{1}{N} \sum_{i=1}^N (x_i- \mu)^2$$
is correct only if you pretend you know ##\mu##; it is NOT what we usually call the "sample variance". The usual definition of sample variance has us estimate ##\mu## from the data as well, so we are dealing with
$$\text{sample variance} = \frac{1}{N-1} \sum_{i=1}^N (x_i - \bar{x})^2, $$
where
$$ \bar{x} = \frac{1}{N} \sum_{i=1}^N x_i .$$
Note that we divide by ##N-1## instead of ##N##; the need for doing that arises because we have already "used up" one piece of information when we computed ##\bar{x}##, so we have only ##N-1## pieces of information left for estimating the variance. Theoretically, the true mean of the random variable
$$S^2 = \frac{1}{N-1} \sum_{i=1}^N (x_i -\bar{x})^2$$
is 1, which is the true value of the variance. Had we divided by ##N## instead, we would have a random variable with mean ##(N-1)/N = 1 - (1/N)## instead of the true value 1; for ##N = 5## that is 0.8. Of course, for large ##N## it makes hardly any noticeable difference.
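The effect of dividing by ##N## versus ##N-1## can be checked numerically. This is a rough sketch under stated assumptions: the helper name `average_variance_estimates`, the seed, and the number of runs are all illustrative choices. For each run it computes the sum of squared deviations from ##\bar{x}## and divides it both ways, then averages each estimator over many runs of size N = 5.

```python
import random


def average_variance_estimates(n_runs, n=5, seed=1):
    """Average the N-1 and N variance estimators over many samples from N(0, 1)."""
    rng = random.Random(seed)  # fixed seed (arbitrary choice) for reproducibility
    total_unbiased = 0.0
    total_biased = 0.0
    for _ in range(n_runs):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(xs) / n                       # mean estimated from the same data
        ss = sum((x - xbar) ** 2 for x in xs)    # sum of squared deviations
        total_unbiased += ss / (n - 1)           # usual sample variance
        total_biased += ss / n                   # divides by N instead
    return total_unbiased / n_runs, total_biased / n_runs


unbiased_avg, biased_avg = average_variance_estimates(20_000)
print(f"average with N-1 in the denominator: {unbiased_avg:.3f} (expected near 1)")
print(f"average with N in the denominator:   {biased_avg:.3f} (expected near 0.8)")
```

With N = 5 the two averages should differ by a factor of 4/5, which is the ##(N-1)/N## bias described above.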
 

1. What is the definition of mean and variance in a data set?

The mean of a data set is the average value of all the numbers in the set. It is calculated by adding all the numbers in the set and then dividing by the total number of values in the set. Variance measures how spread out the numbers in a data set are around the mean. It is calculated by finding the difference between each number and the mean, squaring those differences, and summing them. The sum is divided by N when the data are the whole population (or the true mean is known), and by N-1 when the data are a sample and the mean is estimated from that same sample.
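As a small worked example, Python's standard `statistics` module provides both variance conventions: `pvariance` divides by N and `variance` divides by N-1. The data values below are made up purely for illustration.

```python
import statistics

# Hypothetical data set, chosen so the arithmetic is easy to follow by hand.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.mean(data)                  # sum of values / number of values
population_var = statistics.pvariance(data)   # sum of squared deviations / N
sample_var = statistics.variance(data)        # sum of squared deviations / (N - 1)

print(f"mean                = {mean}")
print(f"variance (N)        = {population_var}")
print(f"variance (N - 1)    = {sample_var:.4f}")
```

Here the squared deviations from the mean sum to 32, so the two conventions give 32/8 and 32/7 respectively.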

2. Why is it important to calculate the mean and variance of a data set?

The mean and variance provide important information about the distribution of the data set. They help us understand the central tendency and variability of the data. This information can be used to make predictions and draw conclusions in statistical analysis.

3. How can the mean and variance be affected by outliers in a data set?

Outliers are extreme values in a data set that can significantly affect the mean and variance. If there is an outlier with a very high or low value, it can pull the mean in that direction and increase the variance. Therefore, it is important to identify and handle outliers carefully when calculating the mean and variance.
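A quick illustration of this effect, using made-up readings and one made-up extreme value (both are assumptions for the sake of the example):

```python
import statistics

readings = [4.8, 5.1, 5.0, 4.9, 5.2]   # hypothetical, tightly clustered values
with_outlier = readings + [50.0]        # the same data plus one extreme value

mean_before = statistics.fmean(readings)
mean_after = statistics.fmean(with_outlier)
var_before = statistics.variance(readings)      # sample variance (N - 1)
var_after = statistics.variance(with_outlier)

print(f"mean:     {mean_before:.3f} -> {mean_after:.3f}")
print(f"variance: {var_before:.3f} -> {var_after:.3f}")
```

A single extreme value moves the mean well away from where most of the data lie, and because deviations are squared, it inflates the variance far more than it shifts the mean.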

4. What is the difference between population mean and sample mean?

The population mean is the average value of a variable in the entire population, while the sample mean is the average value of a variable in a subset of the population (i.e. a sample). The sample mean is used to estimate the population mean and is often denoted by x̄ (pronounced "x bar"). For independent observations, the typical error of the sample mean as an estimate of the population mean is the population standard deviation divided by the square root of the sample size, so larger samples give more precise estimates.
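How the precision of the sample mean improves with sample size can be sketched with a short simulation. The function name `spread_of_sample_mean`, the sample sizes, the seed, and the number of runs are illustrative assumptions. The population is taken to be standard normal, so for independent draws the spread of the sample mean should be close to ##1/\sqrt{N}##.

```python
import math
import random
import statistics


def spread_of_sample_mean(n, n_runs=4_000, seed=2):
    """Estimate the std dev of the sample mean for samples of size n from N(0, 1)."""
    rng = random.Random(seed)  # fixed seed (arbitrary choice) for reproducibility
    means = [
        statistics.fmean([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(n_runs)
    ]
    return statistics.stdev(means)


sizes = (5, 20, 80)
spreads = {n: spread_of_sample_mean(n) for n in sizes}
for n in sizes:
    print(f"N = {n:3d}: spread of sample mean = {spreads[n]:.3f}, "
          f"1/sqrt(N) = {1 / math.sqrt(n):.3f}")
```

Each fourfold increase in N should roughly halve the spread of the sample mean.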

5. How do you interpret the variance value in a data set?

The variance value represents the average squared difference between each data point and the mean. A higher variance indicates that the data points are more spread out from the mean, while a lower variance suggests that the data points are closer to the mean. Because the differences are squared, the variance is expressed in squared units; its square root, the standard deviation, is in the same units as the data and is often easier to interpret.
