Mean and Variance of a data set

SUMMARY

This discussion concerns samples drawn from a normal distribution with a true mean of 0 and a standard deviation of 1. For a sample of N = 5 values, the sample mean is x̄ = Σ x_i / N. The formula s² = Σ (x_i - μ)² / N applies only when the true mean μ is known. When the mean is instead estimated from the same data, the sample variance is s² = Σ (x_i - x̄)² / (N - 1). Dividing by N - 1 makes the expected value of s² equal to the true variance of the population, although any single sample will generally give a somewhat different value.

PREREQUISITES
  • Understanding of normal distribution concepts
  • Familiarity with statistical formulas for mean and variance
  • Knowledge of sample size implications in statistics
  • Basic grasp of random sampling techniques
NEXT STEPS
  • Learn about the Central Limit Theorem and its implications for sampling distributions
  • Study the differences between population variance and sample variance
  • Explore the concept of confidence intervals for sample means
  • Investigate the effects of sample size on statistical estimates
USEFUL FOR

Students studying statistics, data analysts, researchers conducting experiments with random sampling, and anyone interested in understanding the properties of normal distributions.

DeldotB

Homework Statement


In this problem we will be generating and analyzing lists of normally distributed random numbers. The distribution we are sampling has true mean 0 and standard deviation 1.

  1. If we sample this distribution N=5 times, what do we expect the mean to be? How about the standard deviation? What's the error on the mean?

Homework Equations



$$\bar{x} = \frac{1}{N} \sum_{i=1}^N x_i$$

$$s^2 = \frac{1}{N} \sum_{i=1}^N (x_i - \mu)^2$$

The Attempt at a Solution



I'm not sure where to go here. What does it mean to have a true mean of zero? What is meant by "true" mean? I haven't seen this phrase used before. I read that if a data distribution is approximately normal, then about 68 percent of the data values are within one standard deviation of the mean, but how does this help me when I want to sample this distribution? Any help would be appreciated! I have never taken a statistics class.
 
Suppose one run of your experiment consists of taking a random sample of size N = 5 from a standard normal distribution (mean = 0, variance = 1). In any run of your experiment, the computed mean of your data set is ##\bar{x} = \frac{1}{5}(x_1 + x_2 + x_3 + x_4 + x_5)##, where the ##x_i## constitute your sample of 5 numbers. Note that ##\bar{x}## is itself a sample point from a random variable ##\bar{X}##: in one experiment it might equal 1.7, in another it might equal -0.83, and so on. So ##\bar{X}## itself has some true mean and some true variance; these would be well approximated by repeating the experiment 100,000 times and taking the average and sample variance of your 100,000 ##\bar{x}## values. Remember, however, that for any particular experiment the computed ##\bar{x}## and the computed sample variance ##s^2(x)## will very likely differ at least a bit from the true values of 0 and 1, respectively.
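The repeated-experiment idea above can be tried directly. The following is a minimal sketch using only Python's standard library; the function name `simulate_sample_means`, the seed, and the variable names are illustrative assumptions, not anything given in the problem. It also relies on a standard result that is not spelled out in the thread: for N independent draws with variance ##\sigma^2##, the variance of ##\bar{X}## is ##\sigma^2/N##, so here the "error on the mean" (the standard deviation of ##\bar{X}##) should come out near ##1/\sqrt{5} \approx 0.447##.

```python
import math
import random
import statistics


def simulate_sample_means(n_runs=100_000, n=5, seed=12345):
    """Return the sample mean of each of n_runs samples of size n from N(0, 1)."""
    rng = random.Random(seed)  # arbitrary fixed seed so the run is repeatable
    means = []
    for _ in range(n_runs):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # one "experiment"
        means.append(sum(sample) / n)                      # its x-bar
    return means


means = simulate_sample_means()

# Average and sample variance of the 100,000 x-bar values.
mean_of_means = statistics.fmean(means)
var_of_means = statistics.variance(means)  # divides by n_runs - 1
std_error = math.sqrt(var_of_means)

print(f"average of the sample means : {mean_of_means:.4f}  (theory: 0)")
print(f"variance of the sample means: {var_of_means:.4f}  (theory: 1/5 = 0.2)")
print(f"standard error of the mean  : {std_error:.4f}  (theory: {1 / math.sqrt(5):.4f})")
```

The printed values should land close to the theoretical ones but will not match them exactly, which is the same point made above about any finite set of data.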

BTW: Your formula
$$s^2= \frac{1}{N} \sum_{i=1}^N (x_i- \mu)^2$$
is correct only if you pretend you know ##\mu##; it is NOT what we usually call the "sample variance". In the usual definition of sample variance, we also estimate ##\mu## from the data, so we are dealing with
$$\text{sample variance} = \frac{1}{N-1} \sum_{i=1}^N (x_i - \bar{x})^2, $$
where
$$ \bar{x} = \frac{1}{N} \sum_{i=1}^N x_i .$$
Note that we divide by ##N-1## instead of ##N##. The need for doing that arises because we have already "used up" one piece of information when we computed ##\bar{x}##, so only ##N-1## independent pieces of information remain for estimating the variance. Theoretically, the true mean of the random variable
$$S = \frac{1}{N-1} \sum_{i=1}^N (x_i -\bar{x})^2$$
is 1, which is the true value of the variance. Had we divided by ##N## instead, we would have a random variable with mean ##(N-1)/N = 1 - (1/N)## rather than the true value 1; for ##N = 5## that is 0.8. Of course, for large ##N## it makes hardly any noticeable difference.
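The effect of the divisor can be checked numerically. This is a rough sketch under the same assumptions as the problem (standard normal, N = 5); the function name `average_variance_estimates` and the seed are made up for illustration. It averages both versions of the estimator over many simulated samples.

```python
import random


def average_variance_estimates(n_runs=100_000, n=5, seed=2024):
    """Average the (N-1)-divisor and N-divisor variance estimates over many samples."""
    rng = random.Random(seed)  # arbitrary fixed seed so the run is repeatable
    total_unbiased = 0.0
    total_biased = 0.0
    for _ in range(n_runs):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(xs) / n                       # mean estimated from the same data
        ss = sum((x - xbar) ** 2 for x in xs)    # sum of squared deviations from x-bar
        total_unbiased += ss / (n - 1)           # usual sample variance
        total_biased += ss / n                   # divide-by-N version
    return total_unbiased / n_runs, total_biased / n_runs


avg_unbiased, avg_biased = average_variance_estimates()

print(f"average with divisor N-1: {avg_unbiased:.4f}  (theory: 1)")
print(f"average with divisor N  : {avg_biased:.4f}  (theory: (N-1)/N = 0.8)")
```

With N = 5 the divide-by-N average should sit near 0.8, about 20% below the true variance; rerunning with a larger `n` shows the gap shrinking as 1/N.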
 
