Mean and Variance of a data set

  Aug 27, 2016 #1
    1. The problem statement, all variables and given/known data
    In this problem we will be generating and analyzing lists of normally distributed random numbers. The distribution we are sampling has true mean 0 and standard deviation 1.

    1. If we sample this distribution N=5 times, what do we expect the mean to be? How about the standard deviation? What's the error on the mean?

    2. Relevant equations

    [itex] \bar{x} = \frac{1}{N} \sum_{i=1}^N x_i [/itex]

    [itex] s^2 = \frac{1}{N} \sum_{i=1}^N (x_i - \mu)^2 [/itex]

    3. The attempt at a solution

    I'm not sure where to go here. What does it mean to have a true mean of zero? What is meant by "true" mean? I haven't seen this phrase used before. I read that if a data distribution is approximately normal, then about 68 percent of the data values lie within one standard deviation of the mean, but how does that help me when I want to sample this distribution? Any help would be appreciated! I have never taken a statistics class.
     
  Aug 27, 2016 #2

    Ray Vickson

    Science Advisor
    Homework Helper

    Suppose one run of your experiment consists of taking a random sample of size N = 5 from a standard normal distribution (mean = 0, variance = 1). In any run of your experiment, the computed mean of your data set is ##\bar{x} = \frac{1}{5}(x_1 + x_2 + x_3 + x_4 + x_5)##, where the ##x_i## constitute your sample of 5 numbers. Note that ##\bar{x}## is itself a sample point from a random variable ##\bar{X}##: in one experiment it might be 1.7, in another it might be -0.83, and so on.

    So ##\bar{X}## itself has some true mean and some true variance; these would be well approximated by repeating the experiment 100,000 times and taking the average and sample variance of your 100,000 ##\bar{x}## values. Remember, however, that in any particular experiment the computed ##\bar{x}## and the computed sample variance ##s^2(x)## will very likely differ at least a bit from the true values of 0 and 1, respectively.
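    For concreteness, here is a minimal simulation sketch of that repeated experiment, assuming Python with NumPy (the thread names no language or library, so both are assumptions):

    [code=python]
    import numpy as np

    rng = np.random.default_rng(0)   # seeded so the run is reproducible
    N, runs = 5, 100_000

    # Each row is one run of the experiment: N = 5 draws from N(0, 1)
    samples = rng.standard_normal((runs, N))

    xbars = samples.mean(axis=1)     # one computed x-bar per run

    print("mean of x-bar over runs:    ", xbars.mean())  # close to 0
    print("variance of x-bar over runs:", xbars.var())   # close to 1/N = 0.2
    [/code]

    The variance of ##\bar{X}## coming out near ##1/N## is exactly the "error on the mean" the problem statement asks about: the standard error of the mean is ##\sigma/\sqrt{N} = 1/\sqrt{5}## here.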

    BTW: Your formula
    $$s^2= \frac{1}{N} \sum_{i=1}^N (x_i- \mu)^2$$
    is correct only if you pretend you know ##\mu##; it is NOT what we usually call the "sample variance". The usual definition of the sample variance estimates ##\mu## from the data as well, so we are dealing with
    $$\text{sample variance} = \frac{1}{N-1} \sum_{i=1}^N (x_i - \bar{x})^2, $$
    where
    $$ \bar{x} = \frac{1}{N} \sum_{i=1}^N x_i .$$
    Note that we divide by ##N-1## instead of ##N##; the need for that arises because we have already "used up" one piece of information in computing ##\bar{x}##, leaving only ##N-1## independent pieces of information for estimating the variance. Theoretically, the true mean of the random variable
    $$S = \frac{1}{N-1} \sum_{i=1}^N (x_i -\bar{x})^2$$
    is 1, which is the true value of the variance. Had we divided by ##N## instead, we would have a random variable with mean ##(N-1)/N = 1 - (1/N)## rather than the true value 1. Of course, for large ##N## this makes hardly any noticeable difference.
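    Along the same lines, a quick sketch (again assuming Python with NumPy) that exhibits the ##N-1## versus ##N## bias directly:

    [code=python]
    import numpy as np

    rng = np.random.default_rng(1)
    N, runs = 5, 100_000
    samples = rng.standard_normal((runs, N))

    # ddof=1 divides by N-1 (the usual sample variance); ddof=0 divides by N
    s2_unbiased = samples.var(axis=1, ddof=1)
    s2_biased   = samples.var(axis=1, ddof=0)

    print("average s^2 dividing by N-1:", s2_unbiased.mean())  # close to 1
    print("average s^2 dividing by N:  ", s2_biased.mean())    # close to (N-1)/N = 0.8
    [/code]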
     