JFS321 said:
All,
https://jimgrange.wordpress.com/2015/12/05/statistics-tables-where-do-the-numbers-come-from/
This is a great post -- but I'm a little foggy on the sentence that says "...mean and standard deviation for each condition is fixed at 0 and 1." Can someone explain this in a slightly different way? How do these values relate to the actual experimental values (which would be able to take on an infinite number of values)?
Many, many thanks.
If you have a sample ##X_1, X_2, \ldots, X_n## of independent normal random variables, all with the same (unknown) mean ##\mu## and variance ##\sigma^2##, the sample mean
$$\bar{X} = \frac{1}{n} (X_1 + X_2 + \cdots + X_n)$$
has mean ##\mu## and variance ##\sigma^2/n##, or standard deviation ##\sigma/\sqrt{n}##. Thus, the random variable
$$ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \hspace{2cm}(1)$$
is a standard normal random variable, with mean 0 and variance 1. However, we do not know ##\mu## or ##\sigma##. We can estimate ##\sigma## using the sample standard deviation ##s_n##, where
$$s_n = \sqrt{ \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2 } \hspace{2cm}(2)$$
If we substitute ##s_n## from (2) in place of ##\sigma## in (1) we obtain a new random variable
$$T_{n-1} = \frac{\bar{X} - \mu}{s_n / \sqrt{n}} \hspace{2cm}(3) $$
Here, we label ##T## with the index ##n-1## because the estimate ##s_n## in (2) is effectively built from only ##n-1## independent pieces of data. We say that ##n-1## is the number of "degrees of freedom".
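To make the "##n-1## pieces" remark concrete, here is a small sketch in Python. The four data values are made up purely for illustration. It shows that the deviations ##X_i - \bar{X}## always sum to zero, so once you know any ##n-1## of them the last one is forced.

```python
# Made-up sample, purely for illustration.
sample = [2.0, 3.5, 5.0, 7.5]
n = len(sample)

xbar = sum(sample) / n
deviations = [x - xbar for x in sample]

# The deviations from the sample mean always add up to zero ...
total = sum(deviations)

# ... so the last deviation can be recovered from the other n-1.
recovered_last = -sum(deviations[:-1])

print("deviations:", deviations)
print("sum of deviations:", total)
print("last deviation:", deviations[-1], "recovered:", recovered_last)
```

That one linear constraint is why the sum of squares in (2) is divided by ##n-1## rather than ##n##.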
This new random variable ##T_{n-1}## is no longer normally distributed, but its distribution can be calculated explicitly and, as demonstrated in your cited link, can be approximated through Monte Carlo simulation.
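This also bears on the "fixed at 0 and 1" sentence in the question. In (3) the sample is centred by ##\mu## and scaled by ##s_n##, so the distribution of ##T_{n-1}## does not depend on the actual ##\mu## and ##\sigma##, and a simulation may as well use 0 and 1. Below is a minimal sketch of such a simulation, not the code from the linked post. The function names, the sample size ##n = 10##, the alternative values ##\mu = 50, \sigma = 8## and the number of repetitions are all my own choices.

```python
import math
import random

def simulate_t(n, mu, sigma, reps, rng):
    """Draw `reps` values of T_{n-1} as in (3) from normal samples of size n."""
    values = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        # Sample standard deviation with the n-1 divisor, as in (2).
        s_n = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        values.append((xbar - mu) / (s_n / math.sqrt(n)))
    return values

def upper_quantile(values, alpha):
    """Empirical cut-off with roughly a fraction alpha of the draws above it."""
    ordered = sorted(values)
    return ordered[int((1 - alpha) * len(ordered))]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Same n, very different (mu, sigma): the 5% cut-offs should nearly agree.
q_standard = upper_quantile(simulate_t(10, 0.0, 1.0, 20000, rng), 0.05)
q_shifted = upper_quantile(simulate_t(10, 50.0, 8.0, 20000, rng), 0.05)

print("mu=0,  sigma=1:", round(q_standard, 2))
print("mu=50, sigma=8:", round(q_shifted, 2))
```

Both estimates should land close to the tabulated value for 9 degrees of freedom (about 1.83), up to simulation noise.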
The random variable ##T_{n-1}## has a symmetric distribution, so in a table it is enough to give values of ##t_\alpha(n-1)## that yield
$$P(T_{n-1} > t_\alpha(n-1)) = \alpha,$$
and typical tables do this for ##\alpha =0.10, 0.05, 0.02, 0.01## and maybe some others. Modern software such as Excel or free online sites such as Wolfram Alpha can give you values of ##t_\alpha(n-1)## for any specified ##n## and ##\alpha.##
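If you want to see where such a number comes from without a black box, here is a rough sketch that uses only the Python standard library. It takes the standard Student-t density, integrates it numerically with Simpson's rule, and finds ##t_\alpha## by bisection. The function names and the step counts are my own choices, and real software uses more refined algorithms, so treat this as an illustration only.

```python
import math

def t_density(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    const = math.exp(log_const) / math.sqrt(df * math.pi)
    return const * (1 + t * t / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=1000):
    """P(T > t) for t >= 0: symmetry gives 0.5 minus the integral over [0, t]."""
    if t == 0:
        return 0.5
    h = t / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 0.5 - total * h / 3

def t_critical(alpha, df):
    """Solve P(T > t) = alpha for t by bisection (assumes 0 < alpha < 0.5)."""
    lo, hi = 0.0, 1.0
    while upper_tail(hi, df) > alpha:  # widen until the root is bracketed
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_tail(mid, df) > alpha:
            lo = mid  # tail still too heavy: move right
        else:
            hi = mid
    return (lo + hi) / 2

for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(t_critical(alpha, 9), 3))
```

For ##n = 10## (9 degrees of freedom) the results should agree with a printed table to about three decimals, for example roughly 1.833 for ##\alpha = 0.05##.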
You may notice that in (3) we still do not know ##\mu##, but that is OK: we use our t-table to make inferences about plausible values of ##\mu##. We can do this because (3) can be rewritten as
$$\bar{X} - \mu = (s_n/\sqrt{n})\, T_{n-1} \Longrightarrow \mu = \bar{X} - (s_n/\sqrt{n})\, T_{n-1} \hspace{1cm}(4)$$
The probability that ##T_{n-1}## lies between ##-t_\alpha(n-1)## and ##+t_\alpha(n-1)## is ##1 - 2 \alpha##, so we can use (4) to construct a ##1-2\alpha## confidence interval for ##\mu##, namely ##\bar{X} \pm t_\alpha(n-1)\, s_n/\sqrt{n}## (meaning an interval that will contain the true value of ##\mu## in ##(1-2\alpha) \times 100 \%## of repeated samples).
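As a worked illustration of that last step, here is a short sketch that turns a sample and a table value into an interval. The ten data values are invented for the example, and the critical value 1.833 is the usual table entry for ##\alpha = 0.05## with 9 degrees of freedom, so the result is a 90% interval.

```python
import math

def t_confidence_interval(sample, t_crit):
    """Interval xbar +/- t_crit * s_n / sqrt(n), following (2) and (4)."""
    n = len(sample)
    xbar = sum(sample) / n
    s_n = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
    half_width = t_crit * s_n / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Invented measurements, n = 10, so n - 1 = 9 degrees of freedom.
data = [4.9, 5.3, 5.1, 4.7, 5.6, 5.0, 5.2, 4.8, 5.4, 5.0]

# Table value t_alpha(9) for alpha = 0.05, giving a 1 - 2*alpha = 90% interval.
low, high = t_confidence_interval(data, 1.833)
print("90% confidence interval for mu:", round(low, 3), "to", round(high, 3))
```

Note that the experimental values enter only through ##\bar{X}## and ##s_n##; the table value itself is the same whatever the units or scale of the data.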