T-distribution, standard error - how to interpret

In summary: The standard error is a measure of how much the sample mean varies; the standard deviation is a measure of how much the individual data values vary. The standard error tells us how far the sample mean is likely to fall from the population mean, while the standard deviation tells us how far individual observations are spread around the mean.
  • #1
Vital
Hello.
Math is not my strongest subject yet, so I will be very grateful for your help and for an answer in sort of "plain" English, meaning it would be great if you don't go too deep into the rabbit hole of difficult terminology, as these distribution issues are truly difficult for me.

Question:
Below I show formulas whose meanings I would like to understand, i.e. the logic of each part: why we do this or that computation, the intuition, and the result. I will, of course, show how I read them, so you will see what I don't understand (hopefully).

To compute the confidence interval using t-distribution we use the following formula:

First the formulas and an example:

$$\bar{X} \pm t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$$
where ##\bar{X}## is the sample mean, ##s## is the sample standard deviation (as we don't know the population standard deviation), and ##n## is the sample size.

##\text{SE} = s / \sqrt{n}## is the formula for the standard error.

For example, find the 95 percent confidence interval:
##\bar{X} = 3\%##, ##s = 6\%##, ##n = 10##
Then, ##3\% \pm t_{0.025} \cdot [6\% / \sqrt{10}]##; as ##t_{0.025} = 2.262## (I took that from the table), the interval runs from about ##-1.29\%## to ##7.29\%##.
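Written out step by step with the same numbers, the computation is:

$$\text{SE} = \frac{6\%}{\sqrt{10}} \approx 1.90\%, \qquad 3\% \pm 2.262 \times 1.90\% \approx (-1.29\%,\ 7.29\%)$$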

Questions:

1) What is the meaning of the standard error part ##s / \sqrt{n}##, and how should I read it?
What are we trying to do here? We take the standard deviation and divide it by the square root of the sample size, and thus we "break" the standard deviation into ##\sqrt{n}## parts to get the value of one such part. What does that give us? And also, why the square root?

2) Upon finding that number, we multiply it by the value of ##t_{0.025}## found for ##n - 1## degrees of freedom, that is 9. What does this mean?

Thank you very much.
 
  • #2
The standard deviation is a measure of how much the data varies. The standard error is a measure of how much the mean varies.

Suppose you have some random variable, say the weight of knee replacement patients, and suppose that it is normally distributed with mean 70 and SD 10. If you take one random patient, then they are probably (95%) between 50 and 90 kg. You can confirm that by repeatedly randomly selecting one patient and recording their weight.

Now, instead of taking one patient, suppose you took a sample of 4 patients and computed their mean weight. The mean of a sample of four patient weights would itself be a random variable, with a mean of 70 and an SD of 5. It turns out that this SD of the sample mean is important enough that it gets its own name: the standard error. If you take four random patients, then their mean weight is probably (95%) between 60 and 80. You could confirm this by repeatedly randomly selecting four patients and recording their mean weight.

The larger your sample, the closer the sample mean will be to the population mean. This is why the SE gets smaller as n gets larger. In this example a sample size of 4 gives an SE of 5, a sample size of 25 gives an SE of 2, and a sample size of 100 gives an SE of 1.
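As a rough numerical check, a short simulation sketch along these lines (assuming the same normal population with mean 70 and SD 10) shows the same pattern:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 70, 10  # population mean and SD from the example above

for n in (4, 25, 100):
    # Draw many samples of size n and record each sample's mean.
    sample_means = rng.normal(mu, sigma, size=(100_000, n)).mean(axis=1)
    # The SD of those sample means should be close to sigma / sqrt(n):
    # roughly 5, 2, and 1 for n = 4, 25, and 100.
    print(n, round(sample_means.std(), 2), sigma / np.sqrt(n))
```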
 
  • #3
Dale, thank you very much for your answer. It is very helpful, as I finally understood the meaning of the SE concept. But I still have some issues I would like to clarify. Also, importantly, I was asking how to interpret (read) the math we are performing in those formulas. Your answer helped me to see why we have the square root of ##n## in the standard error formula (because the variance of the mean is ##\sigma^2 / n##, hence the SD is ##\sigma / \sqrt{n}##), but I still don't understand what this operation means and does. We take the standard deviation of the data (if I understand correctly, we take the population's SD, not the sample's) and divide it by the square root of the number of observations, thus, if I understand the logic correctly, computing how many observations fit into one standard deviation. Why are we doing this, and what does it tell us, mathematically and logically?

Dale said:
This is why the SE gets smaller as n gets larger. In this example a sample size of 4 gives an SE of 5
How did you get to SE = 5? If SD = 5, n = 4, then SE = 5/2 = 2.5. Have I incorrectly understood how to use the formula?

Thank you very much.
 
  • #4
Vital said:
How did you get to SE = 5? If SD = 5, n = 4, then SE = 5/2 = 2.5. Have I incorrectly understood how to use the formula?
I was continuing the same example so SD=10.

Vital said:
I was asking about how to interpret (read) the math we are performing in those formulas
I am not sure how to answer this. It seems that you already understand that part perfectly. To calculate a confidence interval for the population mean you take the standard deviation, divide it by the square root of the sample size, multiply by the critical value from the distribution, and then both add and subtract the result from the mean. That is it, and you already seem to understand that. I will need more guidance on what you feel is missing, because from my perspective it seems like you already understand it.

Vital said:
Why are we doing this and what does it tell us, mathematically and logically?
Continuing with my previous example, each time you select a random patient and measure their weight, the answer you get will vary. Therefore, the weight is a random variable, and we would like to characterize it, e.g. with the mean and standard deviation.

Each time we select four random patients and take the mean of their weights then the answer we get will vary. Therefore, the sample mean is also a random variable with its own distribution, which we would also like to characterize. The standard error is the standard deviation of the distribution of the sample mean.
 
  • #5
Dale said:
I was continuing the same example so SD=10.

I am not sure how to answer this. It seems that you already understand that part perfectly. To calculate a confidence interval for the population mean you take the standard deviation, divide it by the square root of the sample size, multiply by the critical value from the distribution, and then both add and subtract the result from the mean. That is it, and you already seem to understand that. I will need more guidance on what you feel is missing, because from my perspective it seems like you already understand it.

Upon reading your answer a few times, I think I now understand what I was asking about)) I will try to explain myself. I truly didn't understand the mathematical meaning of that computation and the purpose of dividing the population's standard deviation by the square root of the number of observations. But now I seem to understand what we are doing here. Please correct me if I am wrong: by computing the standard error we are finding the standard deviation of the sample mean, as you pointed out, and this ##s / \sqrt{n}## means that we are finding the mean of the population's standard deviation. Is this correct?

One more question here, if I may, about the value of t, which you named the critical value.
##t_{0.025} = 2.262## for the 95 percent interval. How do I interpret this number? How is it computed? I took the value from the table. And what does it mean when we take the ##t## and multiply it by the standard error, if the standard error is the averaged population standard deviation as I assumed above?

Thank you very much.
 
  • #6
Vital said:
By computing the standard error we are finding the standard deviation of the sample mean, as you pointed out, and this ##s / \sqrt{n}## means that we are finding the mean of the population's standard deviation. Is this correct?
Not really. The SD of the mean is not the same as the mean of the SD. And generally a sample mean or SD will not be the same as a population mean or SD.

I think of it more like this. We have some random variable. My example was the weight of a patient, but it could be anything. That random variable has some statistical distribution, with an associated mean and SD. From that random variable we can generate new random variables in a number of ways. For example, if our patient weights, X, are normally distributed with mean 70 and SD 10, then we can compute a new random variable as ##Y=(X-70)/10##. Now, suppose we want to know the distribution of this new random variable. It turns out that we can calculate the distribution of Y from the distribution of X: it is also normally distributed, but with a mean of 0 and an SD of 1.

Now, we can also generate a new random variable from X as follows: ##Z=\frac{1}{n}\sum_{i=1}^{n} X_i##, where each term in the sum is one of n random samples from X. Now we ask: what is the distribution of Z? It turns out that this too can be determined from the distribution of X: Z is normally distributed with the same mean as X but with an SD of ##1/\sqrt{n}## times the SD of X.
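For independent samples, the usual variance rules make the ##1/\sqrt{n}## factor explicit (with ##\sigma## denoting the SD of X):

$$\operatorname{Var}(Z)=\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n}X_i\right)=\frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)=\frac{\sigma^2}{n},\qquad \operatorname{SD}(Z)=\frac{\sigma}{\sqrt{n}}$$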

Note, X and Z are different random variables. They are related to each other, but they are different things.
 
  • #8
It can be proven why that makes a $$1-\alpha$$ confidence interval for the population mean ##\mu##. This involves:
1) Proving that $$(n-1)s^2/\sigma^2$$ is a chi-squared random variable with ##n-1## degrees of freedom (this is because ##s^2## is built from squared normal random variables)
2) Proving that $$\sqrt{n}(\bar{X}-\mu)/s$$ is a Student's t random variable (this can be proven using a result that relates normal and chi-squared random variables)
3) $$P\left(-a<\frac{\sqrt{n}(\bar{X}-\mu)}{s} < a\right) = 1-\alpha \;\Rightarrow\; P\left(\bar{X}-a\frac{s}{\sqrt{n}} < \mu < \bar{X}+a\frac{s}{\sqrt{n}}\right) = 1-\alpha,$$ so ##\bar{X} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}## is an equal-tailed $$1-\alpha$$ confidence interval, since the Student's t distribution is symmetric. This is just a basic sketch. This proof, along with many others, is usually skipped in basic statistics courses because covering it properly would require teaching a lot more material.
 

What is a T-distribution?

A T-distribution is a probability distribution that is used to estimate the mean of a population when the sample size is small or when the population standard deviation is unknown. It is similar to the normal distribution, but has heavier tails, meaning it accounts for more extreme values in the data.

What is the standard error?

The standard error is a measure of the variability of a statistic, such as the mean, in a sample. It tells us how much the sample mean is likely to vary from the population mean. A smaller standard error indicates that the sample mean is a better estimate of the population mean.

How do I interpret the T-distribution?

To interpret the T-distribution, you need to look at the degrees of freedom, which is the number of values in the sample that are free to vary. As the degrees of freedom increase, the T-distribution becomes closer to the normal distribution. You can use a T-table or statistical software to find the probability associated with a specific T-value.
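For instance, one way to do this lookup is with Python's SciPy (one of many possible tools):

```python
from scipy import stats

df = 9  # degrees of freedom = n - 1 for a sample of 10 observations

# Probability of a t-value at least as large as 2.262 (upper tail)
print(stats.t.sf(2.262, df))    # ~0.025

# Going the other way: the t-value that cuts off the upper 2.5% tail
print(stats.t.ppf(0.975, df))   # ~2.262

# With more degrees of freedom the critical value approaches the
# normal distribution's 1.96, reflecting the lighter tails.
print(stats.t.ppf(0.975, 100))  # ~1.98
```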

What is the relationship between the T-distribution and the standard error?

The standard error is the standard deviation of the sampling distribution of a statistic; for a mean, it is calculated by dividing the standard deviation by the square root of the sample size. The T-distribution is then used together with the standard error to build confidence intervals and test statistics when the population standard deviation is unknown. The standard error is an important measure because it tells us how precise our estimate of the population mean is.
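Putting the two together, here is a small sketch (using the numbers from the example at the top of the thread, with percent units assumed) that computes the standard error and the resulting 95% confidence interval:

```python
import math
from scipy import stats

mean, s, n = 3.0, 6.0, 10              # sample mean, sample SD, sample size (in %)
se = s / math.sqrt(n)                  # standard error of the mean, ~1.90
t_crit = stats.t.ppf(0.975, df=n - 1)  # two-sided 95% critical value, ~2.262
lower, upper = mean - t_crit * se, mean + t_crit * se
print(se, t_crit, (lower, upper))      # interval is roughly (-1.29, 7.29)
```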

How does sample size affect the T-distribution and the standard error?

A larger sample size results in a smaller standard error, meaning that the sample mean is a more precise estimate of the population mean. As the sample size increases, the T-distribution also becomes closer to the normal distribution, so the critical t-value for a given confidence level decreases. This is because a larger sample size provides more information about the population, reducing the uncertainty in our estimate.
