Understand the T Distribution: What You Need to Know

  • Thread starter Josh S Thompson
In summary: The t-distribution is the distribution of the t-statistic, which standardizes the sample mean using the sample standard deviation in place of the unknown population standard deviation. Its shape depends only on the sample size, and for large samples it is close to the normal distribution.
  • #1
Josh S Thompson
So my understanding of the T distribution is that if you do not know the variance of a population, you estimate the distribution of the mean with the T distribution. But I am not sure about this, because if you know the variance of the population, the law of large numbers shrinks the variance significantly. How can it be that just because the variance is a chi-square RV, the spread of the distribution becomes much bigger? I feel like I am missing something. Can someone please explain the T distribution?
 
  • #2
You need to make your vocabulary more precise so that you distinguish between things like "mean" and "sample mean".

Josh S Thompson said:
So my understanding of the T distribution is that if you do not know the variance of a population you estimate the distribution of the mean with the T distribution.

The T-distribution is the distribution of the T-statistic. The mean of a distribution is a single number; the sample mean, being a random variable, has a distribution.

But I am not sure about this because if you know the variance of the population, law of large numbers shrinks the variance significantly.

If you knew the variance, you'd know it. It wouldn't "shrink". It would be a single number.
 
  • #3
I thought the t distribution was the distribution of the sample mean if the variance is unknown.

But how can you have a distribution of the sample mean if you do not know the population mean or the population variance?
 
  • #4
I just don't really get why you say the sample mean is the population mean, but then you can't do that for the population variance. Can you please explain the t distribution?
 
  • #5
But I don't understand: if you use the t distribution to find the probability of the sample mean taking on some value, why is the sample mean normally distributed?
 
  • #6
Josh S Thompson said:
But I don't understand: if you use the t distribution to find the probability of the sample mean taking on some value, why is the sample mean normally distributed?

Think of it this way: Suppose you want to graph the distribution of the sample mean of a sample of size N from a population that is normally distributed. The way you draw your graph depends on the population mean (which defines the central peak), the population standard deviation, and the value of N. The last two determine the "spread" of the graph. So to draw the graph, you have to specify 3 values.

Now suppose you want to graph the distribution of the t-statistic for the sample mean. The shape of this graph depends only on N.
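A small simulation can illustrate the point, as a sketch rather than a proof. It assumes the usual one-sample t-statistic, t = (sample mean - population mean) / (s / sqrt(N)), where s is the sample standard deviation. The function names and the two parameter settings are illustrative choices, not something from the thread.

```python
import math
import random
import statistics

def t_statistic(sample, mu):
    """One-sample t-statistic: (sample mean - mu) / (s / sqrt(n))."""
    n = len(sample)
    return (statistics.fmean(sample) - mu) / (statistics.stdev(sample) / math.sqrt(n))

def simulate_t(mu, sigma, n, trials, seed):
    """Draw `trials` normal samples of size n and return their t-statistics."""
    rng = random.Random(seed)
    return [t_statistic([rng.gauss(mu, sigma) for _ in range(n)], mu)
            for _ in range(trials)]

def tail_fraction(values, cutoff):
    """Fraction of values whose absolute value exceeds cutoff."""
    return sum(abs(v) > cutoff for v in values) / len(values)

n = 5
# Two very different populations (hypothetical values), same sample size.
frac_a = tail_fraction(simulate_t(mu=0.0, sigma=1.0, n=n, trials=20000, seed=1), 2.0)
frac_b = tail_fraction(simulate_t(mu=72.0, sigma=2.0, n=n, trials=20000, seed=2), 2.0)
print(f"P(|t| > 2), population A: {frac_a:.3f}")
print(f"P(|t| > 2), population B: {frac_b:.3f}")
```

The two tail fractions should agree up to simulation noise even though the populations differ, and both should be noticeably larger than the roughly 4.6% that a standard normal puts beyond 2.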

In a typical real-life problem, we take samples from an (assumed) normally distributed population without knowing the population mean or population standard deviation. If we use some procedure to estimate the population mean, it is very complicated to make a precise statement about the "quality" of the estimate. You can make a "confidence interval" type of statement about the sample mean using the normal distribution if you know (or assume you know) the population standard deviation. You can make a "confidence interval" type of statement about the sample mean using the t-distribution without assuming you know the population standard deviation, because the t-statistic uses a version of the sample standard deviation.

For large sample sizes, theory says that the sample standard deviation is probably a very good estimator of the population standard deviation. So, for large sample sizes, people treat the sample standard deviation as if it were the population standard deviation and do calculations with the normal distribution. For small sample sizes, the t-distribution is used because it is not safe to assume the sample standard deviation is near the population standard deviation.
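The small-sample effect can be checked with a simulation. The sketch below builds nominal 95% intervals of the form sample mean ± critical value × s / sqrt(n) for samples of size 5, once with the normal critical value 1.960 and once with the t critical value 2.776 (4 degrees of freedom), and counts how often each interval contains the true mean. The population parameters, trial count, and function name are assumptions chosen for illustration.

```python
import math
import random
import statistics

Z_975 = 1.960      # 97.5th percentile of the standard normal
T_975_DF4 = 2.776  # 97.5th percentile of Student's t with 4 degrees of freedom

def coverage(critical, mu, sigma, n, trials, rng):
    """Fraction of intervals xbar +/- critical * s / sqrt(n) that contain mu."""
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        half_width = critical * statistics.stdev(sample) / math.sqrt(n)
        if abs(xbar - mu) <= half_width:
            hits += 1
    return hits / trials

rng = random.Random(0)
# Hypothetical population: mean 10, standard deviation 3; sample size 5.
cover_z = coverage(Z_975, mu=10.0, sigma=3.0, n=5, trials=20000, rng=rng)
cover_t = coverage(T_975_DF4, mu=10.0, sigma=3.0, n=5, trials=20000, rng=rng)
print(f"normal critical value with sample s: {cover_z:.3f}")
print(f"t critical value with sample s:      {cover_t:.3f}")
```

With n = 5, the interval that plugs the sample standard deviation into the normal formula should cover the true mean clearly less often than 95%, while the t-based interval should come out close to 95%.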

Confidence intervals are a complicated topic and interpreting what they tell you is not simple.
 
  • #7
OK, thank you very much.

Stephen Tashi said:
You can make a "confidence interval" type of statement about the sample mean using the normal distribution if you know (or assume you know) the population standard deviation. You can make a "confidence interval" type of statement about the sample mean using the t-distribution without assuming you know the population standard deviation because the t-distribution uses a version of the sample standard deviation.

I just don't really understand sample variance. I thought it was supposed to be more accurate than x bar, but you say they are both normal.

And one more question: if sigma is known, the distribution of x bar approaches normal with no variation, but if you do not know sigma, the distribution of x bar approaches the normal distribution with variance equal to the sample variance?
 
  • #8
Josh S Thompson said:
I just don't really understand sample variance I thought it was supposed to be more accurate than x bar but you say they are both normal.

You have to be precise about what you mean by "more accurate".

From a given population, the distribution of the sample mean of 100 independent observations has a higher, narrower peak than the distribution of the sample mean of 10 independent observations.

And one more question; if sigma is known the distribution of x bar approaches normal with no variation

I don't know what you mean by "approaches normal with no variation".

but if you do not know sigma the distribution of x bar approaches the normal distribution with variance equal to the sample variance?

There are many different ways to talk about whether one distribution approaches another distribution. These have to do with various definitions for how the limit of a sequence of functions approaches another function.

I'll make this general suggestion. Try to be more precise in your statements. It won't be easy, and it is particularly difficult to do when dealing with topics involving probability. Some people never master mathematical topics because they are too willing to think "I know what I'm talking about" when they utter statements about mathematical topics. Practice some self-doubt. Ask yourself "What exactly am I saying? Do I mean this...or that... ?".
 
  • #9
Stephen Tashi said:
I don't know what you mean by "approaches normal with no variation".
I just mean the variance approaches 0.

Stephen Tashi said:
You have to be precise about what you mean by "more accurate".
I'm saying, shouldn't the sample variance be more closely approximated by the normal distribution, because you take sums of standard normal RVs, while for the mean it could be any distribution? And why is it dumb to say that if the sample size is very large, the distribution of the variance should get much, much smaller? I don't get the T distribution.

There is no way that x bar and S^2 are independent; that is crazy.
 
  • #10
You still haven't stated a precise mathematical question.

I suggest you look at a textbook problem of how the t-distribution is used to state a confidence interval. Then ask yourself how you would state a confidence interval using a normal distribution. Would you do this by assuming the population variance is exactly equal to the variance that you estimated from the sample?
 
  • #11
Josh S Thompson said:
I just mean the variance approaches 0.
No, the variance does not approach 0.

Let's say you have a population of 1 million adults and their heights are normally distributed with mean 6' and standard deviation 2". The adults do not all become exactly 6' tall just because you take a large sample.
 
  • #12
Dale said:
No, the variance does not approach 0.

Let's say you have a population of 1 million adults and their heights are normally distributed with mean 6' and standard deviation 2". The adults do not all become exactly 6' tall just because you take a large sample.

I mean the variance of the sampling distribution of the mean, when the variance of the population is known, would shrink when you take large samples.

And my question remains: why would the distribution of the sample mean with an unknown population be normal, while the distribution of the sample mean with a known variance would be a single number when you take samples of like 10 billion?

Is the answer that the distributions of the two are nearly the same, especially when you take large samples, and the variance is not really a parameter in the t-distribution?
 
  • #13
1. Why would the distribution of the sample mean with an unknown population be normal? It doesn't have to be. The central limit theorem only applies when you take many samples. An individual sample mean doesn't have to be normal. The central limit theorem simply states that repeatedly sampling the sample means will tend to a normal distribution.

2. While the distribution of the sample mean with a known variance be a single number when you take samples of like 10 billion? Huh? Even if you know the variance of the population (unlikely), you still have a standard error.
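To make the standard error concrete, here is a minimal sketch using the heights example from earlier in the thread (mean 72 inches, standard deviation 2 inches). It compares the simulated spread of the sample mean with the textbook value sigma / sqrt(n). The sample sizes, trial count, and function name are illustrative assumptions.

```python
import math
import random
import statistics

def sample_mean_spread(mu, sigma, n, trials, rng):
    """Empirical standard deviation of the sample mean over repeated samples of size n."""
    means = [statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
             for _ in range(trials)]
    return statistics.stdev(means)

rng = random.Random(7)
mu, sigma = 72.0, 2.0  # heights in inches, as in the example above

results = {}
for n in (4, 25, 100):
    empirical = sample_mean_spread(mu, sigma, n, trials=4000, rng=rng)
    theoretical = sigma / math.sqrt(n)  # standard error of the mean
    results[n] = (empirical, theoretical)
    print(f"n={n:4d}  simulated={empirical:.3f}  sigma/sqrt(n)={theoretical:.3f}")
```

The population standard deviation stays at 2 inches throughout; only the spread of the sample mean shrinks as n grows, and it stays positive for every finite n.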
 
  • #14
MarneMath said:
1. Why would the distribution of the sample mean with an unknown population be normal? It doesn't have to be. The central limit theorem only applies when you take many samples. An individual sample mean doesn't have to be normal. The central limit theorem simply states that repeatedly sampling the sample means will tend to a normal distribution.

I'm not talking about the central limit theorem. I'm talking about the t-distribution, which already assumes the population is normally distributed. Therefore you don't have the noise that the sampling distribution of a non-normal population distribution would have.
 
  • #15
I think you really need to be more careful with how you use terms, because from how you're writing I'm sensing a fundamental misunderstanding of what things mean and what they do. The t distribution requires that the parameters mu and sigma are from a normal distribution, which is applicable because of the central limit theorem. If you know a particular sample comes from a normal distribution and are not relying on the central limit theorem, then why are you using the t-distribution? You can simply pivot off the standard normal distribution since it is location-scale.
 
  • #16
MarneMath said:
I think you really need to be more careful with how you use terms because I'm sensing a fundamental misunderstanding of what things mean and what they do with how you're writing. The t distribution requires that the parameters mu and sigma are from a normal distribution which is applicable because of the central limit theorem.

What the heck does this mean? The t-distribution has 1 parameter. Mu and sigma are THE parameters of the normal distribution; they aren't from the normal distribution. The t-distribution requires that x-bar be from a normal distribution, which only happens if x is normally distributed with parameters mu and sigma.

If you know a particular sample comes from a normal distribution and are not relying on the central limit theorem, then why are you using the t-distribution? You can simply pivot off the standard normal distribution since it is location-scale.

You use the t-distribution because it is more accurate than the normal distribution when the variance of the sample is random, which it always is. You would only use the normal distribution for x-bar if the variance was the same for all of your samples.

You don't know what the heck you're talking about.
 
  • #17
Josh S Thompson said:
you would only use the normal distribution for x-bar if the variance was the same for all of your samples.

It isn't clear what you mean when you say "the variance was the same for all your samples".

Statistical terms such as "variance" are ambiguous. In a typical scenario, there are several different things that can be called "variance". We have a population with a distribution, and that distribution has a parameter called "variance". We have samples from the population, and the mean values of those samples have a distribution, and that distribution has a parameter called "variance". For a sample, we can compute the variance of the sample values about the sample mean. That computation defines a random variable called the "sample variance" (and it has a distribution with a parameter called the "variance" of the sample variance). And we can have the specific realization of one numerical value (e.g. 23.65) of the sample variance that comes from one particular sample.
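The different meanings can be laid side by side in a short simulation. This is only a sketch: the population parameters and all variable names are hypothetical, and the closed-form value used for the variance of the sample variance, 2 * sigma^4 / (n - 1), holds for normally distributed populations.

```python
import random
import statistics

rng = random.Random(42)
mu, sigma, n = 50.0, 4.0, 10  # hypothetical normal population and sample size

# 1. Variance of the population: a fixed parameter.
population_variance = sigma ** 2

# 2. Variance of the distribution of the sample mean: also a fixed parameter.
variance_of_sample_mean = sigma ** 2 / n

# 3. The sample variance is a random variable: each sample gives a new value.
sample_variances = [
    statistics.variance([rng.gauss(mu, sigma) for _ in range(n)])
    for _ in range(20000)
]
mean_of_s2 = statistics.fmean(sample_variances)          # centers on sigma^2
variance_of_s2 = statistics.variance(sample_variances)   # spread of the sample variance
theoretical_variance_of_s2 = 2 * sigma ** 4 / (n - 1)    # normal-population formula

# 4. One realization: a single number computed from one particular sample.
one_realization = sample_variances[0]

print(f"population variance:            {population_variance:.2f}")
print(f"variance of the sample mean:    {variance_of_sample_mean:.2f}")
print(f"average of the sample variance: {mean_of_s2:.2f}")
print(f"variance of the sample variance: {variance_of_s2:.2f} "
      f"(theory {theoretical_variance_of_s2:.2f})")
print(f"one realized sample variance:   {one_realization:.2f}")
```

Naming which of these four quantities is meant removes most of the ambiguity in statements like "the variance shrinks".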
 
  • #18
This goes back to my comment that you need to be clearer about what you're saying. My point is that you need to know that your mu and sigma are from a normal distribution. If I gave you a sample, a single sample, and you decided to use the T-distribution, that would be wrong. You have no idea whether that single sample is exponential, gamma, beta, etc., so that would violate the need for the mean and variance to come from a normal distribution.

Secondly, it's unclear which variance you keep jumping to. Do you want to know the variance of the population, the sample variance, the variance of the mean, or the variance of the variance? First you talk about one sample, but then you talk about the variance of samples. Part of mathematics is accurately writing what you want.
 
  • #19
I'm talking about the variance of your sample. Like you have a sample, some numbers, and you compute the variance of these numbers. There is no other way to make this clear.

Marne Math
I'm not talking about the CLT, OK? Stop bringing up stuff about the central limit theorem.
 
  • #20
Thread closed for Moderation, obviously...

Edit: a post has been deleted and the thread will remain closed
 

1. What is a t-distribution and how is it different from a normal distribution?

A t-distribution is the probability distribution of the t-statistic, which is used to make inferences about a population mean when the population standard deviation is unknown and must be estimated from the sample. It is bell-shaped like the normal distribution but has heavier tails, meaning it assigns higher probability to extreme values. The heavier tails account for the extra uncertainty of estimating the population standard deviation, which matters most for small samples.

2. How is the t-distribution related to the t-test?

The t-distribution is the reference distribution used in the t-test, a statistical test for whether a sample mean differs significantly from a hypothesized value, or whether the means of two groups differ. The test computes a t-value, which is compared with a critical value from the t-distribution to judge statistical significance.

3. What are degrees of freedom in the t-distribution?

Degrees of freedom represent the number of independent pieces of information available for estimating variability. For the one-sample t-statistic, the degrees of freedom equal the sample size minus one. As the degrees of freedom increase, the t-distribution approaches the standard normal distribution.
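The convergence to the normal distribution can be seen by evaluating the two density functions directly. The sketch below uses the standard formula for the Student's t density and compares it with the standard normal density at x = 3, a point in the tail. The function names and the choice of x and degrees of freedom are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom (computed via log-gamma)."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

x = 3.0
for df in (1, 4, 29, 1000):
    print(f"df={df:5d}  t density at {x}: {t_pdf(x, df):.5f}")
print(f"normal density at {x}:     {normal_pdf(x):.5f}")
```

The tail density of the t-distribution is larger than the normal density for small degrees of freedom and moves toward it as the degrees of freedom grow.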

4. How do you interpret the t-value in a t-distribution?

The t-value is the distance between the sample mean and the hypothesized population mean, measured in units of the estimated standard error. A t-value that is larger in absolute value indicates a greater discrepancy between the sample mean and the hypothesized mean. It corresponds to a smaller p-value, and a p-value below the chosen significance level is called statistically significant.
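As a worked illustration, the t-value for a single small sample can be computed by hand. The data below are made up for the example, the hypothesized mean of 72 is an assumption, and the function name is hypothetical. The critical value 2.776 is the two-sided 5% cutoff for 4 degrees of freedom.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t-value: (sample mean - hypothesized mean) / estimated standard error."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.fmean(sample) - mu0) / standard_error

heights = [70.1, 73.4, 71.8, 74.0, 72.2]  # hypothetical measurements in inches
t_value = one_sample_t(heights, mu0=72.0)

T_CRITICAL_DF4 = 2.776  # two-sided 5% critical value, 4 degrees of freedom
significant = abs(t_value) > T_CRITICAL_DF4
print(f"t = {t_value:.3f}, significant at the 5% level: {significant}")
```

Here the sample mean sits well within one estimated standard error of the hypothesized mean, so the t-value is small and the difference is not significant at the 5% level.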

5. When should I use a t-distribution instead of a normal distribution?

A t-distribution should be used when the population standard deviation is unknown and is estimated from a sample drawn from an (approximately) normally distributed population. The difference matters most for small samples (a common rule of thumb is fewer than 30 observations); for large samples the t-distribution and the normal distribution give nearly the same results.
