T distribution

1. Nov 10, 2015

Josh S Thompson

My understanding of the T distribution is that if you do not know the variance of a population, you estimate the distribution of the mean with the T distribution. But I am not sure about this, because if you know the variance of the population, the law of large numbers shrinks the variance significantly. How can it be that, just because the variance is a chi-square RV, the spread of the distribution becomes much bigger? I feel like I am missing something. Can someone please explain the T distribution?

2. Nov 10, 2015

Stephen Tashi

You need to make your vocabulary more precise so that you distinguish between things like "mean" and "sample mean".

The T-distribution is the distribution of the T-statistic. A mean of a distribution would be a single number; the sample mean, which is a random variable, would have a distribution.
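For reference, the T-statistic is usually written as below. This is the standard textbook form, assuming a sample $X_1, \dots, X_n$ drawn independently from a normal population with mean $\mu$; the symbols are notation introduced here, not taken from the thread.

```latex
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
S^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2, \qquad
T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}
```

Replacing $S$ with a known population standard deviation $\sigma$ gives a statistic with a standard normal distribution instead.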

If you knew the variance, you'd know it. It wouldn't "shrink". It would be a single number.

3. Nov 10, 2015

Josh S Thompson

I thought the t distribution was the distribution of the sample mean if the variance is unknown.

But how can you have a distribution of the sample mean if you do not know the population mean or the population variance?

4. Nov 10, 2015

Josh S Thompson

I just don't really get why you can say the sample mean is the population mean, but you can't do that for the population variance. Can you please explain the t distribution?

5. Nov 10, 2015

Josh S Thompson

But I don't understand: if you use the t distribution to find the probability of the sample mean taking on some value, why is the sample mean normally distributed?

6. Nov 10, 2015

Stephen Tashi

Think of it this way: Suppose you want to graph the distribution of the sample mean of a sample of size N from a population that is normally distributed. The way you draw your graph depends on the population mean (which defines the central peak), and on the population standard deviation and the value of N, which together determine the "spread" of the graph. So to draw the graph, you have to specify 3 values.

Now suppose you want to graph the distribution of the t-statistic for the sample mean. The shape of this graph depends only on N.
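A rough numerical check of this claim, as a sketch only: the same standard normal draws are rescaled to a second population, and the t-statistic comes out the same either way. The population values (mean 50, standard deviation 10), the sample size 5, and the function name `t_statistic` are all made up for illustration.

```python
import math
import random


def t_statistic(sample, mu):
    """T-statistic for the sample mean: (x_bar - mu) / (s / sqrt(n))."""
    n = len(sample)
    x_bar = sum(sample) / n
    # Sample variance with the usual n - 1 divisor.
    s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)
    return (x_bar - mu) / math.sqrt(s2 / n)


rng = random.Random(0)   # fixed seed so the sketch is reproducible
n = 5                    # sample size (the N of the post)
trials = 20_000

max_gap = 0.0            # largest disagreement between the two populations
tail_count = 0           # how often |t| exceeds the normal cutoff 1.96

for _ in range(trials):
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # The same draws, rescaled to a population with mean 50 and sd 10.
    x = [50.0 + 10.0 * zi for zi in z]
    t_z = t_statistic(z, mu=0.0)
    t_x = t_statistic(x, mu=50.0)
    max_gap = max(max_gap, abs(t_z - t_x))
    if abs(t_z) > 1.96:
        tail_count += 1

tail_fraction = tail_count / trials
print(f"largest |t_z - t_x| = {max_gap:.2e}")
print(f"fraction with |t| > 1.96 at n = {n}: {tail_fraction:.3f}")
```

The gap between the two t-statistics is only floating-point rounding, so the population mean and standard deviation drop out. The tail fraction comes out well above the 5% a standard normal would give, which is the heavier tail of the t-distribution at small N.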

In a typical real life problem, we take samples from an (assumed) normally distributed population without knowing the population mean or population standard deviation. If we use some procedure to estimate the population mean, it is very complicated to make a precise statement about the "quality" of the estimate. You can make a "confidence interval" type of statement about the sample mean using the normal distribution if you know (or assume you know) the population standard deviation. You can make a "confidence interval" type of statement about the sample mean using the t-distribution without assuming you know the population standard deviation, because the t-statistic uses a version of the sample standard deviation.

For large sample sizes, theory says that the sample standard deviation is probably a very good estimator of the population standard deviation, so for large sample sizes people treat the sample standard deviation as if it were the population standard deviation and do calculations with the normal distribution. For small sample sizes, the t-distribution is used because it is not safe to assume the sample standard deviation is probably near the population standard deviation.
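The difference between small and large samples can be sketched with a simulation. It assumes a standard normal population and nominal 95% intervals of the form sample mean plus or minus a critical value times s/sqrt(n). The function names are hypothetical, and 2.776 is the usual table value for the t-distribution with 4 degrees of freedom.

```python
import math
import random


def covers(sample, mu, critical):
    """True if x_bar +/- critical * s / sqrt(n) contains mu."""
    n = len(sample)
    x_bar = sum(sample) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
    half_width = critical * s / math.sqrt(n)
    return abs(x_bar - mu) <= half_width


def coverage(n, critical, trials, rng, mu=0.0, sigma=1.0):
    """Fraction of simulated samples whose interval contains mu."""
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        hits += covers(sample, mu, critical)
    return hits / trials


rng = random.Random(1)   # fixed seed for reproducibility
Z_975 = 1.960            # 97.5% quantile of the standard normal
T_975_DF4 = 2.776        # 97.5% quantile of t with 4 degrees of freedom

small_z = coverage(n=5, critical=Z_975, trials=4000, rng=rng)
small_t = coverage(n=5, critical=T_975_DF4, trials=4000, rng=rng)
large_z = coverage(n=200, critical=Z_975, trials=4000, rng=rng)

print(f"n = 5,   normal cutoff: {small_z:.3f}")
print(f"n = 5,   t cutoff:      {small_t:.3f}")
print(f"n = 200, normal cutoff: {large_z:.3f}")
```

With n = 5, treating the sample standard deviation as if it were the population value gives intervals that cover the true mean noticeably less often than 95%. The t cutoff restores roughly the nominal rate, and at n = 200 the normal cutoff is already close.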

Confidence intervals are a complicated topic and interpreting what they tell you is not simple.

7. Nov 11, 2015

Josh S Thompson

OK, thank you very much.

I just don't really understand sample variance. I thought it was supposed to be more accurate than x bar, but you say they are both normal.

And one more question: if sigma is known, the distribution of x bar approaches normal with no variation, but if you do not know sigma, the distribution of x bar approaches the normal distribution with variance equal to the sample variance?

8. Nov 11, 2015

Stephen Tashi

You have to be precise about what you mean by "more accurate".

From a given population, the distribution of the sample mean of a sample of 100 independent random draws has a higher peak than the distribution of the sample mean of a sample of 10 independent random draws.
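To put numbers on the peak heights, here is a small sketch. It assumes a normal population, so that the sample mean of n draws is exactly normal with standard deviation sigma / sqrt(n). The parameter values and the helper name are invented for the example.

```python
import math
from statistics import NormalDist

mu, sigma = 0.0, 1.0   # hypothetical population mean and standard deviation


def sample_mean_distribution(n):
    """Distribution of the mean of n independent draws from N(mu, sigma^2)."""
    return NormalDist(mu, sigma / math.sqrt(n))


# Height of the density at its central peak for each sample size.
peak_10 = sample_mean_distribution(10).pdf(mu)
peak_100 = sample_mean_distribution(100).pdf(mu)
ratio = peak_100 / peak_10

print(f"peak for n = 10:  {peak_10:.3f}")
print(f"peak for n = 100: {peak_100:.3f}")
print(f"ratio:            {ratio:.3f}")
```

The ratio of the two peaks is sqrt(100 / 10) = sqrt(10), because the peak height of a normal density is inversely proportional to its standard deviation.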

I don't know what you mean by "approaches normal with no variation".

There are many different ways to talk about whether one distribution approaches another distribution. These have to do with various definitions for how the limit of a sequence of functions approaches another function.

I'll make this general suggestion. Try to be more precise in your statements. It won't be easy and it is particularly difficult to do when dealing with topics involving probability. Some people never master mathematical topics because they are too willing to think "I know what I'm talking about" when they utter statements about a mathematical topic. Practice some self-doubt. Ask yourself "What exactly am I saying? Do I mean this...or that... ?".

9. Nov 11, 2015

Josh S Thompson

I just mean the variance approaches 0.

I'm saying, shouldn't the sample variance be more closely approximated by the normal distribution, because you take sums of standard normal RVs, while for the mean it could be any distribution? And why is it dumb to say that if the sample size is very large, the distribution of the variance should get much, much smaller? I don't get the T distribution.

There is no way that x bar and S^2 are independent that is crazy.
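For what it is worth, independence of x bar and S^2 is a standard result when the population is normal, and it generally fails otherwise. The sketch below only checks correlation, which is weaker than independence, so it is an illustration and not a proof. The sample size, the exponential comparison population, and the helper names are choices made for this example.

```python
import math
import random


def mean_and_variance(sample):
    """Sample mean and sample variance (n - 1 divisor)."""
    n = len(sample)
    x_bar = sum(sample) / n
    s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)
    return x_bar, s2


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_of_mean_and_variance(draw, n, trials):
    """Correlation between x_bar and S^2 over many simulated samples."""
    means, variances = [], []
    for _ in range(trials):
        x_bar, s2 = mean_and_variance([draw() for _ in range(n)])
        means.append(x_bar)
        variances.append(s2)
    return pearson(means, variances)


rng = random.Random(2)   # fixed seed for reproducibility

corr_normal = correlation_of_mean_and_variance(
    lambda: rng.gauss(0.0, 1.0), n=10, trials=20_000)
corr_expo = correlation_of_mean_and_variance(
    lambda: rng.expovariate(1.0), n=10, trials=20_000)

print(f"normal population:      corr(x_bar, S^2) = {corr_normal:+.3f}")
print(f"exponential population: corr(x_bar, S^2) = {corr_expo:+.3f}")
```

For the normal population the correlation comes out near zero, while for the skewed exponential population it is clearly positive.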

10. Nov 12, 2015

Stephen Tashi

You still haven't stated a precise mathematical question.

I suggest you look at a textbook problem of how the t-distribution is used to state a confidence interval. Then ask yourself how you would state a confidence interval using a normal distribution. Would you do this by assuming the population variance is exactly equal to the variance that you estimated from the sample?

11. Nov 12, 2015

Staff: Mentor

No, the variance does not approach 0.

Let's say you have a population of 1 million adults and their heights are normally distributed with mean 6' and standard deviation 2". The adults do not all become exactly 6' tall just because you take a large sample.
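The heights example can be simulated in a few lines. This is a sketch that treats the population as an idealized normal distribution (mean 72 inches, standard deviation 2 inches) instead of a finite list of 1 million people. The comparison with the standard error sigma / sqrt(n) is added here for contrast.

```python
import math
import random

rng = random.Random(3)    # fixed seed for reproducibility
MU, SIGMA = 72.0, 2.0     # 6 ft = 72 in, standard deviation 2 in

results = {}              # n -> (sample sd, standard error of the mean)
for n in (10, 1_000, 100_000):
    heights = [rng.gauss(MU, SIGMA) for _ in range(n)]
    x_bar = sum(heights) / n
    sample_sd = math.sqrt(sum((h - x_bar) ** 2 for h in heights) / (n - 1))
    standard_error = SIGMA / math.sqrt(n)   # sd of the sample mean
    results[n] = (sample_sd, standard_error)
    print(f"n = {n:>7}: sample sd = {sample_sd:.3f}, "
          f"standard error of mean = {standard_error:.4f}")
```

As n grows, the sample standard deviation settles near 2 inches. What shrinks toward 0 is the standard error of the sample mean, which is a different quantity.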

12. Nov 2, 2016

Josh S Thompson

I mean the variance of the sampling distribution of the mean, when the variance of the population is known, would shrink when you take large samples.

And my question remains: why would the distribution of the sample mean with an unknown population be normal, while the distribution of the sample mean with a known variance be a single number when you take samples of like 10 billion?

Is the answer that the distributions of the two are nearly the same, especially when you take large samples, and that the variance is not really a parameter in the t-distribution?

13. Nov 2, 2016

MarneMath

1. "Why would the distribution of the sample mean with an unknown population be normal?" It doesn't have to be. The central limit theorem only applies when you take many samples. An individual sample mean doesn't have to be normal. The central limit theorem simply states that repeatedly sampling the sample means will tend to a normal distribution.

2. "While the distribution of the sample mean with a known variance be a single number when you take samples of like 10 billion?" Huh? Even if you know the variance of the population (unlikely), you still have a standard error.
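The central limit theorem can be checked numerically. In its usual statement, the theorem concerns the number of observations n inside each sample: as n grows, the distribution of the sample mean gets closer to normal. The sketch below assumes an exponential population (chosen only because it is visibly skewed) and uses the skewness of the simulated sample means as a rough measure of non-normality. The function names are hypothetical.

```python
import random


def skewness(values):
    """Moment-based skewness: third central moment over sd cubed."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def sample_means(n, trials, rng):
    """Means of `trials` independent samples of size n from Exp(1)."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n
            for _ in range(trials)]


rng = random.Random(4)   # fixed seed for reproducibility

# Skewness of the sampling distribution of the mean for several n.
skew_by_n = {n: skewness(sample_means(n, 10_000, rng)) for n in (1, 5, 50)}

for n, skew in skew_by_n.items():
    print(f"n = {n:>2}: skewness of sample mean = {skew:.2f}")
```

A normal distribution has skewness 0. For this population the skewness of the sample mean is 2 / sqrt(n) in theory, so it falls steadily as n increases.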

14. Nov 2, 2016

Josh S Thompson

15. Nov 2, 2016

MarneMath

I think you really need to be more careful with how you use terms, because from how you're writing I'm sensing a fundamental misunderstanding of what things mean and what they do. The t distribution requires that the parameters mu and sigma are from a normal distribution, which is applicable because of the central limit theorem. If you know a particular sample comes from a normal distribution and are not relying on the central limit theorem, then why are you using the t-distribution? You can simply pivot off the standard normal distribution since it is location-scale.

16. Nov 2, 2016

Josh S Thompson

You use the t-distribution because it is more accurate than the normal distribution when the variance of the sample is random, which it always is. You would only use the normal distribution for x-bar if the variance was the same for all of your samples.

You don't know what the heck you're talking about.

17. Nov 2, 2016

Stephen Tashi

It isn't clear what you mean when you say "the variance was the same for all of your samples".

Statistical terms such as "variance" are ambiguous. In a typical scenario, there are several different things that can be called "variance". We have a population with a distribution, and that distribution has a parameter called "variance". We have samples from the population, and the mean values of those samples have a distribution, and that distribution has a parameter called "variance". For a sample, we can compute the variance of the sample values about the sample mean. That computation defines a random variable called the "sample variance" (and it has a distribution with a parameter called "variance" of the sample variance). And we can have the specific realization of one numerical value (e.g. 23.65) of the sample variance that comes from one particular sample.
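The different "variances" can be laid side by side in a short sketch. It assumes a normal population with made-up parameters (mean 10, variance 4) and samples of size 25. The formula 2 * sigma^4 / (n - 1) for the variance of the sample variance holds for normal populations specifically.

```python
import random
import statistics

SIGMA2 = 4.0   # (1) population variance: a fixed parameter (hypothetical value)
n = 25         # sample size

# (2) Variance of the sample mean: also a fixed number, sigma^2 / n.
var_of_sample_mean = SIGMA2 / n

rng = random.Random(5)   # fixed seed for reproducibility


def draw_sample_variance():
    """Draw one sample of size n and return its sample variance."""
    sample = [rng.gauss(10.0, SIGMA2 ** 0.5) for _ in range(n)]
    return statistics.variance(sample)   # n - 1 divisor


# (3) The sample variance is a random variable; (4) this is one realization.
one_realization = draw_sample_variance()

# Many realizations show the distribution of the sample variance, which
# has its own variance: 2 * sigma^4 / (n - 1) for a normal population.
realizations = [draw_sample_variance() for _ in range(5_000)]
mean_of_sample_variance = statistics.fmean(realizations)
var_of_sample_variance_sim = statistics.variance(realizations)
var_of_sample_variance_theory = 2 * SIGMA2 ** 2 / (n - 1)

print(f"population variance:            {SIGMA2}")
print(f"variance of the sample mean:    {var_of_sample_mean}")
print(f"one realized sample variance:   {one_realization:.3f}")
print(f"variance of the sample variance: "
      f"{var_of_sample_variance_sim:.3f} (simulated), "
      f"{var_of_sample_variance_theory:.3f} (theory)")
```

Only the realized sample variance changes from run to run (with a different seed). The other three quantities are fixed numbers determined by the population and by n.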

18. Nov 2, 2016

MarneMath

This goes back to my comment that you need to be more clear about what you're saying. You need to know that your mu and sigma are from a normal distribution. If I gave you a sample, a single sample, and you decided to use the T-distribution, that would be wrong. You have no idea if that single sample is exponential, gamma, beta, etc., so that would violate the need for mean and variance to come from a normal distribution.

Secondly, it's unclear what variance you keep jumping to. Do you want to know the variance of the population, the sample variance, the variance of the mean, or the variance of the variance? Then you talk about one sample, but then talk about the variance of samples. Part of mathematics is accurately writing what you want.

19. Nov 2, 2016

Josh S Thompson

I'm talking about the variance of your sample. Like you have a sample, some numbers, and you compute the variance of these numbers. There is no other way to make this clear.

MarneMath, I'm not talking about the CLT, OK? Stop bringing up stuff about the central limit theorem.

20. Nov 2, 2016