Desirable estimator properties...

  • #1
fog37
TL;DR Summary
estimators and their properties and relation to hypothesis testing
Hello,
An estimator is a random variable, i.e. a function that assigns a number to a random sample collected from a population with unknown parameters. More practically, an estimator is really a formula to calculate an estimated coefficient ##b## using the data from our single random sample... Some estimators are linear (they are linear functions of the dependent variable ##Y##); others are nonlinear.
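To make the "linear in ##Y##" point concrete, here is a minimal sketch (the data are made up for illustration): for simple regression the OLS slope can be written as ##b = \sum_i w_i Y_i## with weights ##w_i = (x_i - \bar x)/\sum_j (x_j - \bar x)^2## that depend only on the regressor, so the estimator is a linear function of ##Y##.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: any x and y will do for the illustration
x = np.linspace(0, 10, 30)
y = 1.0 + 2.0 * x + rng.normal(scale=3.0, size=x.size)

# OLS slope via the usual formula
b_ols = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# The same slope written as a weighted sum of the Y values:
# the weights depend only on x, so the estimator is linear in Y
w = (x - x.mean()) / np.sum((x - x.mean()) ** 2)
b_linear = np.sum(w * y)

print(b_ols, b_linear)  # identical up to floating-point error
```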

We want our estimator to be unbiased: collecting many, many samples, we calculate their many estimates, and the arithmetic average of those estimates is equal to the true population parameter. That is really good even if our estimate is not exactly equal to the population parameter... Knowing that we used an unbiased estimator gives us more confidence in the "quality" of our estimate...
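A minimal simulation sketch of the "average of many estimates" idea (the true slope, noise level, and sample size are all invented for illustration): drawing many samples and averaging the OLS slope estimates lands very close to the true value.

```python
import numpy as np

rng = np.random.default_rng(1)

beta = 2.0                 # hypothetical true population slope
n, n_samples = 30, 10_000
x = np.linspace(0, 10, n)

estimates = np.empty(n_samples)
for i in range(n_samples):
    y = 1.0 + beta * x + rng.normal(scale=3.0, size=n)
    estimates[i] = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

print(estimates.mean())  # close to 2.0: the sampling distribution is centered on beta
print(estimates.std())   # the standard error of the slope estimator
```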

In the class of linear unbiased estimators, OLS estimators have minimum variance as long as the Gauss-Markov assumptions are met. That leads to a small standard error of the sampling distribution of the estimates and to tighter confidence intervals, hence the least possible uncertainty. An unbiased estimator with the least variance is called an efficient estimator. Under the assumed conditions, OLS estimators are BLUE (best linear unbiased estimators).
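One way to make "minimum variance among linear unbiased estimators" concrete is to compare OLS against some other linear unbiased estimator. The sketch below (same invented setup as above) uses the crude "endpoint" slope ##(Y_n - Y_1)/(x_n - x_1)##, which is also linear in ##Y## and unbiased under the model, but has a noticeably larger sampling variance than OLS.

```python
import numpy as np

rng = np.random.default_rng(2)

beta, n, n_samples = 2.0, 30, 10_000
x = np.linspace(0, 10, n)

b_ols = np.empty(n_samples)
b_end = np.empty(n_samples)
for i in range(n_samples):
    y = 1.0 + beta * x + rng.normal(scale=3.0, size=n)
    b_ols[i] = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b_end[i] = (y[-1] - y[0]) / (x[-1] - x[0])   # another linear unbiased estimator

print(b_ols.mean(), b_end.mean())  # both close to 2.0: both unbiased
print(b_ols.var(), b_end.var())    # the OLS variance is clearly smaller
```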

If, in addition to the GM assumptions, we require the errors to be normally distributed, the OLS estimators become themselves normally distributed and we can safely use hypothesis testing... With the added assumption of normality, the OLS estimators are best unbiased estimators (BUE) in the entire class of unbiased estimators, among all linear and all nonlinear estimators... That is a big result because, without the normality assumption, the OLS estimators are only best among the linear estimators.
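Under normal errors the usual ##t##-test on the slope is exact. A minimal sketch of what that looks like in practice (hypothetical data; `scipy.stats.linregress` reports the two-sided p-value for ##H_0: \beta = 0##):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

x = np.linspace(0, 10, 30)
y = 1.0 + 2.0 * x + rng.normal(scale=3.0, size=x.size)  # hypothetical data, normal errors

res = stats.linregress(x, y)
print(res.slope, res.stderr)  # point estimate of the slope and its standard error
print(res.pvalue)             # two-sided p-value for H0: slope = 0
```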

That said, here are my dilemmas:

Question: we only work with a single random sample of size ##n## and use fancy math to indirectly learn about the sampling distribution of the estimates without collecting millions of samples. If the estimators are BLUE, it essentially means that, even if our sample estimate is not really exactly equal to the population parameter, it is a "good quality" estimate. An estimator being BLUE gives us confidence that our estimation procedure is good and produces statistically reliable estimates, correct?

Question: what do we gain from hypothesis testing? Do we get to verify, based on our limited data, if our estimate is statistically sound? For example, in linear regression, we test if the slope is equal to zero against what we get from our single sample calculation.

For example, imagine we get an estimate from a BLUE estimator. That is good because our estimate ##b## is close to the real, true population value ##\beta## and, on average, equals it! What if our hypothesis test, because of a lack of error normality, gives us statistically non-significant results? Does that make our good, unbiased, low-variance estimate fall apart, since the results we get from the test tell us that our sample estimate doesn't really reflect what is going on in the population?

I am confused about what it means when we get an estimate from a BLUE estimator but the hypothesis test supports ##H_0## that the estimated slope is zero...

Thank you!
 
  • #2
Dale
Estimation and hypothesis testing are very different topics. In estimation you are trying to determine the “true” value of a quantity measured on the population from the value of a quantity measured on a sample. In hypothesis testing you are comparing the data to one or more models
 
  • #3
fog37
Dale said:
Estimation and hypothesis testing are very different topics. In estimation you are trying to determine the “true” value of a quantity measured on the population from the value of a quantity measured on a sample. In hypothesis testing you are comparing the data to one or more models
Yes, but imagine that my estimate of the parameter is very good and close to the true value, some nonzero value. And suppose the hypothesis test is such that ##H_0## is not rejected, with ##H_0## stating that the parameter is 0...
Aren't the outcome of my estimation process and the outcome of the hypothesis test contradicting each other?
 
  • #4
Dale
Not really. They are saying different things, and in fact this happens regularly in science. The estimator says that the value of the population parameter is probably close to some value or in some range. The hypothesis test says that the data would be likely to have occurred if the parameter were 0. The estimator tells you about the population, the hypothesis test tells you about the data.

Suppose that you have a real, non-zero effect, but that it is rather small compared to the variance in the population. Then modest sample sizes would often be consistent with the null hypothesis.

In science you should talk about both. A highly significant result may be unimportant if the effect size is small
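A minimal simulation sketch of the small-effect point above (all numbers invented): a small but real slope with noisy data and a modest sample size. The slope estimates still average out to the true value, yet a large fraction of samples fail to reject ##H_0: \beta = 0## at the 5% level.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

beta, n, n_samples = 0.1, 25, 5_000   # small real effect, modest sample size
x = np.linspace(0, 10, n)

slopes, pvalues = np.empty(n_samples), np.empty(n_samples)
for i in range(n_samples):
    y = 1.0 + beta * x + rng.normal(scale=3.0, size=n)
    res = stats.linregress(x, y)
    slopes[i], pvalues[i] = res.slope, res.pvalue

print(slopes.mean())             # close to 0.1: the estimator is still unbiased
print(np.mean(pvalues >= 0.05))  # a large fraction of samples is "consistent with" H0
```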
 
  • #5
fog37
Dale said:
Not really. They are saying different things, and in fact this happens regularly in science. The estimator says that the value of the population parameter is probably close to some value or in some range. The hypothesis test says that the data would be likely to have occurred if the parameter were 0. The estimator tells you about the population, the hypothesis test tells you about the data.

Suppose that you have a real, non-zero effect, but that it is rather small compared to the variance in the population. Then modest sample sizes would often be consistent with the null hypothesis.

In science you should talk about both. A highly significant result may be unimportant if the effect size is small
Thank you. Let me see if I can make my doubt clearer. Let's assume that the population linear regression equation has a slope equal to 5.3. We collect a random sample and OLS gives us a close point estimate of 4.9...
Then we run the hypothesis test for the slope, where ##H_0## states that the slope is zero, and our data is such that we cannot reject ##H_0##. That means that the slope we got is statistically not significant and the p-value will be high... that would also mean that our sample provides evidence that the actual population slope should be zero... but in reality it is 5.3! And our good point estimate supports that...

That is why the scenario seems to pit the outcome of the estimation and the outcome of the test against each other... The hypothesis test makes claims about the population parameters and we use the sample to check those claims...
 
  • #6
Dale
fog37 said:
that would also mean that our sample provides evidence that the actual population slope should be zero.
No. Failure to reject the null hypothesis is not taken as evidence that the null hypothesis is true.

fog37 said:
The hypothesis test makes claims about the population parameters and we use the sample to check those claims
This is an unfortunate misconception. Hypothesis testing is a really bizarre thing, not in terms of the statistics but in terms of the way the scientific community misuses it.

A hypothesis test does not make any claim about population parameters. The outcome of a hypothesis test is a p value. The p value is the probability of the data given the null hypothesis. It is a claim about the data, not about any population parameter. The data can be very likely under the null hypothesis even when the population parameter is not zero.

The strange thing is that if this probability about the data is small then the null hypothesis is rejected, which for some completely nonsensical reason is then taken as evidence supporting a non-zero parameter value. Despite the fact that the non-zero parameter was never tested!

This is the principal benefit of Bayesian statistics: the outcomes match what people mistakenly think they are doing with standard hypothesis tests, and the values are what people actually want to obtain from their statistical methods.
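To see the "probability of the data" versus "probability of the parameter" distinction in numbers, here is a minimal sketch under simplifying assumptions chosen purely for illustration (normal data with a known ##\sigma## and a wide conjugate normal prior on the mean). The p-value is a statement about the data under ##H_0##; the Bayesian posterior quantity is a statement about the parameter itself.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Hypothetical data: n observations, sigma assumed known for simplicity
sigma, n = 2.0, 20
data = rng.normal(loc=0.5, scale=sigma, size=n)
xbar = data.mean()

# Frequentist: probability of data at least this extreme, assuming H0: mu = 0
z = xbar / (sigma / np.sqrt(n))
p_value = 2 * stats.norm.sf(abs(z))

# Bayesian (conjugate normal prior, chosen for illustration): P(mu > 0 | data)
prior_mean, prior_sd = 0.0, 10.0
post_prec = 1 / prior_sd**2 + n / sigma**2
post_mean = (prior_mean / prior_sd**2 + n * xbar / sigma**2) / post_prec
post_sd = np.sqrt(1 / post_prec)
p_mu_positive = stats.norm.sf(0.0, loc=post_mean, scale=post_sd)

print(p_value)        # a statement about the data under H0
print(p_mu_positive)  # a statement about the parameter, given the data
```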
 
  • #7
fog37
Dale said:
No. Failure to reject the null hypothesis is not taken as evidence that the null hypothesis is true.

This is an unfortunate misconception. Hypothesis testing is a really bizarre thing, not in terms of the statistics but in terms of the way the scientific community misuses it.

A hypothesis test does not make any claim about population parameters. The outcome of a hypothesis test is a p value. The p value is the probability of the data given the null hypothesis. It is a claim about the data, not about any population parameter. The data can be very likely under the null hypothesis even when the population parameter is not zero.

The strange thing is that if this probability about the data is small then the null hypothesis is rejected, which for some completely nonsensical reason is then taken as evidence supporting a non-zero parameter value. Despite the fact that the non-zero parameter was never tested!

This is the principal benefit of Bayesian statistics: the outcomes match what people mistakenly think they are doing with standard hypothesis tests, and the values are what people actually want to obtain from their statistical methods.
Well, thank you, I am one of those who has clearly misunderstood. My belief has been that we perform a hypothesis test because we believe that the population has a certain parameter and we want to use the sample data to confirm that or as evidence against that. For example, if a company makes cookies which are believed to have ##10\,g## of sugar, that would lead to ##H_0: \mu = 10\,g##... We then do calculations with our single sample and, if the sample mean is very, very unlikely given the premise ##H_0: \mu = 10\,g##, then we reject the current belief (we may be making a mistake, though) and go for the alternative hypothesis ##H_1: \mu \neq 10\,g##. So, to me, hypothesis testing has been about hypothesizing something about the population and checking how true that can be using a single random sample...
 
  • #8
Dale
fog37 said:
My belief has been that we perform a hypothesis test because we believe that the population has a certain parameter and we want to use the sample data to confirm that or as evidence against that.
Yes, that is common. That is the typical motivation for performing a hypothesis test. But that is not what a null hypothesis significance test actually does. In statistical language, your motivation for performing the test is that you want to know ##P(H|D)## (the probability of your actual hypothesis, given the data), but the test actually gives you ##P(D|H_0)## (the probability of the data, given the null hypothesis).

If you are familiar with woodworking the analogy I like to use is a planer and a jointer. A jointer makes one face of a board flat. A planer makes two faces parallel. You can fiddle around with your setup and if you are very careful and very patient you can use a planer to flatten a face. It is not what the tool itself does nor what it is designed to do, but a skillful user can make it function in that way.

The difference is that the woodworking community understands its tools very well, and understands what they actually do, and they understand the care and setup necessary to make a planer do the job of a jointer. The scientific community just uses our "planer" with little understanding of what it does and without any particular care to get it to do the job of our "jointer" which is what we actually want.

My complaint about that is that we now have a statistical tool that actually does the scientific job that we want done. This is a relatively new tool, meaning that it has been available for the last few decades instead of the last couple of centuries.

fog37 said:
For example, if a company makes cookies which are believed to have ##10\,g## of sugar, that would lead to ##H_0: \mu = 10\,g##... We then do calculations with our single sample and, if the sample mean is very, very unlikely given the premise ##H_0: \mu = 10\,g##, then we reject the current belief (we may be making a mistake, though) and go for the alternative hypothesis ##H_1: \mu \neq 10\,g##. So, to me, hypothesis testing has been about hypothesizing something about the population and checking how true that can be using a single random sample...
Notice, in this example, if your null hypothesis significance test rejects the null hypothesis, all you know is that ##P(D|H_0)## is small. You do not know anything about the value of the population parameter.

You can use an estimator to estimate the value of the parameter, but that is separate and not part of the hypothesis test. Suppose that your estimator gives you an estimate of 11 g. In particular, the significance test does not give you any information whatsoever about ##P(H=11|D)## nor even any information about ##P(D|H=11)##.
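For the cookie example, a minimal sketch (all numbers invented) of how the point estimate and the significance test sit side by side: the test only reports how surprising the data would be if ##\mu## were exactly ##10\,g##, while the estimate and its confidence interval are what speak to the value of ##\mu## itself.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

# Hypothetical sample of sugar content (g) measured on 15 cookies
sugar = rng.normal(loc=11.0, scale=1.5, size=15)

estimate = sugar.mean()                            # point estimate of mu, around 11 g
test = stats.ttest_1samp(sugar, popmean=10.0)      # test of H0: mu = 10 g
ci = stats.t.interval(0.95, sugar.size - 1,
                      loc=estimate, scale=stats.sem(sugar))

print(estimate)      # what the estimator says about mu
print(test.pvalue)   # how surprising the data are if mu were exactly 10 g
print(ci)            # interval estimate for mu
```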
 

What are the key desirable properties of an estimator in statistics?

In statistics, the most desirable properties of an estimator include unbiasedness, consistency, efficiency, and sufficiency. An unbiased estimator has an expected value equal to the true parameter it estimates. Consistency means that as the sample size increases, the estimator converges in probability to the true parameter value. Efficiency refers to an estimator having the smallest possible variance among all unbiased estimators. Sufficiency implies that the estimator captures all necessary information from the sample data about the parameter.

Why is unbiasedness important in an estimator?

Unbiasedness is crucial because it ensures that the expected value of the estimator matches the true parameter value. This property is important for making accurate and reliable statistical inferences, as it implies that on average, the estimator does not overestimate or underestimate the parameter. Unbiased estimators provide a correct centering of the distribution of the estimates around the true parameter value, which is vital for the validity of subsequent statistical analyses and conclusions.

What does it mean for an estimator to be consistent?

A consistent estimator is one that converges in probability to the true value of the parameter as the sample size increases indefinitely. This means that the estimator becomes more accurate as more data points are used in the calculation. Consistency is a critical property because it guarantees that by increasing the sample size, the estimator's accuracy and reliability improve, ultimately leading to more precise estimations of the parameter.
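A minimal sketch of consistency (the true mean and noise level are invented for illustration): the sample mean's typical distance from the true value shrinks as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(7)
true_mean = 5.0

for n in (10, 100, 1_000, 10_000):
    # Many samples of size n; look at how far the sample mean strays from the truth
    estimates = rng.normal(loc=true_mean, scale=2.0, size=(2_000, n)).mean(axis=1)
    print(n, np.mean(np.abs(estimates - true_mean)))  # shrinks roughly like 1/sqrt(n)
```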

How does efficiency relate to other properties of estimators?

Efficiency in the context of estimators refers to an estimator's variance being as small as possible, given that it is unbiased. An efficient estimator is the best unbiased estimator with the minimum variance among all unbiased estimators for a parameter. This property is crucial when comparing two unbiased estimators, as the more efficient estimator provides more precise estimates using the same amount of data. Efficiency is particularly important in situations with limited data, where maximizing the information gained from each data point is critical.
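As a concrete (hypothetical) comparison of two unbiased estimators: for normally distributed data both the sample mean and the sample median are unbiased for the center, but the mean has the smaller variance (for large samples the median's variance is roughly ##\pi/2## times larger), so the mean is the more efficient of the two.

```python
import numpy as np

rng = np.random.default_rng(8)

n, n_samples = 100, 20_000
samples = rng.normal(loc=0.0, scale=1.0, size=(n_samples, n))

means = samples.mean(axis=1)
medians = np.median(samples, axis=1)

print(means.mean(), medians.mean())  # both approximately 0: both unbiased here
print(means.var(), medians.var())    # the mean has the smaller variance
```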

What is the significance of an estimator being sufficient?

An estimator is considered sufficient if it summarizes all the information in the sample that is needed to estimate the parameter effectively. This means that no other statistic that can be calculated from the same sample provides any additional information about the parameter. Sufficiency reduces the data in a way that no information about the parameter is lost. This property helps in simplifying the analysis without sacrificing the quality of the information about the parameter being estimated.
