Proper understanding of p-value

fog37
Hello,

I am still slightly confused about the meaning of the p-value. Here is my current understanding:
  • There is a population. We don't know its parameters, but we want to estimate them.
    We collect a possibly large sample of size ##n## from it.
    We formulate the hypotheses ##H_0## and ##H_1##, set a significance level ##\alpha##, and perform a hypothesis test to either fail to reject ##H_0## or reject ##H_0## in favor of ##H_1##.
  • The p-value is the probability, ASSUMING H0 is correct, of the calculated sample statistic.
  • A low p-value leads to rejecting ##H_0##: it means that, under the assumption that ##H_0## is correct, the calculated sample statistic would have been far too rare to actually happen. But it happened. Sampling error alone would be unlikely to generate such a low-probability statistic value, so something deeper must be going on. This leads us to doubt ##H_0##.
  • The p-value is also called "the probability of chance" because it is the value we would expect if only chance were at work, as in random sampling. The fact that the sample statistic occurred despite its low probability must be attributed to something other than chance (see the simulation sketch below).
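To make the "probability under ##H_0##" idea concrete to myself, here is a minimal simulation sketch in Python (all the numbers are invented, and it assumes a normal population with known standard deviation):
```python
import numpy as np

rng = np.random.default_rng(0)

# Invented setup: H0 says the population mean is mu0 = 10,
# with known standard deviation sigma = 2; our sample has size n = 50.
mu0, sigma, n = 10.0, 2.0, 50
observed_mean = 10.8  # the mean actually computed from our one sample

# Simulate the sampling distribution of the mean under H0:
# many samples of size n, each reduced to its mean.
sim_means = rng.normal(mu0, sigma, size=(100_000, n)).mean(axis=1)

# Two-sided p-value: how often does chance alone produce a sample mean
# at least as far from mu0 as the one we observed?
p_value = np.mean(np.abs(sim_means - mu0) >= abs(observed_mean - mu0))
print(p_value)  # a small value means such a sample mean is rare under H0
```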
Is this correct?

The procedure above is based on analyzing a single, large sample. What if we repeated it with another simple random sample and this time the p-value was larger than the set threshold ##\alpha##? That would mean we would fail to reject ##H_0##... So how many samples do we need to analyze to convince ourselves that ##H_0## must be rejected or not?
It seems reasonable to explore multiple random samples and determine their p-values before drawing conclusions about ##H_0##.

THANK YOU!
 
fog37 said:
I am still slightly confused about the meaning of the p-value.
Yes, it is one of the most frequently misused and misunderstood statistics. That said, your understanding seems correct:

fog37 said:
The p-value is the probability, ASSUMING H0 is correct, of the calculated sample statistic.
The usual mistake, which you are not making, is to consider the p-value as the probability that ##H_0## is correct. Or, even worse, to consider it as being related to the probability of ##H_1## in any way. In simple terms, it is the probability of the data, given the null hypothesis.

fog37 said:
What if we repeated it with another simple random sample and this time the p-value was larger than the set threshold ##\alpha##?
In this very common case you would need to perform a correction for multiple comparisons. I like the Bonferroni-Holm correction.
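As a minimal sketch of how that correction works (this uses statsmodels, and the p-values are hypothetical):
```python
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from repeating the experiment three times.
pvals = [0.04, 0.30, 0.01]

# Holm's step-down (Bonferroni-Holm) procedure at family-wise alpha = 0.05.
reject, p_adjusted, _, _ = multipletests(pvals, alpha=0.05, method="holm")
print(reject)      # which nulls are still rejected after correction
print(p_adjusted)  # the Holm-adjusted p-values
```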

However, even with a multiple-comparisons correction, this is one of the big problems with frequentist statistics in science. When you perform that next experiment, you actually need to adjust the p-value of the original experiment. In fact, ideally, when reporting the original experiment you should have anticipated the follow-up experiment and adjusted the original p-value accordingly. Simply by intending to do follow-up experiments, your p-value becomes weaker, and in the limit of a conscientious experimenter who intends to continue studying a topic indefinitely, any data can be rendered statistically non-significant.

fog37 said:
So how many samples do we need to analyze to convince ourselves that H0 must be rejected or not?
That is actually less critical than being explicit about your stopping criterion and using that criterion when calculating your p-values. Once you have defined your stopping criterion, a power analysis for that experiment can guide you on the number of samples needed (see the sketch below).
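For the power-analysis step, something along these lines (a sketch using statsmodels; the effect size, ##\alpha##, and power target are purely illustrative assumptions, and it uses a one-sample t-test for concreteness):
```python
from statsmodels.stats.power import TTestPower

# Illustrative assumptions: a standardized effect size of 0.5,
# a test at alpha = 0.05, and an 80% chance of detecting the effect.
analysis = TTestPower()
n_required = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(round(n_required))  # approximate sample size for a one-sample t-test
```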
 
Thanks Dale!

Glad I am on the right track. So, in general, unless we want to get into more sophisticated analysis and corrections, such as the Bonferroni-Holm correction that you bring up, junior statisticians stick with analyzing a single, possibly large sample from the population...

Also, given the starting assumption that ##H_0## is correct, we place the value claimed by ##H_0## at the center of a probability distribution, and the p-value is the probability obtained from that distribution at the corresponding z value.

Regarding that distribution, is it the theoretical Gaussian sampling distribution of the statistic under study (say, the sample mean)? We are essentially envisioning the sampling distribution of the mean, centered at the value proposed by ##H_0##.
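In code, the computation I am picturing looks something like this (a minimal two-sided z-test sketch, assuming a known population standard deviation; the numbers are invented):
```python
import math
from scipy.stats import norm

# Invented numbers: H0 claims mu0 = 10; a sample of n = 50 has mean 10.8,
# and we assume a known population standard deviation sigma = 2.
mu0, sigma, n, xbar = 10.0, 2.0, 50, 10.8

# Standardize the observed mean against the sampling distribution of the
# mean under H0 (centered at mu0, with standard error sigma / sqrt(n)).
z = (xbar - mu0) / (sigma / math.sqrt(n))

# Two-sided p-value from the standard normal distribution.
p_value = 2 * norm.sf(abs(z))
print(z, p_value)
```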

Is that correct?
 
fog37 said:
unless we want to get into more sophisticated analysis and corrections, such as the Bonferroni-Holm correction that you bring up, junior statisticians stick with analyzing a single, possibly large sample from the population.
Yes

fog37 said:
Also, given the starting assumption that ##H_0## is correct, we place the value claimed by ##H_0## at the center of a probability distribution, and the p-value is the probability obtained from that distribution at the corresponding z value.

Regarding that distribution, is it the theoretical Gaussian sampling distribution of the statistic under study (say, the sample mean)? We are essentially envisioning the sampling distribution of the mean, centered at the value proposed by ##H_0##.

Is that correct?
Not necessarily. Your ##H_0## need not be related to the Gaussian distribution at all. That is a common approach, but not mandatory.
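For example, here is a sketch of an exact binomial test, where the null distribution is a binomial rather than a Gaussian (the counts are hypothetical; binomtest is available in scipy 1.7+):
```python
from scipy.stats import binomtest

# Hypothetical data: 62 successes in 100 trials.
# H0 says the success probability is 0.5, so the null
# distribution of the count is Binomial(100, 0.5).
result = binomtest(k=62, n=100, p=0.5, alternative="two-sided")
print(result.pvalue)  # an exact p-value, no Gaussian approximation involved
```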
 
fog37 said:
  • The p-value is the probability, ASSUMING H0 is correct, of the calculated sample statistic.

Assuming ##H_0##, the probability that the sample statistic takes on exactly the value we observe is typically zero! That is, if we are talking about sample statistics that can take on a continuous range of values.

The p-value is, in general, the probability that the statistic lies in some interval. For example, the interval might be ##[0,\infty)##.

Justifying the use of a particular interval is a sophisticated intellectual exercise. For example, it's easy to explain the customary scenarios for using "one-tailed" vs "two-tailed" tests and the procedures are intuitively pleasing, but how do we prove that the methods are correct in any sense? The key to that is to define "correct" rigorously. This has to do with defining the "power" of statistical tests.

After all, since the p-value is, in general, the probability of an event that includes outcomes where the observed statistic did not have the value we observe, how do we justify including the probability of things that did not happen in making a decision?
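To illustrate the point about intervals, here is a sketch comparing one-tailed and two-tailed p-values for the same observed statistic (a standard normal statistic with an invented observed value):
```python
from scipy.stats import norm

z_obs = 1.8  # invented observed value of a standardized test statistic

# One-tailed: probability that the statistic lies in [z_obs, infinity).
p_one_tailed = norm.sf(z_obs)

# Two-tailed: probability of (-infinity, -|z_obs|] union [|z_obs|, infinity).
p_two_tailed = 2 * norm.sf(abs(z_obs))

print(p_one_tailed, p_two_tailed)  # same data, two different intervals
```
The same observed value yields two different p-values, depending on which interval of outcomes we decide counts as "at least as extreme."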
 
Since we are discussing statistics, my understanding is that statistical inference can be used for three purposes:

a) find a yes/no answer about a parameter of an unknown population (that is hypothesis testing)
b) estimate the parameter(s) of an unknown population with a certain level of confidence (that is estimation)
c) predict the future (that is forecasting)

I am not sure about c)... How is inferential statistics used to predict the future? Are we assuming that the population's parameters can vary in the future and the idea is to predict them? Are we talking about statistics in the context of time series and regression models, as models to predict data that we currently don't have?

Also, I have been reading about probability vs. statistics, and some simplistically define them as inverses of each other... Every intro statistics book has a probability section. Is it because statistics employs the tools of probability to do statistical analysis? I guess...

Thanks
 