Understanding Chi Squared p-values

Discussion Overview

The discussion revolves around the interpretation of chi-squared p-values within the context of hypothesis testing, particularly in relation to a biological example involving handedness. Participants explore the meaning of p-values, the implications of rejecting or failing to reject the null hypothesis, and the differences between frequentist and Bayesian approaches to statistics.

Discussion Character

  • Technical explanation
  • Conceptual clarification
  • Debate/contested

Main Points Raised

  • One participant describes their understanding of p-values as the probability of no favor in handedness, questioning if this interpretation is correct.
  • Another participant clarifies that p-values indicate the probability of observing the data given that the null hypothesis is true, not the probability of the null hypothesis itself being true.
  • There is a discussion about the common misconception that a p-value of 0.05 implies a 95% confidence in rejecting an incorrect null hypothesis, with several participants asserting this interpretation is incorrect.
  • Some participants note that frequentist statistics cannot provide the probability of the null hypothesis being true or false, only the likelihood of the data under the null hypothesis.
  • Concerns are raised about the threshold of 0.05 being perceived as low for accepting hypotheses, with questions about how certainty is quantified in relation to p-values.
  • One participant mentions their experience with statistics and notes they have never used a left-tailed chi-squared test, prompting questions about the purpose of such tests.
  • Another participant emphasizes the need for precise vocabulary in discussing statistical concepts to avoid confusion.

Areas of Agreement / Disagreement

Participants express disagreement on the interpretation of p-values and the implications of hypothesis testing. There is no consensus on the correct understanding of how p-values relate to the null hypothesis and the certainty of conclusions drawn from them.

Contextual Notes

Participants highlight limitations in understanding p-values and hypothesis testing, including the distinction between frequentist and Bayesian interpretations, and the need for precise language in statistical discussions.

Isaac0427
Hi,

I am in AP Biology, and I completely understand how to use chi squared to the level the AP exam and class requires, but I do not completely understand why p-values work the way they do. I have done a lot of research, and I think I may have an idea as to how this works. Can you please tell me if I am correct or not, and if not why? Thank you in advance!

For this, suppose we go to a remote island and look at right- and left-handedness. Our null hypothesis is that there is no natural favor between being right- and left-handed, i.e. the distribution is 50/50. The alternative hypothesis is thus that nature favors either right- or left-handedness. We calculate a chi squared and then find a p-value. Is this how we would interpret our results?:
Chance that there is no favor: p
Chance that there is a favor: 1-p

If we use the .05 threshold and we get a p-value of .06, there is only a 6% chance the null hypothesis is correct, but we still accept it because we would rather say there is no favor if there is one than say there is one if there isn't, right? If we wanted the other way around, i.e. to rather say there is a favor when there is not than say there is no favor when there is, we would use the left-tailed test (p=0.95 threshold), right?
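The island calculation described above can be sketched in a few lines. This is a hedged illustration, not anything from the thread: the counts (36 right-handed, 14 left-handed out of 50) are made up, and only the Python standard library is used.

```python
import math

# Hypothetical island sample (made-up numbers for illustration).
observed = {"right": 36, "left": 14}
total = sum(observed.values())
expected = total / 2  # null hypothesis: a 50/50 split

# Pearson's chi-squared statistic: sum of (O - E)^2 / E over categories.
chi2 = sum((count - expected) ** 2 / expected for count in observed.values())

# Two categories -> 1 degree of freedom, where the chi-squared survival
# function reduces to erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

Note what `p_value` is: the probability of counts at least this lopsided *assuming* the 50/50 null hypothesis is true. As the replies below stress, it is not the probability that the null hypothesis is true.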
 
Isaac0427 said:
We calculate a chi squared and then find a p-value. Is this how we would interpret our results?:
Chance that there is no favor: p
Chance that there is a favor: 1-p

A p value gives the probability that some set of outcomes happens on the assumption that the "null hypothesis" is correct. It does not say anything about the probability that the null hypothesis is correct and, likewise, 1-p cannot be interpreted as the probability that the null hypothesis is false. So both interpretations you mention are incorrect.

( I assume you understand that the conditional probabilities Pr(H|D) and Pr(D|H) need not be equal. The p-values concern Pr(the data happened | null hypothesis true). Your interpretations concern Pr(null hypothesis true | the data happened).)

It is human nature to think: if the null hypothesis implies the probability of the data happening is small, then the null hypothesis is "probably" wrong. However, this statement cannot be mathematically proven or quantified without making more assumptions than are made in traditional hypothesis testing.

Hypothesis testing is a procedure that has been found empirically to be useful. It isn't a procedure that can be mathematically proven to get the right answer with some known probability.

A Bayesian approach to hypothesis testing can answer "What is the probability that the null hypothesis is true given the data happened", but it involves making more assumptions than are made in traditional ("frequentist") hypothesis testing.
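To make the frequentist/Bayesian contrast concrete, here is a hedged sketch with entirely assumed numbers: a prior probability of 0.9 that there is no favor (H0: each person is right-handed with probability 0.5), and a single specific alternative (H1: right-handed with probability 0.64). Under those extra assumptions, Bayes' rule does yield Pr(H0 | test rejected), and it is nowhere near the 0.05 significance level:

```python
# Exact binomial sums, no approximations. The prior (0.9) and the
# specific alternative (p = 0.64) are assumptions made up for this
# example, not anything from the discussion.
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 100
# For n = 100 the chi-squared test rejects at the 0.05 level
# iff the right-handed count k satisfies |k - 50| >= 10.
reject = [k for k in range(n + 1) if abs(k - 50) >= 10]

alpha = sum(binom_pmf(k, n, 0.5) for k in reject)   # Pr(reject | H0 true)
power = sum(binom_pmf(k, n, 0.64) for k in reject)  # Pr(reject | H1 true)

prior_h0 = 0.9
# Bayes' rule: Pr(H0 true | test rejected).
post_h0 = prior_h0 * alpha / (prior_h0 * alpha + (1 - prior_h0) * power)
print(f"alpha = {alpha:.3f}, power = {power:.3f}, Pr(H0|reject) = {post_h0:.2f}")
```

The point is not the particular numbers (the prior and the alternative were invented for the example) but that the posterior probability only exists once those extra assumptions are made.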
 
Stephen Tashi said:
( I assume you understand that the conditional probabilities Pr(H|D) and Pr(D|H) need not be equal. The p-values concern Pr(the data happened | null hypothesis true). Your interpretations concern Pr(null hypothesis true | the data happened).)
A video I watched from bozeman science says that we can interpret .05 as a 95% confidence that we are rejecting an incorrect null hypothesis. Is this incorrect?
 
Isaac0427 said:
A video I watched from bozeman science says that we can interpret .05 as a 95% confidence that we are rejecting an incorrect null hypothesis. Is this incorrect?

Yes, it is incorrect.

Any calculation you do in frequentist hypothesis is done by assuming the probability distribution implied by the null hypothesis is correct. So all probabilities you calculate are of the form Pr(...something...| null hypothesis is true). You can compute the probability of (incorrectly) rejecting the null hypothesis given the null hypothesis is true. You can't compute the probability of rejecting the null hypothesis (or doing anything else) when the null hypothesis is false.

In frequentist statistics, analysis of what happens when the null hypothesis is false is done by "power curves" and discussions of the "power" of statistical tests. This involves assuming specific ways in which the null hypothesis can be false. (E.g. you can't compute specific probabilities of seeing a certain number of right handers on the vague assumption "right handers are more probable than left handers". To do that, you must make a specific assumption such as "the probability of a person being right handed is 0.64.")
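Following that idea, a power curve for the handedness test can be sketched by picking specific alternatives and computing exact binomial probabilities. This is an illustration under assumptions: the sample size (n = 100) and the grid of alternative probabilities are made up.

```python
from math import comb

def reject_prob(p, n=100):
    """Pr(the chi-squared test rejects | true right-handed probability is p).

    For n = 100 the test rejects at the 0.05 level iff the
    right-handed count k satisfies |k - 50| >= 10.
    """
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if abs(k - 50) >= 10)

# At p = 0.5 this is the false-rejection rate; farther from 0.5 it is
# the power of the test against that specific alternative.
for p in (0.50, 0.55, 0.60, 0.64, 0.70):
    print(f"true p = {p:.2f} -> Pr(reject) = {reject_prob(p):.3f}")
```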

"Confidence" has a technical definition in statistics. In particular "95% confidence about such-and-such a statement" is not the same as saying that there is .95 probability that the statement is true. To evaluate what bozeman science says, we'd have see an exact quote about what is claimed.
 
Isaac0427 said:
A video I watched from bozeman science says that we can interpret .05 as a 95% confidence that we are rejecting an incorrect null hypothesis. Is this incorrect?
As you wrote it, it is indeed incorrect. A p value is the probability of the observed data assuming that the null hypothesis is true: P(data|H0 is true). If we make a decision in advance to reject H0 iff p<0.05, then P(reject H0|H0 is true)=0.05. It cannot and does not tell you anything about P(X|H0 is false) for any event X.
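The statement P(reject H0|H0 is true)=0.05 can be checked by simulation. A minimal sketch (standard library only; the sample size of 100 and the trial count are arbitrary choices for illustration):

```python
import math
import random

random.seed(0)  # fixed seed so the run is reproducible

def chi2_p_value(k, n):
    """p-value of the 1-df chi-squared test against a 50/50 null."""
    expected = n / 2
    chi2 = ((k - expected) ** 2 / expected
            + ((n - k) - expected) ** 2 / expected)
    return math.erfc(math.sqrt(chi2 / 2))

n, trials = 100, 20000
rejections = 0
for _ in range(trials):
    # The null hypothesis is TRUE in every trial: each person is
    # right-handed with probability exactly 0.5.
    k = sum(1 for _ in range(n) if random.random() < 0.5)
    if chi2_p_value(k, n) < 0.05:
        rejections += 1

rate = rejections / trials
print(f"Pr(reject | H0 true) ~ {rate:.3f}")
# Close to 0.05, though slightly off, because counts are discrete and
# the chi-squared distribution is only an approximation for them.
```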
 
In the video, Mr. Anderson says that you can interpret .05 as “you are 95% sure you are [correctly] accepting or rejecting the null hypothesis.”

I guess what is also confusing is that .05 seems like a really low threshold to accept a hypothesis. How can you be certain about a hypothesis if 95% of the data you would get under that hypothesis would be better than the data you have?
 
No, he is being very sloppy with his wording. For one thing, you do not “accept” a null hypothesis, you only “fail to reject” a null hypothesis. He seems like an engaging lecturer, but he has clearly decided to trade precision for engagement.

To be fair, he is describing how many people think about statistics. But he is wrong in a very common way.

Isaac0427 said:
I guess what is also confusing is that .05 seems like a really low threshold to accept a hypothesis.
Again, you never accept the null hypothesis in this way. You only fail to reject it.
 
Dale said:
To be fair, he is describing how many people think about statistics. But he is wrong in a very common way.
Well, I guess that confuses me. And I brought up to my AP Bio teacher the question of how critical values go up when you want more certainty, which seems counter-intuitive. He was not sure. We both assumed that Mr. Anderson was completely correct, as the comments were not filled with corrections. So, is there some digit that gives you how certain you are for a given p-value?

Also, if you cannot accept the null hypothesis, what would you use the left-tailed test for?
 
Isaac0427 said:
So, is there some digit that gives you how certain you are for a given p-value?
The question isn’t how certain you are. It is what are you certain about. A low p value makes you certain that the data is not likely under the null hypothesis. In other words you reject the null hypothesis as a likely explanation for the data because it would be so unlikely to see that data if it were true.

Isaac0427 said:
Also, if you cannot accept the null hypothesis, what would you use the left-tailed test for?
I have been doing statistics since the early 90’s and I have never used a left tailed chi squared test.
 
Dale said:
I have been doing statistics since the early 90’s and I have never used a left tailed chi squared test.
Critical value tables have p values of .9, .95, .99, etc. What are those used for?
 
Isaac0427 said:
So, is there some digit that gives you how certain you are for a given p-value?
What exactly is your question? To understand statistics, you need to develop a precise vocabulary. If you are asking a question about a probability, phrase it using the word "probability" and specify the event associated with that probability. Don't phrase questions about a probability using words like "certain" or "confident".

Isaac0427 said:
Also, if you cannot accept the null hypothesis, what would you use the left-tailed test for?

This is another vocabulary technicality. If a company makes a decision based only on a statistical hypothesis test, then it will either proceed on the assumption that the null hypothesis is true or proceed on the assumption that it is false. So, in common language, the company has either accepted or rejected the null hypothesis. However, some statisticians don't like to use the word "accept" when talking about the null hypothesis. This is because all the calculations in a hypothesis test are done on the assumption that the null hypothesis is true, so it would be redundant to say we "accept" it. (The phrase "reject the null hypothesis" has the interesting logical problem that if we do calculations based on the assumption the null hypothesis is true and then reject it, this implies our calculations are no longer justified. Nevertheless, most statisticians are comfortable using that phrase.)
 
Isaac0427 said:
Critical value tables have p values of .9, .95, .99, etc. What are those used for?
Nothing that I have ever seen
 
Ok, thank you guys. This Bozeman video seems to have some key details incorrect, as the whole video is about accepting the null hypothesis. Thank you for all the help.
 
Stephen Tashi said:
The phrase "reject the null hypothesis" has the interesting logical problem that if we do calculations based on the assumption the null hypothesis is true and then reject it, this implies our calculations are no longer justified.
I don’t think this is a problem. It is the typical approach of assuming X, finding a contradiction, and therefore concluding not-X. It is a standard logical approach for many proofs. The only difference is that it is not a contradiction, but just an incongruity.
 
Notice that the term "confidence" is not formally defined as a probability. That is intentional. Consider your original example. You can say "If there is no favor between right- and left-handedness, then the probability of getting results at least as extreme as the ones we got is p." But you cannot turn that statement around and say anything about the probability of "no favor between right and left-handedness". It is true that one should be skeptical of the null hypothesis if the data obtained is extremely unlikely under that assumption. But we cannot directly assign a probability number to that skepticism. So terms like "skepticism" or "confidence" should not be thought of directly as probabilities.
 
Dale said:
I don’t think this is a problem. It is the typical approach of assuming X, finding a contradiction, and therefore concluding not-X. It is a standard logical approach for many proofs. The only difference is that it is not a contradiction, but just an incongruity.

I agree with the analogy to indirect proof. The convoluted aspect is justifying what we consider an incongruity. Typically, the data gives a single value for a statistic. So how do we explain why we look at probabilities of that single value falling in a larger "acceptance region"? And how do we explain the (intuitively obvious) choice of the rejection region as one or both tails of the distribution for the statistic? After all, if we picked any arbitrary set of intervals with a total probability of p for the acceptance region, then the probability of incorrectly rejecting the null hypothesis when the statistic falls outside of those intervals is 1-p.

I think the answers to those questions (in frequentist statistics) must be explained by arguments about the power of various tests. The concept of the power of a test involves considering specific ways the null hypothesis can be false.

It's no wonder that people trying to explain introductory statistics make oversimplifications.
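That observation about arbitrary rejection regions can be illustrated numerically. A sketch with assumed numbers (n = 100, and a specific alternative of 0.64 for the probability of right-handedness): the usual two-tailed region and an arbitrarily chosen central region have comparable probability under the null, but very different power against the alternative:

```python
from math import comb

def region_prob(region, p, n=100):
    """Pr(right-handed count lands in `region` | true probability p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in region)

# The usual two-tailed rejection region for this test at the 0.05 level...
tails = [k for k in range(101) if abs(k - 50) >= 10]
# ...versus an arbitrary region chosen (for illustration) to have a
# similar probability under the 50/50 null.
weird = [44, 56]

for name, region in (("tails", tails), ("weird", weird)):
    size = region_prob(region, 0.5)    # Pr(reject | H0 true)
    power = region_prob(region, 0.64)  # Pr(reject | specific alternative)
    print(f"{name}: size = {size:.3f}, power = {power:.3f}")
```

Both regions give a test of roughly the same size, so the choice between them has to be justified by power, exactly as the post above argues.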
 
Stephen Tashi said:
After all, if we picked any arbitrary set of intervals with a total probability of p for the acceptance region then the probability of incorrectly rejecting the null hypothesis when the statistic falls outside of those intervals is 1-p.
Woah! I hadn’t considered that. Very nice.

On a related note, a null hypothesis is inherently kind of silly. I mean, it is almost always much more reasonable to assume that it is false, and ask if it is a good approximation instead.
 
