Are Sampling Issues the Key to Proving the Existence of Hot Hands?

BWV · Apr 1, 2019

The authors revisited the famous 1985 paper by Gilovich, Vallone, and Tversky that 'disproved' the existence of a 'hot hand' (serial correlation in shooting attempts) and found a subtle sampling issue that, when corrected for, actually proves that hot hands exist.

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2627354

We prove that a subtle but substantial bias exists in a standard measure of the conditional dependence of present outcomes on streaks of past outcomes in sequential data. The magnitude of this novel form of selection bias generally decreases as the sequence gets longer, but increases in streak length, and remains substantial for a range of sequence lengths often used in empirical work. The bias has important implications for the literature that investigates incorrect beliefs in sequential decision making---most notably the Hot Hand Fallacy and the Gambler's Fallacy. Upon correcting for the bias, the conclusions of prominent studies in the hot hand fallacy literature are reversed. The bias also provides a novel structural explanation for how belief in the law of small numbers can persist in the face of experience.
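The statistic the abstract describes can be checked by brute force for short sequences. The sketch below is not taken from the paper: it assumes a fair coin, and the function name `expected_proportion` and its parameters `n` (sequence length) and `k` (streak length) are illustrative. For each equally likely sequence it takes the proportion of heads among flips that follow `k` heads, then averages those proportions over the sequences where the proportion is defined.

```python
from fractions import Fraction
from itertools import product

def expected_proportion(n, k=1):
    """Exact mean, over equally likely fair-coin sequences of length n that
    contain at least one flip preceded by k heads, of the per-sequence
    proportion of heads among those flips."""
    total = Fraction(0)
    counted = 0
    for seq in product((0, 1), repeat=n):  # 1 = heads, 0 = tails
        # Flips whose k immediate predecessors are all heads.
        recorded = [seq[i] for i in range(k, n) if all(seq[i - k:i])]
        if recorded:  # the proportion is undefined when nothing is recorded
            total += Fraction(sum(recorded), len(recorded))
            counted += 1
    return total / counted

for n in (3, 4, 10):
    value = expected_proportion(n)
    print(n, value, float(value))
```

For `n = 3` and `n = 4` this returns 5/12 and 17/42, the two figures that come up later in the thread.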

PeterDonis · Apr 1, 2019

I see what looks like a simple-minded error in the first simple example (3 coin flips). Table I lists the eight possible sequences. The "recorded flips" are every flip that follows a flip that came up heads. The question to be answered is, what fraction of recorded flips are heads?

The answer the paper gives is 5/12. But this is wrong. There are eight recorded flips (the underlined ones in Table I), and four of them are heads. So the fraction is 1/2, just as we would expect.

I haven't read the rest of the paper yet, but if it uses the same type of logic, I would say its conclusion is in error.
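The count above can be reproduced by enumerating the eight three-flip sequences and pooling every flip that follows a head. This is only a sketch of that counting; the variable names are illustrative and nothing here comes from the paper itself.

```python
from itertools import product

recorded = []  # every flip that immediately follows a head, pooled over all sequences
for seq in product("HT", repeat=3):
    for prev, cur in zip(seq, seq[1:]):
        if prev == "H":
            recorded.append(cur)

n_recorded = len(recorded)
n_heads = recorded.count("H")
print(n_recorded, n_heads, n_heads / n_recorded)  # 8 4 0.5
```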

PeterDonis · Apr 1, 2019

BWV said:

The authors revisited the famous 1985 paper by Gilovich, Vallone, and Tversky that 'disproved' the existence of a 'hot hand' (serial correlation in shooting attempts) and found a subtle sampling issue that, when corrected for, actually proves that hot hands exist.

It's worth noting that the paper's actual conclusion is not that "hot hands" exist, but that "cold hands" exist--i.e., that the probability of success after a streak of successes is smaller, not larger.

BWV · Apr 1, 2019

PeterDonis said:

I see what looks like a simple-minded error in the first simple example (3 coin flips). Table I lists the eight possible sequences. The "recorded flips" are every flip that follows a flip that came up heads. The question to be answered is, what fraction of recorded flips are heads?

The answer the paper gives is 5/12. But this is wrong. There are eight recorded flips (the underlined ones in Table I), and four of them are heads. So the fraction is 1/2, just as we would expect.

I haven't read the rest of the paper yet, but if it uses the same type of logic, I would say its conclusion is in error.

The odds are 5/12 because the sample is 6: the two TT results are eliminated, since those sequences contain no flip that follows a head. This is the bias in sampling past streaks. The sample statistic is the proportion of heads following a head in a three-toss sequence.

BWV · Apr 1, 2019

PeterDonis said:

It's worth noting that the paper's actual conclusion is not that "hot hands" exist, but that "cold hands" exist--i.e., that the probability of success after a streak of successes is smaller, not larger.

No. With the sampling bias accounted for, the expectation is that the percentage of wins would decline after a streak, and the fact that it does not supports the existence of hot hands.

PeterDonis · Apr 1, 2019

BWV said:

The odds are 5/12 because the sample is 6: the two TT results are eliminated, since those sequences contain no flip that follows a head. This is the bias in sampling past streaks. The sample statistic is the proportion of heads following a head in a three-toss sequence.

"The sample is 6" because they are averaging differently: they are taking the average of 0, 0, 1, 0, 1/2, and 1, which is 5/12. But that is the answer to a different question from the question they pose earlier in the paper, and the question that other studies of "hot hands" are answering.

The question whose answer is 5/12 is the question: "What is the average, per three-flip sequence, of the fraction of flips following a head that are heads, weighting equally each of the 6 possible sequences that contain at least one flip that follows a head?"

But the question posed earlier in the paper is: "What is the fraction of flips following a head that are heads?" The answer to that question is 1/2, since, as I said, there are eight flips that follow a head, and four of them are heads.

In other words, the difference is whether you weight the average by three-flip sequence, or simply by recorded flip (i.e., flip that follows a heads). The paper's argument appears to be that the average should be weighted by three-flip sequence, since in any actual trial only one sequence will be observed. In other words, the paper is claiming that when we analyze real-world data, we should treat it as though we are looking at a single trial with some huge number of flips--the equivalent of looking at only one of the trials from Table I that contains at least one flip following a head, chosen at random.
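The two weightings can be put side by side in a few lines. This is a sketch with illustrative names (`by_sequence`, `by_flip`), assuming a fair coin and all eight three-flip sequences equally likely.

```python
from fractions import Fraction
from itertools import product

per_sequence = []  # (heads among recorded flips, number of recorded flips)
for seq in product("HT", repeat=3):
    followers = [cur for prev, cur in zip(seq, seq[1:]) if prev == "H"]
    if followers:  # sequences with no recorded flip are skipped
        per_sequence.append((followers.count("H"), len(followers)))

# Weight each sequence equally: average the six per-sequence fractions.
by_sequence = sum(Fraction(h, n) for h, n in per_sequence) / len(per_sequence)
# Weight each recorded flip equally: pool the counts first, then divide.
by_flip = Fraction(sum(h for h, _ in per_sequence), sum(n for _, n in per_sequence))
print(by_sequence, by_flip)  # 5/12 1/2
```

The only difference between the two numbers is whether the division happens before or after the sum over sequences.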

However, in real-world situations--such as sporting events--the number of "flips" (say, shots taken by a basketball player) is not fixed in advance, and we are not looking at single trials of some number of flips in isolation. We are looking at a corpus of data containing many trials, each of which ended up with a number of flips that was not determined in advance. When we look at this corpus of data, I think we are doing the equivalent of looking at all of Table I and counting the flips following a head that are heads, in all the trials. And this seems to be the underlying assumption of previous studies of "hot hands".

PeterDonis · Apr 1, 2019

BWV said:

The odds are 5/12 because the sample is 6, the two TT results are eliminated

Another way of seeing why I don't think this is right is to ask, why should the two sequences with no recorded flips be eliminated? In the real world nothing prevents those two sequences from happening. So if I'm going to average the way the paper is averaging, it seems like I should be averaging 0, 0, 0, 0, 1, 0, 1/2, 1, to get 5/16.

But the paper is looking at it as the fraction of heads per recorded flip per sequence, which isn't right; I should only be averaging once, not twice, so I should be averaging the number of heads following heads per sequence, over all eight sequences. So what I should be averaging is 0, 0, 0, 0, 1, 0, 1, 2, to get 1/2.
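Both eight-sequence averages can be checked directly. In this sketch (illustrative names, fair coin assumed) a sequence with no recorded flip contributes 0 to either list. The first list sums to 5/2, so its mean over eight sequences is 5/16; the second sums to 4, giving 1/2.

```python
from fractions import Fraction
from itertools import product

fractions_per_seq = []  # fraction of heads among recorded flips (0 if none recorded)
counts_per_seq = []     # number of heads that follow a head
for seq in product("HT", repeat=3):
    followers = [cur for prev, cur in zip(seq, seq[1:]) if prev == "H"]
    heads = followers.count("H")
    fractions_per_seq.append(Fraction(heads, len(followers)) if followers else Fraction(0))
    counts_per_seq.append(heads)

print(sum(fractions_per_seq) / 8)        # 5/16
print(Fraction(sum(counts_per_seq), 8))  # 1/2
```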

PeroK · Apr 1, 2019

BWV said:

No. With the sampling bias accounted for, the expectation is that the percentage of wins would decline after a streak, and the fact that it does not supports the existence of hot hands.

Whoever wrote that paper has no grasp of basic probabilities. There are 12 heads in that experiment. 4 are followed by another head; 4 are followed by a tail; and, 4 have no subsequent toss.
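The 4/4/4 split of the 12 heads is easy to confirm by enumeration. A minimal sketch, with illustrative variable names:

```python
from itertools import product

followed_by_head = followed_by_tail = no_next_toss = 0
for seq in product("HT", repeat=3):
    for i, flip in enumerate(seq):
        if flip != "H":
            continue
        if i == len(seq) - 1:      # a head on the last toss has no successor
            no_next_toss += 1
        elif seq[i + 1] == "H":
            followed_by_head += 1
        else:
            followed_by_tail += 1

print(followed_by_head, followed_by_tail, no_next_toss)  # 4 4 4
```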

The fundamental assumption governing the data is that each toss is independent. So, the idea of finding a bias in data that has been generated on the assumption of no bias is absurd.

The fundamental problem with a hot hands theory is whether it's the physical coin or the gambler that counts. What happens if you physically replace a coin that has come up heads the previous throw? Does the new coin inherit the hot streak?

StoneTemplePython · Apr 1, 2019

PeroK said:

Whoever wrote that paper has no grasp of basic probabilities.

Be careful with this... it's very subtle. There have been (a few) nice discussions about this paper on Gelman's blog. Here's a link to one of them.

https://statmodeling.stat.columbia.edu/2015/07/09/hey-guess-what-there-really-is-a-hot-hand/

PeroK · Apr 1, 2019

StoneTemplePython said:

Be careful with this... it's very subtle. There have been (a few) nice discussions about this paper on Gelman's blog. Here's a link to one of them.

https://statmodeling.stat.columbia.edu/2015/07/09/hey-guess-what-there-really-is-a-hot-hand/

What's a hot hand? In this case?

StoneTemplePython · Apr 1, 2019

PeroK said:

What's a hot hand? In this case?

mild streakiness in shooting baskets

PeroK · Apr 1, 2019

StoneTemplePython said:

mild streakiness in shooting baskets

There's no comparison between shooting baskets and coins.

There clearly may be a bias based on the confidence of the team or player.

BWV · Apr 1, 2019

PeroK said:

There's no comparison between shooting baskets and coins.

There clearly may be a bias based on the confidence of the team or player.

The point being made is that:
A) the original hot hand studies found no evidence of streaks ('it's a behavioral illusion');
B) this paper finds a sampling bias in that methodology and uses fair coin tosses to illustrate the bias;
C) after correcting for this bias, evidence for streaks appears in the data from the original Gilovich, Vallone, and Tversky paper.

PeroK · Apr 1, 2019

StoneTemplePython said:

Be careful with this... it's very subtle. There have been (a few) nice discussions about this paper on Gelman's blog. Here's a link to one of them.

https://statmodeling.stat.columbia.edu/2015/07/09/hey-guess-what-there-really-is-a-hot-hand/

I'm struggling a little to see the subtlety here. It seems rather obvious that you must be careful with sub-samples of different sizes and different success rates. Surely that's basic?

For example, both @PeterDonis and I immediately rejected the analysis in the paper for this very reason. Okay, in the original 1985 paper the error was possibly harder to spot.

Also, in general, I would say it's not enough to analyse data unless the data itself is clearly understood. For example, I could postulate several scenarios based around possible basketball strategies that would militate toward or against perceived streakiness. For example:

Let's assume player X does genuinely get "hot hands". I take this to mean: given a typical 3-point shot, a) he scores 20% of the time (if it's the first shot of the game or he missed the previous shot); b) he scores 30% of the time (if he scored on the previous shot).

If the opposition does not react, then these figures would emerge from the game. As they would in, say, a test in training.
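A toy simulation of this undefended scenario can be sketched as follows. Only the 20% and 30% rates come from the example above; the 10 shots per game, the 20,000 games, the seed, and the function names are assumptions made purely for illustration.

```python
import random

def simulate_game(n_shots, p_base, p_after_hit, rng):
    """One game: p_base on the first shot or after a miss, p_after_hit after a hit."""
    shots, prev_hit = [], False
    for _ in range(n_shots):
        hit = rng.random() < (p_after_hit if prev_hit else p_base)
        shots.append(hit)
        prev_hit = hit
    return shots

def pooled_rates(games):
    """Return (hit rate after a hit, hit rate after a miss), pooled over all games."""
    after_hit = [cur for g in games for prev, cur in zip(g, g[1:]) if prev]
    after_miss = [cur for g in games for prev, cur in zip(g, g[1:]) if not prev]
    return sum(after_hit) / len(after_hit), sum(after_miss) / len(after_miss)

rng = random.Random(0)  # fixed seed so the run is repeatable
games = [simulate_game(10, 0.20, 0.30, rng) for _ in range(20000)]
rate_after_hit, rate_after_miss = pooled_rates(games)
print(round(rate_after_hit, 3), round(rate_after_miss, 3))
```

With the shots pooled across games, the two printed rates should land close to 0.30 and 0.20.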

But, the opposition may well react. And, if he scores a 3-pointer, then they make the next shot harder - i.e. the next shot is no longer "typical". And, especially, if he scores two in a row, they make it even harder. Perhaps even to the point where he has no opportunity to shoot any more 3-pointers for a while. By which time, he has lost his "hot hands".

The question then, is, does he have "hot hands" or not? I would say yes, in the sense that he can shoot better with success. But, the stats wouldn't fully emerge to support this because the opposition would do something about it.

Ironically, of course, a sports statistician might come along and insist that the defensive team are unnecessarily committing resources to blocking his three-pointers, because there is no statistical evidence that he scores more frequently after a successful shot.

But, if the team follows this advice, then they find out the hard way that he really does have "hot hands" and they were right to try to stop his shooting!

PeterDonis · Apr 1, 2019

PeroK said:

The question then, is, does he have "hot hands" or not? I would say yes, in the sense that he can shoot better with success. But, the stats wouldn't fully emerge to support this because the opposition would do something about it.

As far as I can tell, possible confounders like this are not being considered in the paper under discussion and the various online discussions of it. I think for this discussion here we should table such confounders until we have further hashed out the underlying issue of when the statistics show a "hot hand" in terms of the net effect of the player's shooting better after a successful shot plus whatever countermeasures the opposition takes to offset this. If we can't agree on how to extract that net signal from the data, we're not going to get very far in trying to parse out individual sub-components of the signal.

StoneTemplePython · Apr 1, 2019

PeroK said:

I'm struggling a little to see the subtlety here. It seems rather obvious that you must be careful with sub-samples of different sizes and different success rates. Surely that's basic?

For example, both @PeterDonis and I immediately rejected the analysis in the paper for this very reason. Okay, in the original 1985 paper the error was possibly harder to spot.

I read the paper when it came out close to 4 years ago -- I don't recall all the points. The BIG point is as stated in post #13, that the original paper from the '80s had a methodological bust. It may seem obvious to you but that's hindsight bias -- or equivalently the red dot (cashmere sweater) episode of Seinfeld.

The original Hot Hand Fallacy paper in the '80s has been cited repeatedly over the decades and was a standard talking point in intro stats classes. The point was that lay people are fooled by chance and statisticians point out their fallacies. The irony pointed out in the 2015 paper is that the fallacy was with the statisticians.

- - - -
Above and beyond that I think what Peter is saying is right...

There's a major degrees of freedom issue here lurking in terms of people's personal views on how the game works. There are also micro vs macroscopic issues -- what lens you're using to look at the game with and a whole bunch of other mine fields.

PeterDonis · Apr 1, 2019

PeroK said:

in the original 1985 paper the error was possibly harder to spot

What error? If the criticism I have made of the paper linked in the OP is correct, there was no error in the 1985 paper (assuming that you are referring to the original statistical study that found no "hot hand" in the data).

PeterDonis · Apr 1, 2019

Looking around online, I found the following paper:

https://www.tandfonline.com/doi/pdf/10.1080/10691898.2017.1395303
This paper has a very interesting statement in section 4.3 (there are a couple of other places where similar statements are made): it says that the conditional probability of heads on a coin flip, given that the previous flip was heads, is 0.5!

It seems to me that this is basically conceding the point I made in post #6. Or, to put it another way, if I want to know if a particular coin flipper has a "hot hand", I will collect data on a large number of flips he makes and see if the conditional probability of heads given that the previous flip was heads is greater than 0.5 to a statistically significant degree. (Similar remarks apply if I want to assess whether there is a "hot hand" after "streaks" of multiple heads.)

To put it yet another way: the underlying hypothesis of the "hot hand" is basically that "success causes success", i.e., that, for example, a player having made the last shot (or last 2, 3, ... shots) is a significant causal factor in the player's current shot. So what we have here is a hypothesized causal factor, and we want to see if it is actually present. The standard way of doing that with a large corpus of data is to see whether the conditional probability of the effect (success--making the shot) when the cause (making the previous shot, or previous 2, 3, ... shots) is present is greater (to a statistically significant degree) than the overall probability of the effect for the entire corpus of data.

As far as I can tell, this is what the original paper that found no evidence of a "hot hand" did. And it seems to me like the right thing to do. So I don't see any justification for the paper in the OP claiming that it is in fact the wrong thing to do.
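The pooled comparison described above might be sketched like this. The function names, the made-up shot data, and the crude normal-approximation z-score (which treats the overall rate as fixed) are all assumptions for illustration; this is not the procedure from either paper.

```python
from math import sqrt

def rate_after_streak(games, k):
    """Pooled hit rate on shots that follow k consecutive hits, plus the sample size."""
    hits = tries = 0
    for shots in games:
        for i in range(k, len(shots)):
            if all(shots[i - k:i]):  # previous k shots were all made
                tries += 1
                hits += shots[i]
    return (hits / tries if tries else None), tries

def z_score(games, k):
    """Rough z for (rate after streak) minus (overall rate).
    Assumes at least one shot follows a streak of length k."""
    overall = sum(map(sum, games)) / sum(map(len, games))
    cond, n = rate_after_streak(games, k)
    return (cond - overall) / sqrt(overall * (1 - overall) / n)

games = [[1, 1, 0, 1, 1, 1], [0, 1, 0, 0, 1, 1]]  # made-up data, 1 = made shot
print(rate_after_streak(games, 1))  # rate 4/6 over 6 shots
print(rate_after_streak(games, 2))  # (0.5, 2)
```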

PeterDonis · Apr 1, 2019

The original 1985 paper by Gilovich, Vallone, and Tversky is here:

http://www.cs.colorado.edu/~mozer/Teaching/syllabi/7782/readings/gilovich vallone tversky.pdf

BWV · Apr 1, 2019

PeterDonis said:

Looking around online, I found the following paper:

https://www.tandfonline.com/doi/pdf/10.1080/10691898.2017.1395303
This paper has a very interesting statement in section 4.3 (there are a couple of other places where similar statements are made): it says that the conditional probability of heads on a coin flip, given that the previous flip was heads, is 0.5!

It seems to me that this is basically conceding the point I made in post #6. Or, to put it another way, if I want to know if a particular coin flipper has a "hot hand", I will collect data on a large number of flips he makes and see if the conditional probability of heads given that the previous flip was heads is greater than 0.5 to a statistically significant degree. (Similar remarks apply if I want to assess whether there is a "hot hand" after "streaks" of multiple heads.)

To put it yet another way: the underlying hypothesis of the "hot hand" is basically that "success causes success", i.e., that, for example, a player having made the last shot (or last 2, 3, ... shots) is a significant causal factor in the player's current shot. So what we have here is a hypothesized causal factor, and we want to see if it is actually present. The standard way of doing that with a large corpus of data is to see whether the conditional probability of the effect (success--making the shot) when the cause (making the previous shot, or previous 2, 3, ... shots) is present is greater (to a statistically significant degree) than the overall probability of the effect for the entire corpus of data.

As far as I can tell, this is what the original paper that found no evidence of a "hot hand" did. And it seems to me like the right thing to do. So I don't see any justification for the paper in the OP claiming that it is in fact the wrong thing to do.

Your link references, explains and supports the conclusions of the Miller paper in the OP

PeterDonis · Apr 1, 2019

BWV said:

Your link references, explains and supports the conclusions of the Miller paper in the OP

I know it claims to, yes. But as I said, it also says that the conditional probability of heads given that the previous coin flip was heads is 0.5, in the simple coin flip example that is supposed to illustrate the issue that the Miller paper raises. So basically this paper and the Miller paper are claiming that the conditional probability of heads given that the previous flip was heads--and by extension the conditional probability of, say, a player making a shot at basketball, given that he made his last shot (or his last n shots if we are looking at "streaks")--is not the right thing to look at when assessing whether there is a "hot hand". And I don't see why that's true. I understand the different statistical things these papers are calculating; I just don't see why those things, rather than the conditional probability, are the right things to look at for assessing whether a "hot hand" exists.

BWV · Apr 1, 2019

There is an important distinction between the conditional probability of heads in a live test of coin flipping and the proportion of heads in a finite sample of past coin flips. Section 4.3 states that a live conditional probability of 0.5 will translate to 0.405 in a sample, per Miller’s paper.

PeterDonis · Apr 1, 2019

BWV said:

There is an important distinction between the conditional probability of heads in a live test of coin flipping and the proportion of heads in a finite sample of past coin flips.

You don't have to have a "live test" to compute a conditional probability. You can do it on a finite sample of past coin flips just fine.

BWV said:

Section 4.3 states that a live conditional probability of 0.5 will translate to 0.405 in a sample, per Miller’s paper.

Section 4.3 says that if you generate a "sample" in a very particular way (Problem 2 in Table 6), the expected fraction of heads is .405 (17 / 42) instead of 0.5. What they don't justify, in my view, is why I should care about this rigmarole for generating a sample. Basically their claim amounts to the claim that somehow the data we have on shots by basketball players, for example, was generated by a process that is more like Problem 2 in Table 6 than like Problem 3 in Table 6. I see no reason for accepting that claim.
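The 17/42 figure is reproduced by averaging per-sequence proportions over sequences of four fair flips; taking that as the setup is an assumption made here because it matches the number, not a restatement of Table 6. The sketch below computes that average alongside the pooled fraction over the same sixteen sequences (variable names are illustrative).

```python
from fractions import Fraction
from itertools import product

props, heads_total, recorded_total = [], 0, 0
for seq in product("HT", repeat=4):
    followers = [cur for prev, cur in zip(seq, seq[1:]) if prev == "H"]
    heads_total += followers.count("H")
    recorded_total += len(followers)
    if followers:  # TTTT and TTTH have no flip that follows a head
        props.append(Fraction(followers.count("H"), len(followers)))

print(len(props), sum(props) / len(props))    # 14 17/42
print(Fraction(heads_total, recorded_total))  # 1/2
```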

PeroK · Apr 2, 2019

StoneTemplePython said:

I read the paper when it came out close to 4 years ago -- I don't recall all the points. The BIG point is as stated in post #13, that the original paper from the '80s had a methodological bust. It may seem obvious to you but that's hindsight bias -- or equivalently the red dot (cashmere sweater) episode of Seinfeld.

The original Hot Hand Fallacy paper in the '80s has been cited repeatedly over the decades and was a standard talking point in intro stats classes. The point was that lay people are fooled by chance and statisticians point out their fallacies. The irony pointed out in the 2015 paper is that the fallacy was with the statisticians.

- - - -
Above and beyond that I think what Peter is saying is right...

There's a major degrees of freedom issue here lurking in terms of people's personal views on how the game works. There are also micro vs macroscopic issues -- what lens you're using to look at the game with and a whole bunch of other mine fields.

I probably don't know enough about basketball and I don't know about Seinfeld either. But, I'm always sceptical about statistical analysis of sports data because of your second point about degrees of freedom etc.

Also, I don't understand why the 1985 paper didn't test the model with biased and unbiased test data. If you can define what "hot hands" actually means, then you should be able to construct a test data set that exhibits the characteristic. The model should report the same bias that you've built into your data.
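A check of this kind can be sketched as follows. Everything in it is an assumption for illustration: the 10-shot games, the 50%/60% rates, the seed, and the two estimators (`pooled` and `per_game_mean`), neither of which is claimed to be the exact procedure used in the 1985 paper. The idea is to generate one data set with no effect and one with a known effect, then see what each estimator reports.

```python
import random

def simulate_game(n_shots, p_after_miss, p_after_hit, rng):
    """One game; the first shot uses p_after_miss."""
    shots, prev_hit = [], False
    for _ in range(n_shots):
        hit = rng.random() < (p_after_hit if prev_hit else p_after_miss)
        shots.append(hit)
        prev_hit = hit
    return shots

def pooled(games):
    """Hit rate after a hit, counting every such shot once across all games."""
    followers = [cur for g in games for prev, cur in zip(g, g[1:]) if prev]
    return sum(followers) / len(followers)

def per_game_mean(games):
    """Average of each game's own hit rate after a hit (games with none are skipped)."""
    props = []
    for g in games:
        followers = [cur for prev, cur in zip(g, g[1:]) if prev]
        if followers:
            props.append(sum(followers) / len(followers))
    return sum(props) / len(props)

rng = random.Random(1)  # fixed seed so the run is repeatable
no_effect = [simulate_game(10, 0.5, 0.5, rng) for _ in range(20000)]
hot_hand = [simulate_game(10, 0.5, 0.6, rng) for _ in range(20000)]
for name, games in (("no effect", no_effect), ("hot hand", hot_hand)):
    print(name, round(pooled(games), 3), round(per_game_mean(games), 3))
```

In this setup the pooled estimate should sit near the rate that was built in, while the per-game average should come out lower in both data sets.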

PeroK · Apr 2, 2019

BWV said:

There is an important distinction between the conditional probability of heads in a live test of coin flipping and the proportion of heads in a finite sample of past coin flips. Section 4.3 states that a live conditional probability of 0.5 will translate to 0.405 in a sample, per Miller’s paper.

It's the conditional expected proportion that is 0.405. The conditional probability of heads after a head is 0.5, as it must be, since that is the assumption on which the data is built.

And the problem is that the expected value is not based on the same number of heads in each case.
