Polling Margin of Error

  • #51
There is another theoretical aspect of polling: How is the usual variance equation influenced by the constraint that the total of the percentages must add up to 100%?
I have no experience with this.
 
  • #52
If you have a sample of N, and the fraction voting for Jones is f, the uncertainty on that number is ##\sqrt{Nf (1-f)}##.
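(On the sum-to-100% question in #51: if you model the full set of counts as multinomial, each marginal count still has variance ##Np_i(1-p_i)##; the constraint only shows up as a negative covariance between candidates, ##\operatorname{Cov}(n_i,n_j)=-Np_ip_j## for ##i\ne j##. So the single-candidate formula above is unchanged.)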
 
  • #53
Vanadium 50 said:
If you have a sample of N, and the fraction voting for Jones is f, the uncertainty on that number is ##\sqrt{Nf (1-f)}##.
Yes. But what we are talking about is the uncertainty on that number divided by ##N## (times 100 %). For the CNN poll that works out to 1.1 %, which is right in line with my Monte Carlo simulation and the Bayesian posterior.
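For anyone who wants to check the arithmetic, here is a minimal Python version, assuming ##N = 2074## (the CNN sample size mentioned later in the thread) and ##f \approx 0.48##; both numbers are taken from this thread, not from CNN's published methodology:

```python
import math

# Margin of error on a single candidate's share, assuming N = 2074 and f ~ 0.48.
N = 2074
f = 0.48
sigma_count = math.sqrt(N * f * (1 - f))   # uncertainty on the raw count
sigma_share = sigma_count / N              # = sqrt(f * (1 - f) / N)
print(f"{100 * sigma_share:.1f}%")         # prints 1.1%
```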
 
  • #54
I trust @FactChecker 's ability to do algebra to convert that formula to whatever one he is most interested in. (You probably want to divide by Nf and not N in most cases.)
 
  • #55
Vanadium 50 said:
I trust @FactChecker 's ability to do algebra to convert that formula to whatever one he is most interested in.
OK, but you are claiming that the CNN poll "beat the ##\sqrt{N}## uncertainty", which it didn't.
 
  • #56
So, I repeated Dale's Monte Carlo, and got a 1σ variation of 1.2%. I did some things slightly differently (e.g., a 48-47-5 true distribution), but I would say we agree. There are also a couple of things I did that I didn't like, for expedience's sake. Did you know Excel doesn't have a POISSON.INV function?
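For comparison, here is a minimal Python version of such a Monte Carlo, assuming ##N = 2074## and a 48-47-5 true split, with ##N## held fixed rather than Poisson-fluctuated (which sidesteps the missing POISSON.INV):

```python
import numpy as np

# Toy Monte Carlo: draw many fake polls of fixed size N = 2074 from an
# assumed 48-47-5 "true" split and look at the spread of the leader's share.
rng = np.random.default_rng(0)
N = 2074
p_true = [0.48, 0.47, 0.05]
trials = 100_000

counts = rng.multinomial(N, p_true, size=trials)            # shape (trials, 3)
leader_share = counts[:, 0] / N
print(f"1-sigma spread: {100 * leader_share.std():.2f}%")   # ~1.1%
```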

So I am convinced.

Even so, I think this number is, if not questionable, at least discussable. It implies that the 1σ uncertainty on the sample correction is 0.9%, which is nine respondents in each column. I'll leave it to people to decide for themselves if they believe that a gigantic poll would get the correct answer to better than 1%.

Insofar as the betting odds people are rational actors, they believe the poll errors are underestimated, or equivalently that this race is even closer than the polls suggest. I'm not saying they are right and I am not saying they are wrong - just that that is what they are betting their own money on.
 
  • #57
Vanadium 50 said:
Insofar as the betting odds people are rational actors, they believe the poll errors are underestimated
I think that is accurate. The betting odds people are making a prediction on behavior, while the pollsters are (in the best case) making a measurement of opinion. So the uncertainty in the behavior prediction is much greater than the uncertainty in the measurement of opinion. And the uncertainty in the measurement of opinion is also greater than just the margin of error.
 
  • #58
Dale said:
making a prediction on behavior, while the pollsters are (in the best case) making a measurement of opinion
That would have sent one of my social sciences professors into a tizzy. He always argued that polls measure behavior - they measure what people say they think, not what they actually think. :smile:

He was also rumored to make his own moonshine. FWIW.

However, I think you're still dealing with a difference in behavior - what people will say and how people will vote.
 
  • #59
The actual results depend on the weather, job demands, attitude regarding whether their vote matters, etc.
Those are sources of variability that are hard to factor in, and I am not sure we would want pollsters to try.
 
  • #60
While people did argue "no, this is just opinion", it's pretty clearly an attempt to prognosticate. Otherwise, why use likely voters? Why not include everyone - resident aliens, illegal aliens, those under 18, and so on? They have opinions as well.

The bigger issue is, of course, that US presidents are elected by the states, not the populace at large. Changing opinions in California or Wyoming makes no difference. So what is being measured is correlated with electoral outcome but not the same.

A must-win district for the Democrats is NE-2, Omaha. This is what got me thinking about this. Harris is polling 11 points ahead of Biden in the latest poll. That's well above the margin of error, and well above the national shift. Maybe they just really dig her in Omaha. But in an election where both candidates have high floors and low ceilings, an 11-point swing cries out for explanation.

BTW, this is also a CNN poll, contemporaneous with the 2074-subject national poll. I am hoping these are two completely separate polls and not that a third of the people surveyed in the national poll are from Omaha.
 
  • #61
Vanadium 50 said:
While people did argue "no, this is just opinion", it's pretty clearly an attempt to prognosticate. Otherwise, why use likely voters? Why not include everyone - resident aliens, illegal aliens, those under 18, and so on? They have opinions as well.
That is one problem with statistics in general. People like to misuse them. This is similar to how a small p value is often taken to indicate a large or important effect. It is much easier to just get excited over a number than it is to understand what that number actually means.
 
  • #62
Living and registered to vote in a "swing state", I receive a few 2024 election poll requests per business day, including from CNN. Many of the pollsters ask several demographic questions before the actual poll. Whether your (election) response is counted or ignored depends on your demographic answers.

My daughter, who lives nearby, responded to several identical polls. Our demographics coincide except for age, gender, and (wait for it) race. She identifies as Asian American while I mark White, as there is no Ashkenazi category or Decline to State. I was dropped from all the polls while my daughter was deluged with follow-up questions and advanced polls.

Published poll results (NYT, WaPo) did not mention these participation filters. Selecting poll participants makes sense depending on criteria but seems fraught with preconceptions and possible malfeasance.
 
  • #63
Vanadium 50 said:
While people did argue "no, this is just opinion", it's pretty clearly an attempt to prognosticate.
Yes. There is no point otherwise.
Vanadium 50 said:
The bigger issue is, of course, that US presidents are elected by the states, not the populace at large. Changing opinions in California or Wyoming makes no difference.
All states except Nebraska and Maine are "winner-take-all" for their Electoral College votes. California and Wyoming are probably safe for one side or the other and their individual votes are never split.
An interesting agreement (the National Popular Vote Interstate Compact): A lot of states have agreed that they will give all their Electoral College votes to the winner of the national popular vote as long as enough states in the agreement can dominate the Electoral College. That would bring the Presidential election back to being decided by the popular vote using a "Rube Goldberg" mechanism.
Vanadium 50 said:
So what is being measured is correlated with electoral outcome but not the same.
Right. But it is the votes of the "swing states" that really matter. So it is the polls in individual states that are important.
Vanadium 50 said:
A must-win district for the Democrats is NE-2, Omaha.
Nebraska is probably a safe state for Trump. It would require an enormously lopsided vote in Omaha to change that.
Vanadium 50 said:
I am hoping these are two completely separate polls and not that a third of the people surveyed in the national poll are from Omaha.
That is a safe bet.
 
  • #64
Klystron said:
Living and registered to vote in a "swing state"
I'm sorry.
Klystron said:
as there is no Ashkenazi category
I met a professor in some liberal arts field or other who went on a tear about how the Ashkenazim are really Polish, there are no Mizrahi, and the Levant was originally settled by Muslims 2000 years before Mohammed. Maybe she wrote the poll. :smile:

I don't see any skulduggery in you not being called any more (and might consider it a plus). If they are already oversampling in your demographic and undersampling in your daughter's, isn't this what you expect? I also don't expect that they would drop a completed poll - just de-weight it.
 
  • #65
Rather than guessing, it would be interesting to see what methods the pollsters actually use. A lot will be proprietary. I have seen some detailed descriptions, but I cannot find them with a casual search. Here are a couple of general descriptions of some issues by CNN and the polling company they now use, SSRS.
If anyone can find more detailed descriptions by respected polling companies, I would be interested in seeing them.
CNN: https://www.cnn.com/2021/09/10/politics/cnn-polling-new-methodology/index.html
SSRS: https://ssrs.com/research-areas/political-election-polling/
 
  • #66
Klystron said:
Selecting poll participants makes sense depending on criteria but seems fraught with preconceptions and possible malfeasance.
Truly random sampling of voters is highly impractical. I don't have any better ideas than stratified sampling.

FactChecker said:
A lot of states have agreed that they will give all their Electoral College votes to the winner of the national popular vote as long as enough states in the agreement can dominate the Electoral College. That would bring the Presidential election back to being decided by the popular vote using a "Rube Goldberg" mechanism.

If such a system ever made a difference, the voters in the states whose votes were switched would be justifiably outraged. Such laws would be immediately rescinded. Indeed, if there were ever a chance that such a law would cause Hated Enemy to win against the will of the state's voters, then I believe it would be rescinded before the election. In short, the whole thing is yet another symbolic gesture.
 
  • #67
I was afraid to bring that up, lest this devolve into a discussion on the Electoral College. The important point is what the election rules are, not what they might be. It is pretty clear it is doing what it is designed to do, which may or may not be what any given citizen wants.

I agree with @Hornbein that holding the election and then changing the rules to change the outcome would provoke outrage, and will never happen.

Given the present estimated distribution of polls and likely state outcomes, we are talking about around a ½% edge for one candidate. Not insignificant, but also not huge - it has been higher in the past.
 
  • #68
FactChecker said:
A lot will be proprietary.
While I'd like to see this too, this is why I think we never will. They are selling their "secret sauce". I'd be delighted if they did a data-plus-analysis dump of past polls: let's have a good look at 2016.

The problem with correcting for sampling bias comes about because of correlations. I can easily re-weight the variables so that all the 1-d distributions match expectations, but what about 2-way? That is, my sample might match the parent distribution in male vs. female and young vs. old, but not in young men vs. older women. If you have 10 yes/no variables and 1000 subjects, you have about one subject per bucket. You'll never get that right (which is why they do something else).
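As a concrete (and purely hypothetical) sketch of that re-weighting, the usual way to force all the 1-D margins to match known targets is iterative proportional fitting ("raking"); note that it says nothing about the 2-way cells, and the bucket arithmetic is as above (##2^{10} = 1024## cells for ~1000 subjects):

```python
import numpy as np

# Hypothetical raking sketch: re-weight a fake sample so its 1-D margins match
# assumed population targets. Joint (2-way) cells are NOT constrained by this.
rng = np.random.default_rng(1)
n = 1000
female = rng.integers(0, 2, n)   # fake demographic flags
older = rng.integers(0, 2, n)

margins = [(female, 0.51), (older, 0.45)]    # assumed population margins
w = np.ones(n)
for _ in range(50):                          # iterate until both margins match
    for flag, target in margins:
        share = np.average(flag, weights=w)
        w *= np.where(flag == 1, target / share, (1 - target) / (1 - share))

print(np.average(female, weights=w), np.average(older, weights=w))  # 0.51, 0.45
print(n / 2**10)   # ~1 subject per bucket with 10 yes/no variables
```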
 
  • #69
Vanadium 50 said:
While I'd like to see this too, this is why I think we never will. They are selling their "secret sauce". I'd be delighted if they did a data-plus-analysis dump of past polls: let's have a good look at 2016.
I have seen descriptions that are detailed enough for my satisfaction. I don't remember what polling organization did that.
Vanadium 50 said:
The problem with correcting for sampling bias comes about because of correlations. I can easily re-weight the variables so that all the 1-d distributions match expectations, but what about 2-way? That is, my sample might match the parent distribution in male vs. female and young vs. old, but not in young men vs. older women. If you have 10 yes/no variables and 1000 subjects, you have about one subject per bucket. You'll never get that right (which is why they do something else).
It seems like correlations of the general population like sex versus age distribution, which can be well established independently of the poll, would be possible to adjust for.
 
  • #70
FactChecker said:
It seems like correlations of the general population like sex versus age distribution, which can be well established independently of the poll, would be possible to adjust for.
Sure. But when I am looking at Pacific Islander females with some college but no degree, in a particular age and income band, I have chopped the data up so finely that I may not be able to correct: if I expect 0.5 in my sample, what do I do if I have two? If I have zero?

A sample of 2000 and 11 yes/no questions puts on average one entry per bin. Sometimes you'll get 1, sometimes a few, and sometimes none.

This is why the pollsters don't do this.
 
  • #71
Vanadium 50 said:
Sure. But when I am looking at Pacific Islander females with some college but no degree, in a particular age and income band, I have chopped the data up so finely that I may not be able to correct: if I expect 0.5 in my sample, what do I do if I have two? If I have zero?
Right. So nobody should take it to that extreme.
Vanadium 50 said:
A sample of 2000 and 11 yes/no questions puts on average one entry per bin. Sometimes you'll get 1, sometimes a few, and sometimes none.

This is why the pollsters don't do this.
I think you mean that they don't use stratified sampling to a ridiculous extreme. They certainly do use stratified sampling.
 
  • #72
Their sampling, AFAIK, involves clustering. If you remember the term "soccer moms", that is a cluster. If a respondent ends up in that category, by virtue of some of her answers, they integrate over the others. In a sample of 2000, they probably want 20-40 clusters.

If someone is a public-sector worker other than a first responder or member of the military, they have certain voting preferences, and they are more similar to each other than across groups. So you can integrate over age. If someone is unemployed, their voting preferences are very different if they are 25 or 75, so you can't.
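For what it's worth, one generic way to build that kind of cluster (purely illustrative; the features, cluster count, and vote column are all made up, and this is not any polling firm's actual method) is k-means on the demographic variables, then "integrating over" everything else within each cluster:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Illustrative clustering of fake respondents into ~30 demographic clusters.
rng = np.random.default_rng(2)
n = 2000
X = np.column_stack([
    rng.normal(45, 15, n),       # age
    rng.integers(1, 11, n),      # income decile
    rng.integers(0, 2, n),       # urban / rural flag
])
votes = rng.integers(0, 2, n)    # fake 0/1 vote intention

labels = KMeans(n_clusters=30, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(X))

# "Integrate over" everything else: one size and one vote share per cluster.
for k in range(30):
    in_k = labels == k
    print(k, in_k.sum(), round(votes[in_k].mean(), 2))
```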
 
  • #73
Vanadium 50 said:
Fine. Let me then ask yet again: how do you beat the ##\sqrt{N}## uncertainty, as the CNN poll I mentioned claims to?
This discussion hasn't gotten very far. I suggest reading the literature on stratified sampling, where this is discussed in detail with the proper rigor. A standard classic text is Sampling Techniques by William Cochran, although there are many other references.
If the subject is polling to predict vote results, there is one thing to remember: the final vote depends on many of the most recent events like the weather, major world events, the recent optimistic or pessimistic mood of particular voting groups, etc.
No statistical method can accurately predict those effects, and it should not try. It should only try to estimate the current state of opinion when the poll is taken.
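Coming back to the ##\sqrt{N}## question, the textbook identity (derived in Cochran's chapter on stratified sampling) is that, with proportional allocation, stratification drops the between-strata part of the variance:
$$\operatorname{Var}(\hat p_{\text{srs}})=\frac{p(1-p)}{n}=\underbrace{\sum_h W_h\,\frac{p_h(1-p_h)}{n}}_{\operatorname{Var}(\hat p_{\text{prop}})}+\sum_h W_h\,\frac{(p_h-p)^2}{n},$$
where ##W_h## is the population share of stratum ##h## and ##p=\sum_h W_h p_h##. The scaling is still ##1/\sqrt{n}##; only the constant in front shrinks.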
 
