Polling Margin of Error

Vanadium 50
What exactly is a "margin of error" intended to be for a poll?

Is it a one sigma number? A 90 or 95% CL? An educated guess?

As I understand it, this number is reported on each result - i.e. if the poll says Smith and Jones each have 50% support with a 5% MOE, the "true" result can be anywhere between 45-55 and 55-45. So when the pundits say "the difference is less than the MOE" they really mean "less than twice the MOE."

Also as I understand it, polls need to be corrected for over and undersampling various subpopulations. (e.g. people with cell phones and no landline tend to be undersampled) This correction should form part of the MOE. But it surely is not distributed as a Gaussian. We could even argue about whether it is distributed at all!

When aggregators combine polls, they surely look at (and hopefully weight appropriately by) the MOE. Do they also consider how accurate the MOE has been in the past? If a pollster systematically underestimates their MOE, that surely does not make it a better poll. And vice versa.
 
  • Like
Likes WWGD, Klystron and gleem
My understanding is that it is supposed to be a 95% confidence interval. Frequentist analyses would put it at about twice the standard error of the mean.

I think that Bayesian analyses are becoming more common. So I don’t know exactly what Bayesian quantity is reported as the margin of error.
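For concreteness, here is a minimal sketch in Python, assuming the usual normal approximation for a sampled proportion, ##z\sqrt{p(1-p)/n}## with ##z \approx 1.96##; the sample sizes are just illustrative.

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for a sampled proportion p with n respondents,
    using the normal approximation z * sqrt(p*(1-p)/n)."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (500, 1000, 2074):
    print(f"n = {n}: +/- {100 * margin_of_error(0.5, n):.1f} points")
# n = 500:  +/- 4.4 points
# n = 1000: +/- 3.1 points
# n = 2074: +/- 2.2 points
```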
 
  • Like
Likes Agent Smith and FactChecker
The 95% confidence is customary. Here is a good discussion.
 
  • Informative
Likes DaveE and berkeman
FactChecker said:
The 95% confidence is customary
Thanks.

Then what am I to make of two polls that differ by 2x or 3x the margin of error?
 
  • Like
Likes FactChecker
Dale said:
Bayesian analyses
This would be among the last analyses I would treat this way. You are maximally sensitive to your prior. "Nobody I know could possibly vote for Smith!" And as you know, the "flat prior" is a myth.
 
At 95% confidence you expect 1 in 20 to differ that much even if everything is done correctly

Vanadium 50 said:
This would be among the last analyses I would treat this way. You are maximally sensitive to your prior. "Nobody I know could possibly vote for Smith!" And as you know, the "flat prior" is a myth.
Yes, priors are a part of Bayesian statistics. But I doubt polling is particularly more sensitive to the priors than many other applications.
 
Last edited:
Dale said:
At 95% confidence you expect 1 in 20 to differ that much even if everything is done correctly
Sure, but if this is Gaussian and there is a 5% probability of lying anywhere outside the interval (and I realize this is imprecise language), the probability of lying 2x out is less than 1 in 10,000 and 3x out is less than one in a million.
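A quick check of those tail probabilities, assuming the margin of error really is a symmetric 95% Gaussian interval (a sketch, not tied to any particular pollster's definition):

```python
from scipy.stats import norm

z95 = norm.ppf(0.975)            # ~1.96: the z-score behind a 95% margin of error
for k in (1, 2, 3):
    tail = 2 * norm.sf(k * z95)  # two-sided probability of landing k margins of error from the truth
    print(f"{k} x MOE: P ~ {tail:.1e}")
# 1 x MOE: P ~ 5.0e-02
# 2 x MOE: P ~ 8.9e-05
# 3 x MOE: P ~ 4.1e-09
```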

Dale said:
But I doubt polling is particularly more sensitive to the priors than many other applications.
Election of 2016 anyone?

But my concern is more fundamental. If I measure a physical quantity N different ways, I am happy to use previous measurements as a prior. If one technique has a systematic shift, it will be automatically deweighted by the priors. If every measurement uses the same technique - i.e. robocalls to land lines - they will all have the same systematic shift, and there's no way to correct for this.
 
I think an MoE is supposed to be ##2 \times \text{Standard Error}##, where the standard error in this case would be ##\frac{\sigma}{\sqrt n}## with ##\sigma = \sqrt{p(1-p)}## for a proportion. I believe the best-case scenario is when we know ##\sigma## (the population standard deviation).
 
Vanadium 50 said:
Election of 2016 anyone?
The most accurate poll aggregator I know of, Nate Silver, used Bayesian analysis. So, I don't think that actually supports an anti-Bayes stance. He gave a 1 in 3 probability of a Trump victory, based on a full-fledged Bayesian analysis with priors and all of the usual Bayesian machinery. The occurrence of a 1 in 3 event is not evidence of a model failure. In a well calibrated presidential election model it should happen pretty often, once every decade or so.

Apart from the accurate Bayesian poll aggregation, do you have any hint that Bayesian analyzed polls themselves were off by more than non-Bayesian polls?
 
  • #10
Dale, I know you love Bayesian analyses. I do not consider it a "one size fits all" tool as some do. I gave my reasons for not considering it superior, and I don't think writing them again will convince anyone who wasn't convinced the first time.

I will say that Bayes lived in the 18th century, and there is still discussion of pros and cons.

Was 2016 an outlier? Sure. Was it a statistical fluctuation that hit poll after poll? We'll never know, but it sure looks more systematic. The fact that pollsters and aggregators have tweaked their methodology in response suggests they think so too.

Dale said:
do you have any hint that Bayesian analyzed polls themselves were off by more than non-Bayesian polls?
Of course not, because "off" means two different things - credibility level vs. confidence level. Is purple louder than sour?

But I don't want to get into a fight on the pros and cons of Bayesian statistics. I am trying to better understand what is intended by "margin of error", not how to improve polling and aggregation.
 
  • #11
As an aside, we look at how good our own measurements were when better ones and better averages come out. Our one sigma bands tend to be too wide (i.e. we overestimate the error), but our two sigma bands are too narrow.
 
  • #12
Vanadium 50 said:
Was 2016 an outlier? Sure. Was it a statistical fluctuation that hit poll after poll? We'll never know, but it sure looks more systematic.
Sure, but were Bayesian polls or Bayesian aggregators worse or better? The premier Bayesian aggregator was the best that year.

Vanadium 50 said:
I gave my reasons for not considering it superior, and I don't think writing them again will convince anyone who wasn't convinced the first time
Yes. No need to repeat the fact that you don’t like priors. It is a pretty unconvincing argument against Bayesian statistics, and you are right that it won’t become more convincing a second time.

Vanadium 50 said:
But I don't want to get into a fight on the pros and cons of Bayesian statistics. I am trying to better understand what is intended by "margin of error", not how to improve polling and aggregation
Fair enough. I think that your immediate dismissal of Bayesian analysis in this context is unfounded. But I agree that it is not particularly germane to the question of a margin of error.

I have seen detailed statements about polls before from the publisher of the poll - more scientific descriptions of the methodology than what gets reported. I will see if I can find one of those. Maybe it is clear about the meaning of the margin of error.
 
  • Like
Likes Vanadium 50
  • #13
Public opinion polls are not based on random samples; random sampling is impractical in this case. They use stratified sampling, where pollees are selected to attempt to model those people who will actually vote. I suspect the poor 2016 predictions were due to unanticipated greater than usual participation by certain groups.

Other factors include alteration of election rules to boost participation by favored groups. Voter turnout in 2020 increased by a phenomenal 22 million voters.
 
  • #14
Vanadium 50 said:
Thanks.

Then what am I to make of two polls that differ by 2x or 3x the margin of error?
That would happen if the sample size differs. Bigger samples cost more.

Bigtime candidates don't trust public polls. They pay for their own.
 
  • #15
It's not that I don't like priors. It's that I don't like sensitivity to priors. If you try multiple priors and your result barely moves, I am much happier than if small changes in the prior make a large change in the outcome. I've seen both.

On to the topic at hand, the thing I am really wrestling with is using these values in computations. (Which I suppose could include calculating priors for subsequent Bayesian analyses.) That requires more understanding than "Maybe 50% is really 45%". Correlations and corrections make a big difference here: if the correction is much larger than the margin of error, how much is this a measurement and how much an estimate (albeit by professionals)? If the polls move, is the opinion changing or is the sample changing? Are outliers outliers because of the data, or because of their corrections? And so on.
 
  • #16
Vanadium 50 said:
Thanks.

Then what am I to make of two polls that differ by 2x or 3x the margin of error?
That is a good question. It's a complicated situation. IMO, it is dangerous to compare two polls that probably use different methods. Any one poll should carefully treat all the alternatives to give a valid comparison of the alternatives. I would be less confident that two different polls can be compared. For instance, suppose one poll includes an "Undecided" category and the other does not. In political polls of likely voters, one poll might be more likely to classify certain people as unlikely to vote. I think trying to compare two polls opens a can of worms.
 
  • #17
Vanadium 50 said:
It's not that I don't like priors. It's that I don't like sensitivity to priors. If you try multiple priors and your result barely moves, I am much happier than if small changes in the prior make a large change in the outcome. I've seen both
Agreed. I simply haven’t seen any indication that polling is an application that is unusually sensitive to the priors.

Vanadium 50 said:
Correlations and corrections make a big difference here
The correlations especially. A typical assumption is that the responses are independent and identically distributed (or rather that the residuals are). That assumption is demonstrably false, and accounting for it is really challenging. Both Bayesian and frequentist methods are affected by this.
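As a sketch of how that failure shows up numerically, here is a toy cluster-sampling model (entirely hypothetical: respondents come in households of four that share a common leaning). The spread of the simulated poll results is wider than the naive iid binomial formula predicts, by roughly the usual design-effect factor ##1 + (m-1)\rho##.

```python
import numpy as np

rng = np.random.default_rng(0)
households, m = 500, 4      # hypothetical: 500 households, 4 respondents each -> N = 2000
a = 2.0                     # Beta(a, a) household effect -> intraclass correlation 1/(2a+1) = 0.2
N = households * m

results = []
for _ in range(2000):
    p_h = rng.beta(a, a, households)            # shared per-household support probability (mean 0.5)
    votes = rng.random((households, m)) < p_h[:, None]
    results.append(votes.mean())                # one simulated poll result

iid_se = np.sqrt(0.25 / N)                      # what the naive iid binomial formula assumes
deff = 1 + (m - 1) / (2 * a + 1)                # design effect for this correlation structure
print(f"empirical SE        : {np.std(results):.4f}")
print(f"naive iid SE        : {iid_se:.4f}")
print(f"iid SE * sqrt(deff) : {iid_se * np.sqrt(deff):.4f}")  # roughly matches the empirical value
```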
 
  • #18
Hornbein said:
Public opinion polls are not based on random samples; random sampling is impractical in this case. They use stratified sampling, where pollees are selected to attempt to model those people who will actually vote.
Correct, and part of this question is to better understand how this is incorporated into the margin of error.
Hornbein said:
suspect the poor 2016 predictions were due to unanticipated greater than usual participation by certain groups.
Not everyone agrees with that.
Hornbein said:
Bigtime candidates don't trust public polls. They pay for their own.
True, but a) I don't care what a poll I never see says, and b) the same understanding of what the numbers mean should apply whether I see them or not.
 
  • #19
Maybe being less abstract will help. There are at least three different components to the polling uncertainty. There is a pure statistical uncertainty, which I will call x. There is a common systematic uncertainty from common corrections due to sampling (robocalls), which I will call z, and finally an uncertainty on z, which I will call y, from deviations between sample corrections: e.g. one poll calls people in the morning and one at night.

You would like x to dominate, because then your margin of error is simple to calculate, and more importantly, statistics tells us how this variable behaves in combination and calculation.

If you sample 1000 people in a close race, the 2σ margin of error is 3.2%. That's not much smaller than the polls' margin of error, so they are implicitly telling us x is large compared to y + z.

z is likely a large correction, but what matters is not the size of the shift, but rather its uncertainty. And since every pollster does pretty much the same thing, this is surely well understood. So I am prepared to believe it is small. I am prepared to believe that, subject to the proviso that if the pollsters get this wrong, they get it wrong for all the polls together.

Fixing this is not easy. If instead of robocalls, we put surveys in packages of baby food, polling would be slower, more expensive, and not unbiased - only differently biased.

That leaves y, which is tough, in part because it depends on factors you can't or didn't control. One hopes it is small. (There is a famous pediatrics result that was just refuted because they missed a correlation between experienced doctors and sicker patients)

Now, what facts do we have to refute the idea that x dominates?

(1) We have polls that are outliers, at p-values we should never see.
(2) Sometimes changes in the race impact different polls substantially differently: if Smith promises free ice cream and one poll has her up 1x the margin of error (2σ) and another similar poll 4x (8σ), we can all agree that this is popular, but maybe we're not so sure about how many people like Smith. I think it would lend confidence (or credibility...that was a joke, Dale!) if the polls moved in lockstep as the race changed.

I am not saying "all polls are bunk", as some do. But I am saying that it is quite difficult to assess how seriously to take them, especially with outliers, and I am hoping to learn to do this better.
 
  • #20
Vanadium 50 said:
Correct, and part of this question is to better understand how [stratified sampling] is incorporated into the margin of error.
Stratified sampling is such a large subject. The classic text is "Sampling Techniques" by Cochran.
This link is to a short excerpt pdf that discusses the estimated variance and confidence limits.

PS. If I knew how valuable that book would become, I would not have given it away when I retired. ;-)
 
  • Like
Likes Vanadium 50
  • #21
You can see from that text you will have problems when you chop the data up too finely, even before considering biases. You end up with ##1/N_{\text{subsample}}## in places where you had ##1/N_{\text{total}}##.

Considering subsample biases will make the variance go up, and not down.

The latest CNN poll has N=2074 and a stated margin of error of 3.0%. It's already hard to reconcile those two numbers, especially at 2σ. It's certainly not the binomial error. But even assuming this is 1σ and the error I call x is ##\sqrt{Np}## this says that the systematic terms have negligible (and indeed, slightly negative!) impact on the total uncertainty. This sounds implausible.
 
  • #22
Vanadium 50 said:
Maybe being less abstract will help. There are at least three different components to the polling uncertainty. There is a pure statistical uncertainty, which I will call x. There is a common systematic uncertainty from common corrections due to sampling (robocalls), which I will call z, and finally an uncertainty on z, which I will call y, from deviations between sample corrections: e.g. one poll calls people in the morning and one at night.
If I understand correctly, z and y are non-random biases due to methodological features of the poll, where z is one that is common to many or most polls (or pollsters) and y is one that is specific to a given poll (or pollster).

Vanadium 50 said:
z is likely a large correction, but what matters is not the size of the shift, but rather its uncertainty. And since every pollster does pretty much the same thing, this is surely well understood. So I am prepared to believe it is small. I am prepared to believe that, subject to the proviso that if the pollsters get this wrong, they get it wrong for all the polls together.

Fixing this is not easy. If instead of robocalls, we put surveys in packages of baby food, polling would be slower, more expensive, and not unbiased - only differently biased.
This is a big issue. One thing that can be done is to focus on the changes in polling results. Even if there is some bias, as long as that bias is consistent from yesterday to today, changes can be meaningful.

There is one other possibility, but it is problematic. Polls are an attempt to measure opinions. They are not an attempt to predict behavior. But in election years that is what they are (mis) used for. However, insofar as you are willing to (mis) use polls as predictions of behavior, you can get a bit of feedback on the magnitude of the bias. It is a small amount of feedback.

Vanadium 50 said:
Now, what facts do we have to refute the idea that x dominates?
We should be careful. That idea is not an idea claimed by the pollsters themselves as far as I can tell. I think that this idea is more of a vague impression by the public. It does need to be refuted, but in the sense that the GR bowling ball on a rubber sheet needs to be refuted.

One well respected and prolific pollster is SurveyUSA. Their methodology is described here:

https://www.surveyusa.net/methodology/

They say:

Though commonly cited in the presentation of research results, “sampling error” is only one of many types of error that may influence the outcome of an opinion research study. More practical concerns include the way in which questions are worded and ordered, the inability to contact some, the refusal of others to be interviewed, and the difficulty of translating each questionnaire into all possible languages and dialects. Non-sampling errors cannot be quantified

So the pollsters themselves (at least the high quality ones) recognize that there are other sources of error besides your x.

Vanadium 50 said:
I think it would lend confidence (or credibility...that was a joke, Dale!)
Excellent! I am Dale and I approve this joke.
 
  • #23
Vanadium 50 said:
You can see from that text you will have problems when you chop the data up too finely, even before considering biases.
If you are not grouping the subsample categories wisely, there is no reason to use stratified sampling.
Vanadium 50 said:
You end up with ##1/N_{\text{subsample}}## in places where you had ##1/N_{\text{total}}##.
That is true. The subsamples need to be clustered around the subsample mean to reduce the subsample variance, even with the smaller subsample size.
Vanadium 50 said:
Considering subsample biases will make the variance go up, and not down.
Where stratified sampling is used wisely, that is not the case.
 
  • #24
Suppose you have a sample from two groups of equal sizes, one clustered closely around 100 and the other clustered closely around -100. By grouping the subsamples, you have two small subsample variances. The end result will be smaller than if you ignored the groups and had a lot of large ##(x_i-0)^2 \approx 100^2## terms to sum.
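A minimal numerical sketch of that point, assuming for illustration that the 50/50 split between the two groups is known exactly rather than estimated from the sample:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Two equal-sized groups clustered tightly around +100 and -100
a = rng.normal(100.0, 5.0, n // 2)
b = rng.normal(-100.0, 5.0, n // 2)
pooled = np.concatenate([a, b])

# Ignoring the groups: the large pooled spread (~100) drives the error
se_pooled = pooled.std(ddof=1) / np.sqrt(n)

# Stratified, with the 50/50 split known a priori: only the within-group spread matters
se_strat = np.sqrt(0.25 * a.var(ddof=1) / len(a) + 0.25 * b.var(ddof=1) / len(b))

print(f"pooled SE of the mean     : {se_pooled:.2f}")   # ~3.2
print(f"stratified SE of the mean : {se_strat:.2f}")    # ~0.16
```

The pooled estimate has to carry the roughly ±100 between-group spread; the stratified one only carries the ±5 within-group spread.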
 
  • #25
After some thinking, I concluded that a poll can beat √N. Sort of.

Suppose East Springfield is known to vote 100% for Smith. Now you don't have to poll them - you know the answer. The margin of error is driven not by the total, but by the sample from West Springfield.

The problem is that this is only as good as the assumptions, and if you put enough in, it becomes more "poll-influenced modeling" than polling. That may be a good thing, but it is not the same good thing as "polling".

Dale said:
They are not an attempt to predict behavior. But in election years that is what they are (mis) used for.
Fundamentally, they are all built on a lie. "If the election were held today". But yes, they are used as predictors. Despite some spectacular failures like "Dewey Defeats Truman". They are the worst tools except for all the others.

I find betting odds to be interesting. They involve real money, so the incentives are different. They ask the question "what do you think will happen", which is a different question than "what do you want to happen". They respond to events much faster than polls. Finally, they are illegal in the US for US elections, so you are getting an interestingly selected sample. I would not say these are more useful than polls, but they do provide different information.

More interestingly, they don't always agree with each other. This violates the Law of One Price, which opens up the possibility of arbitrage.
 
Last edited:
  • #26
Vanadium 50 said:
Fundamentally, they are all built on a lie. "If the election were held today". But yes, they are used as predictors.
It isn’t a lie. It is a counterfactual. Unlike electrons, humans can have definite opinions on counterfactuals.
 
  • #27
Sure...but the answer you get from that question when you are done processing is "If the election were held last week...." :smile:
 
  • #28
FactChecker said:
Suppose you have a sample from two groups of equal sizes,
But the variance of the total sample does not change by dividing it into subsamples, and so the uncertainty on the mean does not go down by dividing it. The exception is when you know a priori that the samples are exactly equal size (actually, you only need to know the relative mix better than you can count it, but I am sure you understand what is meant)

I don't think anyone disagrees with the idea that you can reduce the uncertainty by incorporating information apart from the poll itself. I think the question is at what point are you no longer doing polling? There's a famous (mis)quote from the election of 1972: "Nobody I know voted for Nixon". And that's true - from Manhattan you had to go quite a way to find somewhere Nixon won. All the way to Queens or Staten Island.

Factoring in "what everybody knows" is a two-edged sword. Maybe three.
 
  • #29
Vanadium 50 said:
Factoring in "what everybody knows" is a two-edged sword. Maybe three.
Ignoring what is known is also problematic.

The problem you are getting at is distinguishing between what is known and what is erroneously believed to be known.
 
  • #30
Vanadium 50 said:
But the variance of the total sample does not change by dividing it into subsamples, and so the uncertainty on the mean does not go down by dividing it.
It does reduce the uncertainty of the mean if you know what proportion of the distribution is in each stratum.
Vanadium 50 said:
The exception is when you know a priori that the samples are exactly equal size (actually, you only need to know the relative mix better than you can count it, but I am sure you understand what is meant)
Equal size is not the point. You would prefer a sample that most closely reflects the population proportions. You can make appropriate adjustments if the proportions in the sample are different from the population proportions. That is beneficial if one category is more difficult or expensive to get a sample from. That is a very common problem.

Stratified sampling is a well-established statistical approach. Any reputable polling organization uses it extensively.
 
Last edited:
  • #31
I think we're saying similar things - we can beat √N by replacing counting with pre-existing knowledge. When we buy a dozen eggs, there is no √12 uncertainty.

The problem - or a problem - comes up, as Dale (and Mark Twain before him) pointed out, when these things turn out to be incorrect. "I don't have to poll North Springfield because Jones has it all in the bag". Well, what if she doesn't? And how would you know?

At some point, you are crossing the line between corrected polling and poll-inspired modeling. Which means at some point you are no longer quoting a statistical estimate of uncertainty but an expert's estimate.

Farther along that path, we're into the realm of fortunetelling. I don't think we are there yet, but it would be a pity if we got there someday.
 
  • #32
Vanadium 50 said:
"I don't have to poll North Springfield because Jones has it all in the bag". Well, what if she doesn't? And how would you know?
I don't think that the modeling errors are of this type. It is a lot more subtle. It is more like: in the US census, 20% of the population of Somewhereville has less than 4 years of college. Only 15% of my Somewhereville sample has less than 4 years of college, so I need to correct for my undersampling of the less-than-4-years-of-college population. But what I don't realize is that now it is actually 25%, so my correction isn't large enough.
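A toy numerical version of that scenario, with invented support levels for the two education strata, showing that weighting to a stale census target removes only part of the bias:

```python
# Hypothetical support for a candidate in two education strata (all numbers invented)
p_low, p_high = 0.60, 0.45   # support among "<4 yr college" vs. everyone else
true_share   = 0.25          # actual population share of the "<4 yr college" group today
census_share = 0.20          # stale census figure used as the weighting target
sample_share = 0.15          # share of that group in the raw sample

true_support = true_share   * p_low + (1 - true_share)   * p_high
raw_estimate = sample_share * p_low + (1 - sample_share) * p_high
weighted_est = census_share * p_low + (1 - census_share) * p_high

print(f"true support          : {true_support:.3f}")   # ~0.49
print(f"raw (unweighted) poll : {raw_estimate:.3f}")    # ~0.47
print(f"weighted to old census: {weighted_est:.3f}")    # ~0.48 - closer, but still short
```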
 
  • Like
Likes FactChecker
  • #33
Yes, but the issues are exposed by looking at the limiting cases. I think we can all agree that, for example, in the US Presidential Election of 2024, polling Pennsylvania heavily tells you more than polling Hawaii or Utah.

Further, if your goal is to beat √N, you have to preferentially count what you need to count over what you (believe you) don't. Otherwise your error doesn't go down.
 
  • #34
Vanadium 50 said:
Yes, but the issues are exposed by looking at the limiting cases.
I don't think that "I don't have to poll North Springfield because Jones has it all in the bag" is a limiting case of anything that high quality pollsters actually do. There is a difference between a limiting case and a strawman.
 
  • Like
Likes FactChecker
  • #35
Vanadium 50 said:
Yes, but the issues are exposed by looking at the limiting cases.
I don't believe that they make naive mistakes. Their analysis is far more sophisticated than I will ever understand.
Vanadium 50 said:
I think we can all agree that, for example, in the US Presidential Election of 2024, polling Pennsylvania heavily tells you more than polling Hawaii or Utah.

Further, if your goal is to beat √N, you have to preferentially count what you need to count over what you (believe you) don't. Otherwise your error doesn't go down.
Some things can be determined with good accuracy, like the percentage of people of certain age, education, wealth, internet connection, smartphone ownership, living in the country versus the city, income level, etc., in a state. Those have a strong influence on voting trends. They should not be ignored. Stratified sampling can take that into account (within reason and limits). It's good to know the characteristics of your sample. Even then, there is a lot of uncertainty.
 
  • Like
Likes Klystron and Dale
  • #36
I don't think they make naive mistakes either. A more realistic "mistake" is to deliberately undersample subsets where you think you already know the answer to some degree, so you can oversample the subset where you don't.

This trades uncertainty in one subsample for uncertainty in another. And this can improve the overall error.

The problems start when the assumptions on the "well-known" sample turn out to be incorrect. They are aggravated if the undersampling is sufficient to hide the discrepancy between what is expected and what is observed.

In the physical sciences we would say that one is reducing the statistical error at the cost of increased systematic error.
 
  • Like
Likes FactChecker
  • #37
Vanadium 50 said:
A more realistic "mistake" is to deliberately undersample subsets where you think you already know the answer to some degree
I don't think that high quality pollsters do that at all. The corrections, stratifications, and so forth are based on the independent variables that go into your statistical model (demographics), not the dependent variable that comes out of the statistical model (opinion). That is why one is not an edge case of the other. They don't make their sampling decisions based on assumed knowledge of the dependent variables.
 
Last edited:
  • Like
Likes FactChecker
  • #38
Dale said:
I don't think that high quality pollsters do that at all.
Then how do you think they get errors below √N?
 
  • #39
Vanadium 50 said:
A more realistic "mistake" is to deliberately undersample subsets where you think you already know the answer to some degree, so you can oversample the subset where you don't.
Define "undersample". If there is a subset of the population that has a small variation in the dependent variable of interest the data should show that. Is it a mistake to sample that subset less? Why? Spending time and money to drive that subset standard deviation lower than necessary instead of using it on other subsets where the uncertainties are greater would be a mistake. The goal is to get the best estimate for your time and money.
 
  • #40
If you prefer "sample less", I am OK with that.
 
  • #41
Vanadium 50 said:
If you prefer "sample less", I am OK with that.
My point is that it is often the smart thing to do to get the best answer for the time and money. Increasing the sample size is not the only way to improve the result.
 
  • #42
At the risk of being beat up again for strawmen, if I have a box labeled "10,000 white balls" and a box labeled "10,000 red balls" and a third box labeled "10,000 balls, mixed red and white" I only need to sample the last box.

If the first two boxes say "9000 red (white) and 1000 white (red)" I still only need to sample the third box.

I will get in trouble if the contents of the first two boxes don't match the labels.
 
  • #43
Vanadium 50 said:
Then how do you think they get errors below √N?
Through stratified sampling of the independent variables, if indeed they do actually get errors below ##\sigma/\sqrt{N}##
 
  • Like
Likes FactChecker
  • #44
Vanadium 50 said:
At the risk of being beat up again for strawmen, if I have a box labeled "10,000 white balls" and a box labeled "10,000 red balls" and a third box labeled "10,000 balls, mixed red and white" I only need to sample the last box.
For what purpose?
Suppose you are studying the emergency stopping distance of car drivers. Suppose that half the cars in the general population have ABS and half do not. Also, suppose that the stopping distances of cars with ABS have a standard deviation of 5 feet, but cars without ABS have a standard deviation of 30 feet because some drivers pump the brakes well, some pump the brakes too slowly, and others don't pump the brakes at all. Every stopping test costs $500, so you can only test 1000 drivers. You should not pick a sample ignoring who has ABS. You will get a more accurate result if you test more without ABS.

Suppose you are polling voters. Older voters are more likely to vote for candidate A and younger voters for candidate B. The general population has 30% over 60 years old, but your poll is on smartphones and your sample only had 15% over 60 years old. You should apply stratified sampling techniques to adjust your sample results to better match the general population.
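Here is a small sketch of how a sample could be allocated between the two strata in the braking-distance example (so-called Neyman allocation, proportional to each stratum's population share times its standard deviation; the numbers are the hypothetical ones above):

```python
import math

# Two equally common strata: ABS (sd = 5 ft) and non-ABS (sd = 30 ft), budget of 1000 tests
W  = [0.5, 0.5]      # population shares
sd = [5.0, 30.0]     # within-stratum standard deviations (feet)
n_total = 1000

def se_stratified(n_per_stratum):
    """Standard error of the stratified mean: sqrt(sum W_h^2 * sd_h^2 / n_h)."""
    return math.sqrt(sum(w**2 * s**2 / n for w, s, n in zip(W, sd, n_per_stratum)))

# Proportional allocation: sample each stratum in proportion to its population share
prop = [int(w * n_total) for w in W]                    # [500, 500]
# Neyman allocation: sample in proportion to share * standard deviation
raw = [w * s for w, s in zip(W, sd)]
neyman = [round(n_total * x / sum(raw)) for x in raw]   # [143, 857]

print(f"proportional {prop}: SE = {se_stratified(prop):.2f} ft")      # ~0.68 ft
print(f"Neyman       {neyman}: SE = {se_stratified(neyman):.2f} ft")  # ~0.55 ft
```

The better allocation spends most of the budget where the spread is largest, which is the sense in which sampling one group less is not a mistake.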
 
Last edited:
  • #45
Vanadium 50 said:
Then what am I to make of two polls that differ by 2x or 3x the margin of error?
That it is not feasible to obtain an unbiased sample from a population of 160 million voters.
 
  • #46
FactChecker said:
You should apply stratified sampling techniques to adjust your sample results to better match the general population.
Fine. Let me then ask yet again, how do you beat the √N uncertainty, as the CNN poll I mentioned claims to?
 
  • #47
Vanadium 50 said:
Fine. Let me then ask yet again, how do you beat the √N uncertainty, as the CNN poll I mentioned claims to?
It is easy if there are groups that cluster around certain values with small variances within each subsample. The mathematics of it is simple. This example was given in post #24.

FactChecker said:
Suppose you have a sample from two groups of equal sizes, one clustered closely around 100 and the other clustered closely around -100. By grouping the subsamples, you have two small subsample variances. The end result will be smaller than if you ignored the groups and had a lot of large ##(x_i-0)^2 \approx 100^2## terms to sum.
 
Last edited:
  • #48
FactChecker said:
It is easy if there are groups that cluster around certain values
I don't see it. Take two delta functions. Your uncertainty on the total mean is not zero. It's driven by your uncertainty in counting how many elements are in each distribution. And you are back to √N.

You can beat it if you don't have to count. But again, now we are moving away from polls.
 
  • #49
Vanadium 50 said:
I don't see it. Take two delta functions. Your uncertainty on the total mean is not zero. It's driven by your uncertainty in counting how many elements are in each distribution.
Those are often well known about the general population. Age distributions, wealth, education levels, home locations, etc. are all known fairly well from the government population census. A pollster will probably not rely on sampling to determine those characteristics about the general population. He has better sources for that information. On the other hand, he will record those characteristics about his sample so that he can adjust his sample results, if necessary, to better reflect the general population.
Vanadium 50 said:
And you are back to √N.
No. The individual variances within the subgroups may be greatly reduced.
 
  • #50
Vanadium 50 said:
Fine. Let me then ask yet again, how do you beat the √N uncertainty, as the CNN poll I mentioned claims to?
Vanadium 50 said:
The latest CNN poll has N=2074 and a stated margin of error of 3.0%. It's already hard to reconcile those two numbers, especially at 2σ. It's certainly not the binomial error.
So I did a brief Monte Carlo simulation with a poll result represented as a draw from a binomial distribution with N=2074 and p=0.5. I simulated 1000 such polls. The mean was 0.5002 with a standard deviation of 0.0107. So a margin of error of 3.0% is greater than twice the standard deviation (0.0214). This is not an example of "beat[ing] √N uncertainty".

Since a lot of polls use Bayesian techniques I also calculated the posterior for a 50/50 split on 2074 responses using a flat Beta distributed prior because the Beta distribution is a conjugate prior for Binomial or Bernoulli data. With that I got a 95% credible interval of plus or minus 0.0215, which is almost identical to the Monte Carlo result above.

This example does not appear to be an example where the margin of error is lower than what can be justified based on the sample size. In fact, it seems about the opposite. It seems that there is about 1% additional statistical uncertainty included beyond the idealized uncertainty. This could include the fact that the result was not 50/50, and also possibly that the weighting that was needed for this sample increased the overall variance.
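For reference, a sketch of the kind of calculation described above (assuming a flat Beta(1, 1) prior and an exactly even 1037/1037 split; the Monte Carlo numbers will vary slightly between runs):

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
N, p = 2074, 0.5

# Monte Carlo: 1000 idealized polls, each a binomial draw of N respondents
polls = rng.binomial(N, p, size=1000) / N
print(f"mean = {polls.mean():.4f}, sd = {polls.std(ddof=1):.4f}")   # sd ~ 0.011, so 2 sd ~ 0.022

# Bayesian version: flat Beta(1, 1) prior + 1037 'yes' / 1037 'no' -> Beta posterior
k = N // 2
posterior = beta(1 + k, 1 + N - k)
lo, hi = posterior.ppf([0.025, 0.975])
print(f"95% credible interval: {lo:.4f} to {hi:.4f}")               # ~0.4785 to ~0.5215
```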
 
Last edited:
  • Like
Likes Klystron, Vanadium 50 and FactChecker
