In comments on DaveG’s analysis of McCain leading Obama by a point in the Franklin & Marshall Poll, long-time favorite commenter of mine Caroline writes:
With a MoE of +/-3.9% McCain does not “lead”.
This is pretty much true. But this raises an important point for those of you who come here for the horse race analysis. If you *really* want to be technical, with a sample size of 640, we *can* be 20% certain that McCain is leading based on the F&M poll. Now that is to say, we’d be much better off relying on a coin flip rather than this poll in choosing the correct leader, but nonetheless, there are conclusions we can draw about who is leading based on the F&M poll. Just not very useful conclusions.
And this is just a point that I want to make going forward, and it is a very important one to remember with error margins. Pointing to error margins is one of those things in the blogosphere that is often used to show up people who have no idea what they are talking about. Most people who say “the polls were wrong” or “the polls are all over the place” or “YAY CANDIDATE A IS WINNING YOU REPUBLICLONE THUGS ARE GOING TO LOSE” (and in fairness, you’re just as likely to find a similar all-caps post about “al-qaeda-loving DhimmocRATS” going down) can quickly be shown up with a reference to error margins (btw Caroline, I’m no longer pointing fingers at you, you were 100% correct in the criticism of DaveG as I quoted it; I just used your post as a jumping-off point).
At a simple level, one thing most people don’t understand is that the error margins apply per data point. Not per spread. In other words, in every Obama/McCain poll, there is an error margin for Obama, and an error margin for McCain. With a 3.5% error margin, then, a poll showing the two tied could mean that McCain is ahead by 7 points or behind by seven. A poll showing Obama up four is still within the error margin. A poll showing McCain up by 7 could mean the two are tied, or it could mean that McCain is up by 14.
Most people get this, or pick up on it quickly.
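For anyone who wants to check that arithmetic, here is a minimal sketch in Python. It assumes the worst-case reading used above, where each candidate's number can be off by the full margin in opposite directions; the function name lead_range is made up for this illustration.

```python
def lead_range(lead_points, margin_points):
    """Worst-case range for candidate A's lead over B, in percentage points.

    Assumption: the margin applies separately to each candidate's share,
    so the spread can move by up to twice the margin in either direction.
    """
    swing = 2 * margin_points
    return (lead_points - swing, lead_points + swing)


# A tied poll and a 7-point lead, each with a 3.5-point margin.
print(lead_range(0.0, 3.5))
print(lead_range(7.0, 3.5))
```

A negative number in the result means the other candidate is the one ahead.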
But I am still being imprecise, and this is an important nuance that very few people get. My paternal grandfather, he of the tenth grade education (above the average for my grandparents) but of overwhelming common sense, has said when talking about my hobby “how is it possible to say anything with certainty about how millions of people will vote based on what 500 people say in a poll?” (actually there’s usually a lot more adjectives thrown in, but we won’t go there).
And it is an important point. The answer is that you can never say with 100% certainty, based on a poll, that X or Y is ahead. It is possible that, in a state with 5,000,000 voters, of which 4,999,749 are going to vote for Obama, you could get a poll of 500 voters that includes all of the 251 McCain voters, and end up with a poll predicting a McCain win. Even worse, you *could* do a poll of 250 voters, and end up picking only McCain voters, and call it as an overwhelming McCain victory. It just ain’t very likely.
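Just how unlikely? Here is a rough sketch, assuming the poll is a simple random sample drawn without replacement from the 5,000,000 voters in the example. The helper names are invented for this illustration, and the answers are reported as powers of ten because the raw probabilities are far too small for a normal floating-point number.

```python
from math import lgamma, log

STATE_VOTERS = 5_000_000   # figures from the hypothetical state above
MCCAIN_VOTERS = 251


def log10_comb(n, k):
    """log10 of the binomial coefficient C(n, k), via log-gamma."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)) / log(10)


def log10_prob_sample_has_every_mccain_voter(sample_size):
    """log10 of P(the sample contains all 251 McCain voters)."""
    obama_slots = sample_size - MCCAIN_VOTERS
    favorable = log10_comb(STATE_VOTERS - MCCAIN_VOTERS, obama_slots)
    return favorable - log10_comb(STATE_VOTERS, sample_size)


def log10_prob_sample_is_only_mccain_voters(sample_size):
    """log10 of P(everyone in the sample is a McCain voter)."""
    favorable = log10_comb(MCCAIN_VOTERS, sample_size)
    return favorable - log10_comb(STATE_VOTERS, sample_size)


print(f"500-voter poll with all 251: 10^{log10_prob_sample_has_every_mccain_voter(500):.0f}")
print(f"250-voter poll, all McCain:  10^{log10_prob_sample_is_only_mccain_voters(250):.0f}")
```

Both exponents come out below negative one thousand, which is about as close to "never" as arithmetic gets.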
But this is the important thing about horserace analysis. Most declarations based on a poll that “X” or “Y” is ahead will drop an important caveat: With “x%” certainty.
Pollsters use 95% confidence as an “industry standard,” which they do for some very specific, technical reasons I won’t go into here. So with a sample size of, let’s say, 500, it is correct to say that you can be 95% certain that the “true” outcome is within 4.4% either way of your polling outcome. So if Obama comes in at 50%, and McCain comes in at 42%, you can’t be 95% certain that Obama is leading, because the 8-point spread is smaller than twice the 4.4% margin.
But 95% confidence isn’t the be-all, end-all. Like I said, it is the industry standard that is selected for specific reasons, that may or may not always apply to your needs. Sometimes, for example, you may have an extremely important survey to make. Let’s say you’re thinking of going to intrade and plunking down your life’s savings on Obama to win. At that point, you may decide that you need to be 99% confident in your given range. Other times, you might want to be less sure.
What if, for your purposes, you only wanted to be 90% certain Obama was leading? Your error margin shrinks to 3.68%. In other words, in the preceding example, we *can* be 90% certain Obama is leading. And 90% certain is pretty darned good!
But what if you wanted to be REALLY certain, like 99% certain? With a 500 voter sample, we’d have to have a spread of about 11.5% (twice the 99% error margin of 5.76%) before we can be THAT certain.
Now you’ll notice something here. The error margin for 90% is +/-3.68%. The error margin for 95% is +/-4.38%. And the error margin for 99% is +/-5.76%.
The relationship is not linear here. To go from 90% certainty to 95% certainty, your error margin goes up .7%. To go from 95% to 99%, your error margin goes up 1.4% (with sample sizes of 500).
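Those three margins can be reproduced with the textbook formula for a proportion, z * sqrt(p * (1 - p) / n). The sketch below assumes a simple random sample and the usual worst case of p = 0.5; the function name margin_of_error is just a label chosen for this example.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sample_size, confidence, share=0.5):
    """Margin of error for one candidate's share, as a fraction.

    Assumptions: simple random sample, normal approximation, and
    share=0.5 (the value that gives the widest margin).
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sqrt(share * (1 - share) / sample_size)


for confidence in (0.90, 0.95, 0.99):
    moe = margin_of_error(500, confidence)
    print(f"{confidence:.0%} confidence, n=500: +/-{moe:.2%}")
```

The uneven steps come from the critical value z, which climbs faster and faster as the confidence level approaches 100%.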
In other words, if you are willing to accept a lower degree of certainty, you can often draw inferences even based on poll results that are within the reported error margin (which is almost always based upon 95% certainty), especially if you’re close to being outside the MOE, without sacrificing that much certainty.
For example, let’s take a look at the recent poll from SUSA showing Obama up 3 on McCain in Ohio. It has an error margin of +/- 4.3%, meaning that in common parlance, it is a “statistical dead heat.”
But it’s not really. The poll sampled 542 registered voters, and showed them three points apart. If all we wanted to know was whether it was “more likely than not” that Obama led McCain, we could say “yes,” because we are 55% certain that Obama’s “true” score lies between 46.6% and 48.4% and that McCain’s lies between 42.6% and 46.4%. In other words, we are 55% certain that Obama is leading in Ohio.
Let’s say that Obama is *four* points ahead. Under those circumstances, we can be 65% certain that he is actually leading. A six point lead means we’re 88% sure he’s leading.
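One way to get numbers of this kind is to ask: at what confidence level does the per-candidate error margin shrink to exactly half the lead? The sketch below is only a reading of that idea, not necessarily the calculator used for the figures above. It assumes a simple random sample and p = 0.5, the function name is invented, and this is not the standard test for a difference between two proportions.

```python
from math import sqrt
from statistics import NormalDist


def confidence_lead_is_real(lead, sample_size, share=0.5):
    """Confidence level at which the two candidates' intervals just touch.

    Assumption: each candidate gets the same margin, so the intervals
    stop overlapping once the margin equals half the lead.
    """
    standard_error = sqrt(share * (1 - share) / sample_size)
    z = (lead / 2) / standard_error
    return 2 * NormalDist().cdf(z) - 1


for lead in (0.03, 0.04, 0.06):
    level = confidence_lead_is_real(lead, 542)
    print(f"{lead:.0%} lead, n=542: {level:.0%}")
```

Under these assumptions the results land in the neighborhood of the percentages quoted above rather than matching them digit for digit, so the original figures were presumably computed with slightly different inputs.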
In other words, you can still draw inferences from polls within the *published* error margin, some of which are quite useful. For my purposes, if I see a lead of greater than 4 points in a poll with a sample size of 600, I’m willing to say candidate A probably does have the lead. This becomes especially useful when you have the really good polls with large sample sizes; for example the Gallup tracking poll has 1200 participants in a given sample; with that we can be 66% sure that a three-point lead is a real one. This also is important when you have a large series of polls; the midpoint of the polls will tend to be the actual result since the polls will generally be distributed evenly around the actual result.
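The point about a series of polls can be seen in a toy simulation. Everything here is hypothetical: a made-up "actual" level of support of 52%, 200 imaginary polls of 600 voters each, and perfectly unbiased random sampling, which real polls only approximate.

```python
import random


def simulate_poll(true_share, sample_size, rng):
    """Share of a random sample backing the candidate (idealized poll)."""
    supporters = sum(rng.random() < true_share for _ in range(sample_size))
    return supporters / sample_size


rng = random.Random(2008)   # fixed seed so reruns give the same draws
TRUE_SHARE = 0.52           # hypothetical "actual" support, not real data
polls = [simulate_poll(TRUE_SHARE, 600, rng) for _ in range(200)]

average = sum(polls) / len(polls)
average_miss = abs(average - TRUE_SHARE)
worst_single_miss = max(abs(result - TRUE_SHARE) for result in polls)

print(f"average of 200 polls misses by {average_miss:.2%}")
print(f"worst single poll misses by   {worst_single_miss:.2%}")
```

Individual polls scatter around the true number, but the misses in either direction mostly cancel out in the average.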
Now, all of this assumes polls use proper methodology that doesn’t bias the result, and it assumes a “normal” distribution of the populace, which isn’t really the case (e.g., it assumes that the populace is spread out evenly like a giant bag of M&Ms, rather than segregated like a Snickers bar. Mmmmmmm…Snickers bars). Still, there are ways to correct for that, though that is a subject for another post.
February 22nd, 2008 at 4:50 pm
Hello, Sean-
Even by the rarefied standards in this forum, that is an outstanding post - one of the best I’ve ever seen here. Major props… (and hat tip to Caroline for sparking your decision to write it…)
February 22nd, 2008 at 4:55 pm
Excellent post.
Thank you for your contributions here.
February 22nd, 2008 at 5:14 pm
[...] many fine posters, among those at Race42008.com, is Sean Oxendine. His post on polling - On Error Margins - explains the margin-of-error concept in lay terms for those of us who enjoy the horse-race [...]
February 22nd, 2008 at 5:18 pm
great post!
February 22nd, 2008 at 5:42 pm
THANK YOU for pointing some of these things out. People in the blogosphere are incredibly ignorant as to how polling works, what data mean, how to read statistics, what to logically infer from polling…
February 22nd, 2008 at 5:42 pm
This, especially, is important:
At a simple level, one thing most people don’t understand is that the error margins apply per data point. Not per spread. In other words, in every Obama/McCain poll, there is an error margin for Obama, and an error margin for McCain. With a 3.5% error margin, then, a poll showing the two tied could mean that McCain is ahead by 7 points or behind by seven. A poll showing Obama up four is still within the error margin. A poll showing McCain up by 7 could mean the two are tied, or it could mean that McCain is up by 14.
People just don’t seem to realize that.
February 22nd, 2008 at 6:19 pm
Thanks, Sean — excellent, as always. I think you did one like this in 2004, but the reminder is valuable.
“This also is important when you have a large series of polls; the midpoint of the polls will tend to be the actual result”
This is why I like the RCP average, which has proven pretty accurate in my experience.
February 22nd, 2008 at 10:11 pm
great post Sean.
I had been thinking about this very issue recently - how all MOEs are based on a 95% confidence - and wondering how the MOEs would change with the acceptance of different confidences.
Is there a relatively simple formula, or table handy for this?
February 22nd, 2008 at 11:39 pm
Tano here’s a few resources:
http://www.dimensionresearch.com/resources/calculators/conf_means.html
February 22nd, 2008 at 11:39 pm
Here’s one for proportions (better for political surveys)
http://www.dimensionresearch.com/resources/calculators/conf_prop.html
February 22nd, 2008 at 11:41 pm
Here’s one that’s useful for comparing different surveys.
http://www.stat.tamu.edu/~jhardin/applets/signed/case11.html
February 23rd, 2008 at 12:20 am
egs, thanx
February 23rd, 2008 at 12:51 am
Tano,
This is the one I use. Pretty user-friendly.
http://www.raosoft.com/samplesize.html
February 23rd, 2008 at 11:18 am
This was a very good posting. I had almost commented on this at various times, but had chosen not to because of the significant amount of time it would take to explain it all as well as you did. You covered every important aspect of what needed to be covered. Good job!
February 24th, 2008 at 5:01 pm
I’m surprised no one has pointed this out yet. There is also the assumption that the error is Gaussian and unbiased, which is often far from the truth. This is why you have a poll with a ~3% margin of error being off the mark by 15%.