September 21, 2008

A Response To Nate Silver

Someone in the comments was kind enough to point out that Nate Silver has penned a little ditty over at FiveThirtyEight.com entitled “Bad Math and the Bradley Effect”, which purports to be a response to my earlier article speculating on what the scenario is for McCain to win. Ideally I’d spend my Sunday early afternoon doing something more productive, but since he’s basically challenging my intellectual honesty by accusing me of cherry-picking polls, I suppose that it deserves a response.

Silver’s biggest complaint with my methodology seems to be that I define my dataset as something less than every election that was held during the primary season. Of course, one of the biggest challenges of any type of statistical analysis is defining your dataset. I did not go into my reasoning for my dataset selections, since (a) I was trying to explain the scenario (which I considered unlikely) for McCain winning, not explore under strict adherence to political science principles whether a given effect occurred or not and (b) it is a blog post, not an academic paper. But since it has been brought up, the reasoning was as follows.

I excluded caucus states, such as Iowa and Nevada. I think anyone with a basic understanding of the dynamic of caucuses would understand my decision to do this, and would know that it had nothing to do with whether or not the results in those states fit any particular hypothesis. They were excluded because they are relatively low turnout affairs (Iowa being something of an exception to this) with Byzantine voting rules where it is almost impossible to know the true first-choice preferences of everyone who turns out to vote. These are rules that zero states will follow in November. And the caucus states require attendees to have all-day availability, which means only a certain type of person can attend. Because of this, they are notoriously difficult to poll, and it is more likely that the pollster error is just due to a bad turnout model than it is anything else. This is a reasonable choice (as Silver actually seems to concede).

And while it is always good to have as many datapoints as possible, in my estimation, including caucus states meant that I would be comparing apples with oranges, which is actually the last thing you want to do when hypothesis-testing.

Silver knows this, given that in his recent piece on cellphones, he excludes a number of pollsters from his dataset, for a variety of (perfectly justifiable) reasons. While it would have given him more datapoints if he had included pollsters who only recently began polling or who conducted internet polling, Silver correctly decided that including them would damage the integrity of his data. It’s the same process with excluding caucus states.

Next, I excluded Florida because it was a part of the Old Confederacy (see below) and also because, as we were reminded again and again in the primaries by Obama supporters, neither candidate campaigned there. It is therefore difficult to use its results as indicative of how the candidates would have fared in a full-on election.

Finally, I excluded the results from the Old Confederacy, but left in Texas. My reasoning for excluding these states is simple, and was (perhaps too succinctly) summarized in the following phrase in my post, where I described the South as a region “where [Obama] was buoyed by unusually high African American turnout…” In other words, the makeup of the electorate in those states is fundamentally dissimilar from the other states in the dataset. African Americans made up 48% of the electorate in Alabama, 49% in Georgia, 47% in Louisiana, 48% in Mississippi, 34% in North Carolina, 55% in South Carolina, 29% in Tennessee, 19% in Texas, and 29% in Virginia. This is unique to the Confederacy: The only other states where African American turnout exceeded 20% of the electorate were Delaware (29%), Illinois (21%), New Jersey (23%) and Maryland (37%).

And it is pretty clear that the African American population has a significant effect on whether Obama over- or under-performed relative to the polls: the r-square when you compare AA% in a state to the pro- or anti-Obama effect in the state is .45, with a t-stat of 4.6 for the variable (which seems to argue for the existence of such an effect). Since the goal is to figure out how white voting behavior will change on election day relative to the polls, if at all, it seemed to argue for excluding states where whites make up an unusually small portion of the electorate. I guess another approach in states like NC and SC would be to compare the proportion of white voters Obama was predicted to get by particular pollsters versus what exit polls showed, but this has its own problems. And I’m lazy.
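A rough consistency check on those two figures: in a regression with a single explanatory variable, the t-statistic on the slope and the r-square are tied together by t² = r²(n − 2)/(1 − r²). The sketch below assumes the regression used one observation per state in the results table further down (27 states). That sample size is an assumption, not something stated in the post, and the function name is just illustrative.

```python
import math

def slope_t_stat(r_squared: float, n: int) -> float:
    """t-statistic on the slope of a one-variable OLS regression,
    recovered from its r-square and the number of observations n."""
    return math.sqrt(r_squared * (n - 2) / (1.0 - r_squared))

# Assumption: n = 27, one data point per state in the results table.
N_STATES = 27
implied_t = slope_t_stat(0.45, N_STATES)
print(f"implied t-stat for r-square .45 and n = {N_STATES}: {implied_t:.2f}")
```

Under that assumption the implied t-stat lands in the same neighborhood as the quoted 4.6, with the gap plausibly coming from rounding the r-square to two digits.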

Silver argues that “the particular geographics [sic] of the Confederacy are not especially relevant electorally.” In many contexts that may be true, but in this context it is not. Having African Americans comprise around 50% of the electorate – something, incidentally, that many pollsters weren’t predicting, especially at first – would drown out any Bradley effect in a way that wouldn’t occur in other states where African Americans comprise a much smaller portion of the electorate. Moreover, the South behaved differently than the rest of the country in the primary season. You could explain 80% of the variance – at the county level! – of the voting in the South between Hillary and Obama solely by looking at the percentage of African American and college-educated voters in the county. That is unique to the South, and did not hold up elsewhere. Finally, there aren’t any states in the country that will have African Americans making up 40-50% of the electorate in November, except maybe Mississippi.

I’ll also add that Silver’s assertion that Kentucky and Tennessee are two peas in a pod is just silly. Tennessee has no analogue to the Old Seventh congressional district in Kentucky, which is mining country organized by the UMW in the 30s, and which is basically an extension of West Virginia. Kentucky has no good analogue to Memphis, and its Fifth District is only a much smaller and weaker analogue to Tennessee’s First, Second, and Third Districts. And more importantly, the African American percentage in the Kentucky Democratic primary electorate was 9%, versus 29% in Tennessee. For whatever similarities they might have, their Democratic primary electorates are very dissimilar in the way that is most germane to this model.

The decision to re-include Texas is probably the best criticism of my methodology, but it is also the one with the least overall effect, given that Obama barely underperformed there. The reason for including it is pretty obvious if you look at the statistics above, and consider my overall reasoning for excluding the Confederacy. The AA population is comparatively small relative to the remaining Southern states, and is more akin to the general Democratic electorate. Perhaps it would have been simpler to say “exclude all states where African Americans comprised over 25% of the voting electorate,” which would have had the same effect, although Maryland would have been excluded, and would have been more consistently applied. The only problem is that any percentage applied would have been probably even more arbitrary than the methodology I chose – why not 20%, which would have excluded New Jersey? Why not 30%, which would have included TN and VA? Regardless, if it makes people feel better, we can still exclude Texas, which changes my results by a couple hundredths of a point.
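To make the cutoff question concrete, here is a minimal sketch that applies a "share above X%" rule to the African American electorate percentages quoted earlier in this post. It covers only the states whose percentages are quoted here, and the dictionary and function names are illustrative.

```python
# African American share of the Democratic primary electorate (%),
# using only the figures quoted earlier in this post.
AA_SHARE = {
    "AL": 48, "GA": 49, "LA": 47, "MS": 48, "NC": 34, "SC": 55,
    "TN": 29, "TX": 19, "VA": 29, "DE": 29, "IL": 21, "NJ": 23, "MD": 37,
}

def excluded_states(threshold: float) -> list[str]:
    """States whose African American share strictly exceeds `threshold` percent."""
    return sorted(s for s, share in AA_SHARE.items() if share > threshold)

# Compare the three cutoffs discussed in the text.
for cutoff in (20, 25, 30):
    print(f"over {cutoff}%: {excluded_states(cutoff)}")
```

Moving the cutoff from 25% to 20% pulls in Illinois and New Jersey, while moving it to 30% lets Tennessee, Virginia and Delaware back in, which is the arbitrariness the paragraph above is pointing at.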

I’m not 100% certain, and am genuinely curious, how Silver is calculating the confidence intervals for my results, so I can’t really respond to the statistical significance charge. I must admit, however, that I find it odd for Silver to chide me for not reaching the 90th percentile in statistical significance, given that his data dredging…er…stepwise regression process demands significance only at the 85th percentile (just do Ctrl-F and enter 85 to find it).

The use of RCP averages versus Pollster.com is much easier to defend. As you know from watching the primary results, Obama surged in late January after his South Carolina win. I’m no expert in Pollster.com’s methodology, but my impression from looking at some of their results is that they are much slower to phase out old polls than was RCP. Indeed, looking at the FAQ, I’m not certain they phase out old polls at all. In some ways and in some applications, their estimates are superior to RCP’s, but in a race where you have a last-minute surge by a candidate, an average that only includes the last few days’ polling is going to be the best estimate to use.

There are several examples to point to of how this affects the results, but perhaps California is the best one. RCP’s final average for California included polls concluded only four days before the primary, which fully captured the poll bounce Obama was seeing. This is demonstrated in their chart of the race. The Pollster.com polling shows a much more gradual improvement in Obama’s polling, in part (I think) because the relatively recent, but nonetheless outdated polling from a few weeks prior was dragging their analysis down. In other words, my sense was that, especially in the Super Tuesday polls, the Pollster.com method was understating Obama’s strength in the polls, and hence overstating the degree to which he overperformed.
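A toy illustration of why the averaging window matters during a late surge. The poll numbers below are invented purely for illustration (they are not real California polls), and the simple window average is a stand-in, not either site's actual method.

```python
from datetime import date

# HYPOTHETICAL polls for a surging candidate: (poll end date, margin in points).
# Invented numbers for illustration only.
POLLS = [
    (date(2008, 1, 15), -12.0),
    (date(2008, 1, 22), -10.0),
    (date(2008, 1, 29), -6.0),
    (date(2008, 2, 2), -2.0),
    (date(2008, 2, 4), 0.0),
]

def window_average(polls, as_of: date, days: int) -> float:
    """Mean margin of the polls that ended within `days` days before `as_of`."""
    recent = [m for d, m in polls if 0 <= (as_of - d).days <= days]
    if not recent:
        raise ValueError("no polls inside the window")
    return sum(recent) / len(recent)

election_day = date(2008, 2, 5)
short_window = window_average(POLLS, election_day, days=4)   # last few days only
long_window = window_average(POLLS, election_day, days=30)   # keeps older polls
print(f"4-day average: {short_window:+.1f}, 30-day average: {long_window:+.1f}")
```

With these made-up numbers, the longer window reports a larger deficit than the short one, so the same election-day result would look like a bigger overperformance against the long-window average.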

Indeed, if you look at Silver’s chart, which is organized sequentially, the numbers become an awful lot redder (indicating Obama underperforming) when you get past Super Tuesday, and there is a lot less difference between his findings regarding Obama’s performance and mine. Take the following chart, which shows my result (a negative value means Obama underperformed), 538’s results, and then the difference between the two. I’ve highlighted any state where I found a pro-Obama effect at least two points higher than Silver’s in blue, and any state where I found an anti-Obama effect at least two points higher than Silver’s in red. The Super Tuesday states are between AL and TN. Super Tuesday is really where Silver and I find different results; after that we are rarely more than a couple points off in our results. I think this is almost entirely due to the last-minute poll bounce that RCP captured, and which Pollster didn’t.

State  Oxendine Result  538 Result  Difference
NH     -10.9            -9.8        -1.1
SC     +17.3            +14.3       +3.0
AL     +9.2             +15.6       -6.4
AZ     -2.8             -0.3        -2.5
CA     -10.8            -2.3        -8.5
CT     +1.0             +5.8        -4.8
GA     +17.3            +21.4       -4.1
IL     -1.5             -4.1        +2.6
MA     -8.4             -4.2        -4.2
MO     +7.0             +2.4        +4.6
NJ     -2.1             +0.1        -2.2
NY     -0.3             +2.5        -2.8
TN     -0.3             +8.7        -9.0
MD     +1.2             +4.7        -3.5
VA     +10.5            +6.2        +4.3
WI     +13.1            +10.3       +2.8
OH     -3.0             -2.7        -0.3
TX     -1.8             -1.6        -0.2
RI     -13.0            -7.7        -5.3
VT     -2.0             -0.6        -1.4
MS     +8.6             +9.1        -0.5
PA     -3.1             -1.7        -1.4
IN     +3.6             +3.1        +0.5
NC     +6.7             +7.2        -0.5
WV     -6.3             -4.3        -2.0
KY     -6.6             -0.4        -6.2
OR     +5.6             +5.5        +0.1
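For anyone who wants to check the Super Tuesday claim against the table, here is a minimal sketch that recomputes each difference as my result minus 538's and compares the average absolute gap on Super Tuesday with the average for the later contests. The two result columns are transcribed from the table; treating AL through TN as the Super Tuesday group follows the text, and the variable names are illustrative.

```python
# (state, Oxendine result, 538 result), transcribed from the table above.
RESULTS = [
    ("NH", -10.9, -9.8), ("SC", 17.3, 14.3), ("AL", 9.2, 15.6),
    ("AZ", -2.8, -0.3), ("CA", -10.8, -2.3), ("CT", 1.0, 5.8),
    ("GA", 17.3, 21.4), ("IL", -1.5, -4.1), ("MA", -8.4, -4.2),
    ("MO", 7.0, 2.4), ("NJ", -2.1, 0.1), ("NY", -0.3, 2.5),
    ("TN", -0.3, 8.7), ("MD", 1.2, 4.7), ("VA", 10.5, 6.2),
    ("WI", 13.1, 10.3), ("OH", -3.0, -2.7), ("TX", -1.8, -1.6),
    ("RI", -13.0, -7.7), ("VT", -2.0, -0.6), ("MS", 8.6, 9.1),
    ("PA", -3.1, -1.7), ("IN", 3.6, 3.1), ("NC", 6.7, 7.2),
    ("WV", -6.3, -4.3), ("KY", -6.6, -0.4), ("OR", 5.6, 5.5),
]
SUPER_TUESDAY = {"AL", "AZ", "CA", "CT", "GA", "IL", "MA", "MO", "NJ", "NY", "TN"}
PRE_SUPER_TUESDAY = {"NH", "SC"}

def difference(mine: float, theirs: float) -> float:
    """Oxendine result minus 538 result, rounded to one decimal like the table."""
    return round(mine - theirs, 1)

def mean_abs_gap(rows) -> float:
    """Average absolute difference between the two sets of results."""
    gaps = [abs(difference(mine, theirs)) for _, mine, theirs in rows]
    return sum(gaps) / len(gaps)

differences = {state: difference(mine, theirs) for state, mine, theirs in RESULTS}
super_rows = [r for r in RESULTS if r[0] in SUPER_TUESDAY]
later_rows = [r for r in RESULTS
              if r[0] not in SUPER_TUESDAY and r[0] not in PRE_SUPER_TUESDAY]

print(f"Super Tuesday mean gap: {mean_abs_gap(super_rows):.1f}")
print(f"Post-Super Tuesday mean gap: {mean_abs_gap(later_rows):.1f}")
```

The average gap is a blunt summary; printing `differences` state by state shows which individual contests drive it.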

The criticism that the results aren’t robust if they change when the averaging mechanism is changed is also silly. If one averaging mechanism somehow systematically biases the results relative to other averaging mechanisms, which I think is the case here, then of course which one you choose makes a difference, and it should make a difference. This is especially true if the averaging mechanisms are interpreting different data, which I also suspect is the case here.

In the end, I suppose reasonable minds can differ over whether to use Pollster.com or realclearpolitics.com. Without really knowing how Pollster’s regression works, it is probably impossible to argue it conclusively. But I really thought Silver’s response to using RCP rather than Pollster was silly. It tipped me off that his principal interest is in NOT finding the Bradley Effect, rather than letting the chips fall where they may. If he finds no Bradley effect and I find a Bradley effect, there is evidence for the Bradley effect. That doesn’t mean that it is there or not – and remember, the real point of the article was to speculate on what would have to happen for McCain to win, not to prove or disprove anything – all it means is that more research must be done.

Finally, Silver writes:

The other, more important question is why we should simply dismiss the results in the South, where Obama significantly overperformed his numbers, by 7.2 points on average, according to my definition of the region and by 9.9 points according to his…

The easy answer is that I don’t dismiss it. Had Silver bothered to read the entire article, he would see that I wrote, under the heading “Youth/African American Vote”:

Regardless, I’ve covered this to some extent here, with the salient point being this: yes, Obama will likely increase African American turnout, but the states where this could make a real difference – with the exception of Virginia – are either so deeply red or so deeply blue that it is unlikely that AAs will be game-changers. Improving Obama’s vote share in Mississippi from 40% to 43% doesn’t do him a lick of good.

Emphasis added. Silver and I seemingly agree that any pro-Obama effect from high African American turnout is likely to be muted in the general election, with African Americans’ share of the electorate likely to be at least halved relative to the primaries. Higher African American turnout might absolutely flip Virginia if McCain is only leading by a couple of points on election day, as he is today, and as I concede in my article. It might make a difference in North Carolina, but given that McCain is, as of this writing, leading by nine in the RCP average and five in North Carolina, I’m not sure it will happen. I don’t think it will have much of an effect in Florida either, where African Americans make up an even smaller portion of the population than they made up in Texas: only 12% in 2004.

But as of this writing, McCain (using Pollster.com) is down three in PA, and down three in MI, and up two in OH. Even Silver’s own averages have McCain up five in NC, though they only have him up one in Florida (notwithstanding that only one poll this month has Obama leading, and that only three polls this month have McCain’s margin at less than five, but the shortcomings of his model are a story for another time). This all may change by election day – heck, it might change this week – but as of right now, I would take a couple extra points for McCain in PA, MI, and OH in exchange for giving up a couple of points in VA, NC, and FL. In a heartbeat, without thinking twice.

And now I’m going to go play with my kid.

by @ 3:56 pm. Filed under Poll Watch

15 Responses to “A Response To Nate Silver”

  1. NDak Guy Says:

    Hey, Sean, don’t sweat it. Silver is a bit overly impressed with himself and is convinced that his methodologies are as superior to anyone else’s as most liberals think their intellect is superior to that of conservatives. What seems to be so is that he slightly discounts polls that do not reconcile with how he sees the race. At this writing he has Obama leading in OH, VA, NV and CO, even though there are a greater number of polls that lead to different conclusions. He also has MI, PA and WI as medium blue, which several other analysts have as much closer.

    I don’t know of any other analyst that is presently calling for a 300+ EV tally for Obama. Not to say that cannot happen, but there are few polls that show that at this point. Could be that Silver is seeing something (or smoking something) not available to anyone else.

  2. frank (#2) Says:

    Well done.

    Nate does change things in the middle of trends to help Obama out in his model. I love his elaborate explanation of convention bounces and then changed his implementation of that model because McCain was doing too well. Then McCain goes ahead and he “tweaks” his model to more fully support his position. Constant tweaking to improve Obama’s stance. It was amazing that McCain was up by 5 or so in gallup and he received about 285 EV in his model and now Obama is up only 4 and he receives 306 EV. Fascinating.

    Again, great job.

  3. Pittsburgh Kid Says:

    In my opinion, fivethirtyeight.com, varies between being neutral and pro-Obama. At least Nate admits his bias. I found his site more useful in the primaries. Now it is only as good as the other main political sites out there (RCP, Pollster, race42008)

  4. dotan Says:

    And now I’m going to go play with my kid.

    Kids rock.

  5. BarkTwiggs Says:

    Good riposte against Silver, man. I am a big fan of his simulation methods since I’ve done some similar boot-strapping methods in my school work, but it has its limitations. His methodology is also really good for pin-pointing battleground states which could make or break the election. However, his results have had a lot of variance over the last few weeks (probably due to the rapidly changing environment) but they have no truly accurate predictive value (like any poll).

  6. NDak Guy Says:

    PK, Nate Silver is, by his own admission, pro-Obama. I think he probably does *try* to be neutral in his analyses, but his wishes sometimes cloud his judgements. Given that many of the polls in the last few election cycles predicted stronger Democratic results than the actual election results, I would think that any analyst would be very cautious to factor that underperformance into their model.

  7. Aron Goldman Says:

    Higher African American turnout might…make a difference in North Carolina, but given that McCain is, as of this writing, leading by nine in the RCP average and five in North Carolina, I’m not sure it will happen.

    In looking at four of the most recent polls that comprise the current RCP average for NC (McCain +8.2), I noticed that the variance in the breakdown by party ID, especially on the Democratic side (with the exception of the SurveyUSA outlier), is so significant that it renders the value of RCP’s composite in Carolina relatively worthless.

    Here are the results:

    PPP: McCain 46%; Obama 46% (9/17-19)
    49% (D); 35% (R); 16% (I) – Differential (D +14)

    Research 2000: McCain 55%; Obama 38% (9/8-10)
    44% (D); 35% (R); 21% (I) – Differential (D +9)

    Civitas/TelOpinion: McCain 47%; Obama 44% (9/6-8)
    47% (D); 34% (R); 19% (I) – Differential (D +13)

    SurveyUSA: McCain 58%; Obama 38% (9/6-8)
    40% (D); 41% (R); 16% (I) – Differential (R +1)

    My guess is that the Dems presently possess an 11-point advantage (~46-35%) over the GOP, which, if accurate, would put McCain ahead of Obama by ~5 percentage points.

  8. JA Pruce Says:

    Although I regard the Palin pick as brilliant and see her as the overwhelming favorite in 2012 should McCain lose, you still have to wonder whether or not McCain would have NH, MI, and CO firmly in his column if Mitt were on the ticket.

  9. JA Pruce Says:

    Sorry- off topic…

    But this is infuriating – McCain is not only calling for SEC chairman Cox (a fine fiscal Conservative) to be fired, he is saying that he would appoint liberal, Andrew Cuomo to the post.

    http://blogs.wsj.com/washwire/2008/09/21/mccain-calls-for-cuomo-at-sec/

    What is going on here? Am I missing something?

  10. Middle Snu Says:

    It’s entirely awesome to see you two dueling over statistical analysis: the sort of thing that we need, instead of more silly partisan rhetoric.

    Sean, you’re my hero. And I mean that semi-sincerely.

  11. joe Says:

    you told him?

  12. eric Says:

    “And I’m lazy”

    What does that make me?!

  13. eric Says:

    Also-the link to Nate Silver’s writing doesn’t work at this moment. Either he has conceded or the link is just bad.

  14. Evil Conservative Says:

    For all of you MMA/UFC fight fans of Wanderlei “The Ax Murderer” Silva…
    Sean is “The Ox Murderer”

  15. Adam Says:

    Sean Oxendine,

    Get a load of this post.

    h++p://www.fivethirtyeight.com/2008/09/allocating-undecideds.html#comments

    Nate Silver thinks that 64 percent of undecideds in MS will vote for Obama. I’d love to see you refute that. You’re right. The guy has jumped the shark. It’s not even a close call.
