Prediction Is Hard

Dave Schuler August 27, 2012

In a previous post I mentioned something to the effect that econometric models with which I was familiar were predicting a close November presidential election. Nate Silver corrects me:

I tracked down about every one of these models that I could find, subject to the condition that it couched its forecast in probabilistic terms (or that it was well-documented enough to allow this to be inferred with relative ease, like from the standard error that the model stated).

In my view, it’s in estimating the uncertainty in a forecast where most of the challenge and intrigue lies. To paraphrase Charles Barkley, any knucklehead can make a point prediction – but it takes brains to calculate a confidence interval.

These models are all over the map, forecasting everything from a nearly certain Obama victory to the substantial likelihood of his defeat. But more of them have Mr. Obama as the favorite. If you simply average their win probability estimates together, you get about a 61 percent likelihood of his winning the election.

As it turns out, the models with which I was familiar (Hibbs, Cuzan, Fair, Klamer) have something in common: they all predict a closer outcome than the other models in Nate’s list. Lest you suspect some sort of partisan or ideological bias, Yale’s Ray Fair is no Republican. His model is an econometric model and the economic fundamentals aren’t working in the president’s favor.

I tracked down some of the other models Nate uncovered which, unfortunately, he doesn’t explore in the cited post to any great degree. They, too, have something in common: they’re not technically econometric models; rather they’re models fundamentally based on opinion polls.

Helmut Norpoth’s Primary Model is conceptually very simple. Basically, it says that presidents who don’t face primary challenges get re-elected. It seems to me that this approach would tend to miss wave elections, understandable since real, major wave elections are relatively rare in the United States.

Here’s Sam Wang’s model:

My approach is to take a well-designed statistical snapshot of all the polls. A snapshot of the last 2-3 weeks of polls was 364 electoral votes (EV) for Obama, within 1 EV of the final outcome, 365 EV. Even a one-week snapshot was 353 EV, within 12 EV. Either number came closer than the other sites. Score one for meta-analysis!

Drew Linzer’s model is:

a dynamic Bayesian forecasting model that unifies the regression-based historical forecasting approach developed in political science and economics with the poll-tracking capabilities made feasible by the recent upsurge in state-level opinion polling.

and he finds that:

Contrary to much of the media commentary at the time, Obama’s victory was highly predictable many months in advance of the election.

I’m sticking to my guns. I think the popular vote, whichever way it goes, will be quite close, closer than the 2008 election, and that so much depends on factors that are beyond even sophisticated prediction models that it’s just too close to call. Will a major disaster occur in Afghanistan just before the election? Will we be at war with Iran between now and then? Will the stock market experience a major correction or crash between now and then?

That’s very much along the lines of what Sean Trende of RealClearPolitics observes:

In 1980, Jimmy Carter didn’t have an argument for re-election that appealed very far beyond the Democratic base. Similarly, in 1984, Walter Mondale simply didn’t have much of an argument for getting rid of Ronald Reagan. The Republicans didn’t have a good argument for holding on to power in 2006, nor did the Democrats in 2010. The elections reflect that.

This year, Barack Obama has an argument — he didn’t inherit the mess, and the economy is slowly expanding. That’s an argument that is probably good enough to get him to 46 or 47 percent of the vote. Similarly, Mitt Romney has a pretty good argument for electing a new president, one that will shore up his base and Republican-leaning independents. Thus, we should probably expect what we’re presently seeing in the polls: a close race, to be decided by a relatively small slice of the electorate.

21 comments

Icepick Link

I just don’t see how a mature “recovery” that is worse than most of the post-WWII recessions can possibly allow for re-election of the man who keeps telling us everything is going gang-buster super-awesome best economy ever. I’m not much of a fan of Investor’s Business Daily but they summarize some of the stats nicely:

Real median household income has fallen 4.8% since the so-called recovery officially began in June 2009. That’s a steeper decline than occurred during the recession itself, when incomes dropped 2.6%, according to a new report from Sentier Research.

Almost every demographic group has seen incomes drop during the alleged Obama recovery, Sentier found. Even those who report being continuously employed watched their real incomes drop nearly 5% over the past three years.

…

Meanwhile, there are 800,000 more long-term unemployed than when the “recovery” started, and the ranks of those who aren’t in the labor force at all have swelled by nearly 8 million.

The unemployment rate remains stuck above 8%, and shows no sign of coming down any time soon. And the dreaded misery index – which combines inflation and unemployment – is 20% higher than it was three years ago.

Note the time frames – they’re talking about the RECOVERY, not the RECESSION. The only reason the UE-3 rate isn’t substantially higher is because 8 million of us have been forced off the rolls entirely. How does an 11% UE-3 rate (which is about what it ought to be) translate into a close election for the guy that likes to brag about what a great job he’s done?
Dave Schuler Link

A couple of things. First, the election is won by electoral votes and electoral votes are determined, by and large, by whole states. The margin of victory isn’t all that important. The question in this regard is which states won by Obama in 2008 is Romney likely to carry in 2012?

I don’t believe that Romney will bring California, New York, New Jersey, Illinois, Oregon, Washington, Michigan, Minnesota, or any of the New England states into his column. The reason is affiliation. My “ears to the ground” suggest that it’s likely he’ll carry Indiana and possible that he’ll carry Missouri, North Carolina, and even Iowa.

That leaves Florida for which you are likely to have a better feel than I, Ohio, Pennsylvania, Virginia, Nevada, and Colorado.

Second, essentially, what I’m saying is that whoever wins will win with less than 53% of the popular vote. If Romney wins, that would mean a relatively huge swing of 7% from McCain’s results.
Icepick Link

As best I can tell I expect a low turnout in Florida. The number of signs, bumper-stickers and so on is WAY down from 2008. I live in a black part of town, and there’s little enthusiasm for Obama, much less than last time around when they had stands up selling unauthorized Obama merch. I spend a fair amount of time in “redneck” parts of town and I see little enthusiasm for Romney. Few of those with whom I speak are enthusiastic about either candidate, but there’s a lot of passion amongst the unemployed to vote Obama OUT. They don’t necessarily think Romney is going to be any great shakes, but there’s a firm belief that Obama is a big, active negative.

The passion this time is almost entirely on the “Against” side. I don’t see how that helps the incumbent.
Dave Schuler Link

As I see it Obama will get 90% or more of the black vote, 70% or more of the Hispanic vote, and 80% or more of the Jewish vote. I don’t think that’s shakeable and it means that he starts out with at least 25 points. The only thing that would bring that into question is turnout.

I might add that I strongly suspect that Florida is one of the few states in which vote fraud is dispositive. There are enough snowbirds that if a large enough percentage vote twice it could swing elections.
PD Shaw Link

What I find interesting about Silver’s model is that while it’s been consistently showing an Obama victory with relative confidence, most of the states highlighted on his page are within his margin of error. Specifically, these Democratic states: CO, IA, NH, NV, OH, VA, & WI and these Republican states: FL & NC. That looks to me more like a tie than anything definitive.
Dave Schuler Link

Sort of my point, PD. If all of the margin of error states break for Romney, Romney wins. If all of the margin of error states break for Obama, Obama wins. If they split between the two candidates, it’s too close to call. The assumptions matter and I think that Nate assumes that margin of error states break towards the incumbent. That might have been true a decade ago but I don’t think it’s true now.

I might add that if you subtract the electoral votes from states that Obama won in 2008 by less than the national average from his vote total he loses. In other words the 2008 election was no landslide.
Icepick Link

I might add that I strongly suspect that Florida is one of the few states in which vote fraud is dispositive. There are enough snowbirds that if a large enough percentage vote twice it could swing elections.

The snowbirds commit fraud every time around, but it still affects the vote only so much. (In 2000 it clearly impacted the margin.)
Icepick Link

Let me clarify my last comment. In presidential election years, Florida is a purple state that leans slightly red. If the election tends towards pure purple the snowbirds can swing the election blue – 2000 was really close in that regard. But if the state trends towards either color, the snowbirds don’t matter. So in 2000, the snowbirds mattered but couldn’t quite carry the day. In 2004 and 2008 the election trended towards one side or the other and the snowbirds only impacted the margins of victory.

In non-presidential election years Florida is solidly red, so snowbirds can vote blue all they want, it rarely matters. (We’ve got a Dem Senator mostly by accident at the moment. Nelson has been in Congress or the Senate for a long time, and there are enough old Democrats that they will keep voting for him for the old Southern reason – seniority matters.)

…

Incidentally, in 2004 I knew Bush had the election sewn up months before the election. It was impossible to believe that Bush wouldn’t carry Florida that year, just from driving around and seeing all the economic activity going on.

Looking at old posts of mine from 2008 it looks like I called Florida for Obama fairly early in October, based on McCain’s weakness and the failing economy. (I also did lots of counting of yard signs that year, and noted that McCain looked to be losing in Orange County. I’m at Ground Zero in the swing portion of the swing state.)

This year, I just don’t see any passion at all for voting for Obama. Here’s what I wrote about McCain voters in 2008:

When McCain voters (I hesitate to call most of them fans or supporters) try to convince me to vote for McCain their arguments take on the “Yes …, but ….” form. (Buffy fans would call these but-faced arguments.) “Yes McCain’s stance on campaign finance is bad, but Obama’s support of ________ is much worse.” “Yes McCain favors government solutions too often, but Obama is practically a socialist.” “Yes McCain compromises too much, but imagine Obama working in concert with Pelosi, Reid, et al.” Etc. It may not help their case, but it has the charm of acknowledging legitimate problems exist with McCain’s candidacy.

The Obama supporters sound like that now. “Yes the economy sucks now, but … BUSH!” “Yes Obama has been overly supportive of the Big Banks and Wall Street, but Romney practically IS Wall Street!” “Yes Biden is an idiot, but … but …. uteri!”

“Yes …, but …. ” arguments aren’t terribly convincing for an incumbent.
Steve Verdon Link

I tracked down about every one of these models that I could find, subject to the condition that it couched its forecast in probabilistic terms (or that it was well-documented enough to allow this to be inferred with relative ease, like from the standard error that the model stated).

In my view, it’s in estimating the uncertainty in a forecast where most of the challenge and intrigue lies. To paraphrase Charles Barkley, any knucklehead can make a point prediction – but it takes brains to calculate a confidence interval.

I have a problem with this…a serious problem. Confidence intervals are not probabilistic except in the most trivial sense. A confidence interval either contains the parameter of interest (probability 1) or it does not (probability 0). To try and come up with probabilities from confidence intervals is a huge methodological boo boo. It suggests to me the person making such an inference really does not know what they are talking about.
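The frequentist reading described above can be checked with a small simulation. This is only a sketch under made-up assumptions: the true mean, the spread, the sample size, and the function name `coverage` are all hypothetical and have nothing to do with any election model. The "95%" describes how often the procedure captures the true value across many repeated samples; each single interval either contains it or does not.

```python
import random
import statistics

def coverage(true_mean=50.0, sigma=5.0, n=30, trials=2000, seed=0):
    """Fraction of nominal 95% intervals that contain the true mean.

    All parameter values are hypothetical and chosen only for illustration.
    sigma is treated as known, so the normal critical value 1.96 applies.
    """
    rng = random.Random(seed)
    half_width = 1.96 * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        m = statistics.fmean(sample)
        # Any one interval either covers true_mean or it does not.
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / trials

print(coverage())  # long-run share of intervals that cover the true mean
```

The printed share should sit close to 0.95, which is a statement about the interval-building procedure and not about any one interval it produces.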

And after reading Hibbs’ paper I stand by my view that Silver is talking out of his ass. Hibbs’ paper estimates a model where the dependent variable is vote share, not probability of being re-elected.

Now Drew Linzer’s model would allow you to make probabilistic statements since it is couched in the Bayesian methodology. Hibbs on the other hand is not (well, we probably could re-cast Hibbs in a Bayesian light, but the interpretations would likely change as well). The point here is that Bayesian is not the same as Classical/Frequentist. Making the assumption that the two results are comparable also indicates ignorance.
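For readers wondering what inferring a probability "from the standard error that the model stated" might look like in practice, here is one plausible sketch. It is an assumption about the kind of conversion involved, not a description of what Silver actually did, and the inputs are hypothetical numbers rather than output from Hibbs or any other model. It treats the forecast error as normally distributed around the point forecast, which is exactly the step objected to above.

```python
from statistics import NormalDist

def win_probability(forecast_share, std_error, threshold=50.0):
    """P(vote share > threshold) if forecast error is treated as normal.

    This reads the forecast as a probability distribution over the outcome,
    an interpretation a strictly frequentist model does not license.
    """
    return 1.0 - NormalDist(mu=forecast_share, sigma=std_error).cdf(threshold)

# Hypothetical inputs, not taken from any model discussed in this thread.
print(round(win_probability(forecast_share=51.0, std_error=2.0), 3))
```

Note how sensitive the answer is to the stated standard error: the same one-point lead gives a much higher probability if the error is assumed to be small.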

Contrary to much of the media commentary at the time, Obama’s victory was highly predictable many months in advance of the election.

I believe this quote applies to the 2008 election, not the current election. Which makes some sense since the date on the paper is May 2012 and he notes it is ready to be used for the 2012 election. Also, note that as a Bayesian model it will become more accurate and useful as a predictor the closer you get to the election–you’ll have more data with which to update your prior probabilities. I don’t see anywhere in that paper that suggests Linzer has used it for 2012.
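The point about updating priors as data arrives can be illustrated with the simplest Bayesian model there is, a Beta prior on a candidate's support updated with poll counts. This is a toy sketch and far simpler than Linzer's model; the prior, the poll counts, and the helper names `update` and `mean_sd` are all hypothetical.

```python
import math

def update(belief, favorable, unfavorable):
    """Beta-binomial update: add the observed counts to the Beta parameters."""
    a, b = belief
    return (a + favorable, b + unfavorable)

def mean_sd(belief):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    a, b = belief
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd

belief = (50.0, 50.0)                           # hypothetical prior near 50/50
polls = [(260, 240), (255, 245), (265, 235)]    # hypothetical poll counts

history = [mean_sd(belief)]
for favorable, unfavorable in polls:
    belief = update(belief, favorable, unfavorable)
    history.append(mean_sd(belief))

for mean, sd in history:
    print(f"mean={mean:.4f} sd={sd:.4f}")
```

Each batch of polls narrows the posterior, which is the sense in which such a model becomes more useful the closer you get to the election.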

Note none of what I wrote should be interpreted as indicating that either Obama or Romney will win. I’m just pointing to some of the technical issues involved here…some of which are rather nuanced, mathematically speaking.
Steve Verdon Link

BTW, I’m wondering where Silver got his Linzer estimates. I’ve looked over Linzer’s papers, nothing. I’ve looked through a number of posts on his blog, again nothing. Is Silver using Linzer’s numbers from 2008? If that is the case, then the 2% chance is for a McCain win….d’oh!!!
TastyBits Link

@Steve Verdon

… I’m just pointing to some of the technical issues involved here…some of which are rather nuanced, mathematically speaking.

I think this is an enormous problem for many prediction models in all fields. This also is one of the reasons @Drew is correct about a Statistics class being required.
Icepick Link

TB, do you really think someone that can’t pass Algebra I in high school is going to learn enough from a stats class to understand “The point here is that Bayesian is not the same as Classical/Frequentist.”
Steve Verdon Link

TB, do you really think someone that can’t pass Algebra I in high school is going to learn enough from a stats class to understand “The point here is that Bayesian is not the same as Classical/Frequentist.”

I’m not TB and not speaking for him, but for myself….absolutely not. The notion of something like a confidence interval often trips up people who not only passed high school algebra, but also managed to do quite well in college level calculus.
PD Shaw Link

Nate Silver’s background is sabermetrics, and he relies a lot on a “regression to norm” concept used in baseball a lot. In sports, the idea is that a player has accumulated a large body of data over his career and that _barring other explanation_, a player considerably over- or under-performing will regress to his mean over time.

He uses a regression called “State Fundamentals” to further modify his adjusted polling averages by state historic voting patterns, state demographics and state economy. I’m skeptical about this, but I think it allows him to paper over limited or bad polling. If Obama is over-performing in Alaska, “State Fundamentals” allows him to adjust the polls to reasonable expectations under the theory that the polls will regress to the norm by November. But at some point the polls are the polls, and we have a very small sample size of presidential elections involving these two candidates.

On the economics, should economic conditions be considered statewide or nationwide? I’ve read several places that Iowa’s economy is in relatively good shape, so Obama should win there. That seems counterintuitive to me. Iowa’s economic strength is in agri-business and related fields, which I don’t think Obama helped or hurt; they certainly won’t credit Obama with much. Further, economic security might just as easily make Iowa voters more responsible to budget or social issues. Are people in the Detroit area going to punish Obama for a bad local economy, or will it make them more likely to support the more liberal politician?
Dave Schuler Link

Iowa’s economic strength is in agri-business and related fields, which I don’t think Obama helped or hurt; they certainly won’t credit Obama with much. Further, economic security might just as easily make Iowa voters more responsible to budget or social issues. Are people in the Detroit area going to punish Obama for a bad local economy, or will it make them more likely to support the more liberal politician?

With respect to Iowa, the effects that the failure of the corn crop will have are already being felt there. Can you blame Obama for it? No. But it does add to a general feeling of dissatisfaction.

With respect to Detroit, the vote will go largely to Obama on affiliational grounds. There have been any number of studies that have found that people routinely don’t vote their interests but vote on any number of other grounds including affiliational ones.

I think that in the particular case of Obama (although not in the particular case of Detroit) a significant number of people vote for him not because of what he does or will do but because of how it makes them feel about themselves.
PD Shaw Link

On corn, it all depends on whose ox gets gored. The corn farmers, particularly if they have crop insurance, will be fine, because the price of corn is rising quicker than the crop losses.

At the start of the crop season, U.S. corn production was projected at 14.7 billion bushels at a U.S. farm price of $5.34/bu. The USDA now estimates U.S. corn production will reach only 10.8 billion bu. this year, at $8.20/bu.

Link

In other words, at the beginning of the season, USDA predicted $78.5 billion in corn production, and now USDA is predicting $88.6 billion. That's not bad if losses are insured, and shouldn't hurt many of the agricultural manufacturers and retailers.
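The arithmetic behind those two totals is just bushels times price, using the USDA figures quoted above. A quick check (the function name `crop_value_billions` is only for illustration):

```python
def crop_value_billions(bushels_billions, price_per_bushel):
    """Crop value in billions of dollars: production times farm price."""
    return bushels_billions * price_per_bushel

early = crop_value_billions(14.7, 5.34)   # start-of-season projection
late = crop_value_billions(10.8, 8.20)    # revised USDA estimate

print(f"early: ${early:.1f}B, late: ${late:.1f}B, change: {late - early:+.1f}B")
```

So a crop roughly a quarter smaller is still worth about $10 billion more at the higher price, before accounting for who actually holds the bushels.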

The livestock farmers and the gasoline consumers are the ones getting hurt, and they want EPA ethanol waivers issued, which will drive down corn prices. I'm pretty skeptical about the importance placed on Iowa in the politics of ethanol, but the issue of whether to grant waivers before November seems likely to involve the highest of political considerations.
Steve Verdon Link

Nate Silver’s background is sabermetrics….

Then he should know better, or he is being very sloppy in trying to make this more accessible to those without the mathematical background to understand.
PD Shaw Link

Sorry Dave, could you kill the second re-posting?

From Dave: Done. I also fixed the broken link on the first copy.
steve Link

Way too early to figure this one out. The GOP should be way ahead. They are not because they have given us a weak candidate. It will be a coin toss unless there is some major event pushing the electorate.

Steve
Andy Link

As a guy who does prediction for a living, I’m very skeptical these statistical models actually show skill.
Steve Verdon Link

I think Linzer’s looks interesting as it takes into account information as it becomes available and updates previous results in light of that new information. I’ll be curious to see how it performs in this election, I hope he updates his analysis.

In general, one of the biggest hurdles to Bayesian analysis has been computational…i.e. having computers that could handle the number crunching. That is no longer really a binding constraint, and hopefully Bayesian analysis will become more commonplace.
