A friend pointed me to a discussion at economist.com about determining the political "momentum" of the presidential candidates. There, a poster suggested constructing a mathematical model to examine the factors associated with a politician's likelihood of winning the nomination. Fitting a smoothed trendline to polling data might allow us to consider the effects of certain events on a candidate's chances on the basis of temporal concurrence, but only roughly.

How can we isolate the effects of so many potential pieces of information? The likelihood of coming away the nominee clearly depends on a candidate's primary wins, superdelegate count, and popular vote total, as well as on potentially scandalous news stories. Without controlling for each of these, it would be difficult to be confident that we're considering the true qualities associated with victory.

Fortunately, regression analysis allows us to do just that. If we can relate, say, Barack Obama's chance of winning the nomination to important causative factors like those mentioned above, we might be able to tell just how much the average primary win improves his probability of facing McCain in the fall, or how much the Reverend Wright controversy has affected him. But first, we have to collect the relevant data and build the model.

The Intrade prediction market allows traders to enter into futures contracts on whether Barack Obama will win the Democratic Party's nomination. The price of the contract equates exactly with the market's belief that he will be the nominee, assuming that the market is efficient. By using the time series of prices as the dependent variable in the regression analysis, we can estimate how the superdelegates or Reverend Wright have affected his chances.

Let:

y(t) = B*x(t) + u(t)

where y(t) is the price of Barack's contract (the market-specified probability that he will win) at time t; x(t) is a matrix of exogenous (independent) variables that likely affect y(t), such as the number of superdelegates committed to either candidate, their popular vote totals, primary wins and news events; B is a vector of coefficients that relate x(t) to y(t); u(t) is a random error term that accounts for the "noise" in the futures price. When a term in B is statistically significant, we can say with confidence that the corresponding variable in x(t) had a meaningful impact on Barack's probability of winning. t=1 corresponds to 1/6/2008, the day after the Iowa caucus.
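To make the setup concrete, here is a minimal sketch of estimating B, assuming ordinary least squares and NumPy. Everything in it is made up for illustration: the data are synthetic, and the names `obama_wins` and `clinton_wins` and the "true" coefficients are assumptions, not the series or estimates used in this post.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 120  # number of daily observations (hypothetical)

# Hypothetical regressors: cumulative primary wins for each candidate.
obama_wins = np.cumsum(rng.random(T) < 0.10)
clinton_wins = np.cumsum(rng.random(T) < 0.08)

# x(t): one row per day, with a constant term in the first column.
X = np.column_stack([np.ones(T), obama_wins, clinton_wins])

# Synthetic "price" built from made-up coefficients plus noise u(t).
true_B = np.array([50.0, 3.0, -4.0])
y = X @ true_B + rng.normal(0.0, 1.0, T)

# Least-squares estimate of B and the implied residuals.
B_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ B_hat

print(B_hat)
```

Each entry of `B_hat` is read as the change in the contract price associated with a one-unit change in the corresponding column of `X`, holding the other columns fixed.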

So. What are the results? Well...

Significant results are indicated by asterisks. The more asterisks, the more confident we are that the independent variable affects the contract price. Generally, we consider significant only those variables that are significant at the 5% level (indicated by one *); that is, we accept a 5% chance of calling a variable significant when its coefficient was in fact generated by random chance. Two asterisks (**) indicate a high level of confidence (the 1% significance level).
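The asterisk convention amounts to a small lookup on each coefficient's p-value. The sketch below only restates that convention; the function name `significance_stars` is hypothetical, and the p-values in the loop are sample inputs rather than results from the table.

```python
def significance_stars(p_value: float) -> str:
    """Map a p-value to the asterisk notation described above."""
    if p_value < 0.01:
        return "**"  # significant at the 1% level
    if p_value < 0.05:
        return "*"   # significant at the 5% level
    return ""        # not significant at the conventional levels

for p in (0.004, 0.03, 0.108):
    print(f"p = {p:.3f} -> '{significance_stars(p)}'")
```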

The regression results seem to make sense; at least they exhibit the proper signs. As expected, Obama's wins (significant at the 10.8% level), his popular votes, and the interaction between them (interpreted as the extra effect of winning large states) positively affect his chances to win. On the other hand, Clinton's wins, her popular votes, and their interaction adversely affect his Intrade price.

Interestingly, we may interpret the share price one-to-one with percentage points in the probability that Obama will be the nominee. So we can quantify the results: the coefficient of 3.48 on Obama's wins implies that each primary win boosts his probability of winning the nomination by about 3.5 percentage points. Conversely, the average Clinton win reduces the probability that he will face McCain by almost 5 percentage points.

I made a few other assumptions about the nature of the news events and debates. Each Reverend Wright incident (when the comments broke, and when he reiterated them in front of the Press Club) was assumed to last for 5 days, about the length of a news cycle. The impact of Barack's response and the debates was assumed to last for 3 days. These values can (and probably should) be changed to maximize the likelihood of observing the price data, but I included them as structural assumptions for simplicity. No other news events (for example, Hillary's "misremembering" the warzone evasion) were considered.
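One common way to encode a news event with a fixed duration is a 0/1 indicator that switches on for the assumed window. The sketch below illustrates that idea in plain Python; the helper `event_dummy`, the sample length, and the start days are all hypothetical and are not the dates used in the estimation.

```python
def event_dummy(n_days: int, start: int, duration: int) -> list[int]:
    """Return a 0/1 series that is 1 on days start .. start + duration - 1 (t is 1-indexed)."""
    return [1 if start <= t < start + duration else 0 for t in range(1, n_days + 1)]

n_days = 20  # hypothetical sample length

# A 5-day window for a news-cycle event and a 3-day window for a response.
wright_story = event_dummy(n_days, start=4, duration=5)
response = event_dummy(n_days, start=9, duration=3)

print(wright_story)
print(response)
```

Each indicator becomes one column of x(t), so its coefficient is the average shift in the contract price on the days the window is switched on.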

If we accept those assumptions, we can further quantify the impact of the Reverend Wright controversy on Obama's chances. According to the table, both instances were detrimental to Obama's probability of winning, by 7 and 5 percentage points, respectively. Interestingly, the results indicate that Obama's Philadelphia speech, in which he refused to "disown" the Reverend, in fact served to reduce his share price. Although the speech was widely lauded by the media, my results suggest that it may have lowered the likelihood that he will take the nomination.

So, the results seem interesting. But how realistic are they? To gauge that, I checked the fit: the model explained almost 94% of the variation in the share price, an acceptable proportion for time series work. Further, I was pleased to see that the estimated coefficients did a reasonable job predicting the contract price, as shown in the chart below. In other words, the red line (prediction) tracks the blue line (price) closely:

So, what can we take away? If the objective is to quantify the effect of various exogenous (independent) variables on the probability that a candidate will be elected, we should consider using econometrics rather than simply attempting to calculate derivatives of the trend line in opinion polling. For one, we can check how well our models measure up to the empirical data. Additionally, we can estimate the relevant parameters while controlling for the effects of multiple, simultaneous influences.
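The "share of variation explained" figure quoted earlier is the usual R-squared statistic. As a rough sketch of how it is computed, assuming nothing beyond plain Python, the snippet below uses made-up prices and fitted values; they are not the Intrade data or the model's predictions.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` that is explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    ss_resid = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - ss_resid / ss_total

# Made-up contract prices and fitted values, for illustration only.
price = [60.0, 62.0, 65.0, 63.0, 70.0, 72.0]
fitted = [61.0, 61.5, 64.0, 64.5, 69.0, 72.5]

print(round(r_squared(price, fitted), 3))
```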

NOTE:

1. It is important to verify that the relationship between the time series variables is stationary. Two series may be entirely unrelated to one another (say, U.S. GDP and the total number of home runs hit in Japanese baseball), yet since both are increasing over time, we may find a significant relationship between them if we simply regress one on the other. If we have stationarity in the relationship (i.e., if the residual does not have a unit root), then we don't have to worry about this problem. I verified that the relationship between the price of Barack's contract and the explanatory variables used in estimation is indeed stationary, by computing the post-estimation residuals and performing an augmented Dickey-Fuller test.

2. Intrade prices were used to represent the probability that Obama will win rather than opinion polls, since futures prices are real choices in a market (revealed preference), versus simply stated preferences.
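The unit-root check in note 1 can be illustrated with a simplified Dickey-Fuller regression. This is a sketch under stated assumptions, not the test as I ran it: it omits the lagged-difference terms that make the test "augmented", it runs on simulated series rather than the model's residuals, and the helper name `dickey_fuller_t` is hypothetical. In practice a library routine such as `adfuller` in `statsmodels.tsa.stattools` would be the more usual choice.

```python
import numpy as np

def dickey_fuller_t(series):
    """t-statistic on rho in: diff(e)_t = alpha + rho * e_{t-1} + error (no lagged differences)."""
    e = np.asarray(series, dtype=float)
    de = np.diff(e)
    X = np.column_stack([np.ones(len(de)), e[:-1]])
    coef, *_ = np.linalg.lstsq(X, de, rcond=None)
    resid = de - X @ coef
    sigma2 = resid @ resid / (len(de) - X.shape[1])  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)            # coefficient covariance
    return coef[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(1)
n = 500
random_walk = np.cumsum(rng.normal(size=n))  # simulated series with a unit root
white_noise = rng.normal(size=n)             # simulated stationary series

print(dickey_fuller_t(random_walk))
print(dickey_fuller_t(white_noise))
```

A strongly negative statistic is evidence against a unit root. The statistic has to be compared with Dickey-Fuller critical values rather than the usual t-table, and the critical values for residuals from an estimated regression are more negative than the standard ones.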

Copyright © 2008 TCE.

## Friday, May 09, 2008
